User Details
- User Since: May 4 2023, 9:44 AM
- Availability: Available
- LDAP User: Unknown
- MediaWiki User: Adee Ritman (WMDE)
Nov 27 2025
Nov 12 2025
Nov 11 2025
The changes to add a federatedValuesEnabled flag are on Gerrit and marked as work in progress. They're meant to be amended by code from other tasks, where the flag can be used and propagated. For that reason, the open change set is linked to the epic ticket.
Oct 28 2025
Oct 8 2025
Sep 2 2025
Aug 22 2025
The starting point for this ticket was using the REST API to get a sense of developer ergonomics. I wrote a simple client that reads JSON files and creates items in a local Wikibase instance via the API. That highlighted the issue of overhead, previously discussed in the first phase of technical research (T378452). To avoid that, I wrote a maintenance script that creates new items directly through the entity store. This approach skips some unnecessary checks while preserving data consistency and correctly triggering secondary operations (search index, Recent Changes). The script runs inside the container, so only system administrators with shell access can run it (not to be confused with the "sysops" user group). This fits the assumed main use case for bulk data import: an initial import into a fresh instance, or periodic, admin-run updates.
To run the maintenance script, I wrote an item generator that outputs JSON files containing items with a simplified data model. Full benchmarking was out of scope, but I did explore different ways to speed up item creation.
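A generator along these lines could be sketched as follows. This is a minimal illustration, not the actual generator: the field layout of the simplified data model and all names here are assumptions.

```python
import json


def make_item(i):
    # Hypothetical simplified data model: English label only, no statements.
    return {
        "labels": {"en": {"language": "en", "value": f"Generated item {i}"}},
        "descriptions": {},
        "claims": {},
    }


def write_batch(path, start, count):
    """Write `count` generated items to one JSON file for the script to read."""
    items = [make_item(i) for i in range(start, start + count)]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f)


# Example: one file with 1000 items, numbered from 1.
write_batch("items-0001.json", start=1, count=1000)
```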
- ID generation is a major part of new item creation. I added two settings that are meant to keep ID generation from blocking the rest of the saving process:
$wgWBRepoSettings['idGenerator'] = 'mysql-upsert';
$wgWBRepoSettings['idGeneratorSeparateDbConnection'] = true;
I didn't notice an improvement on my machine, but they should help more at scale.
I also looked into the batching ingestion extension, which appears to assign IDs in bulk, separately from saving items, and to skip some of the checks WikiPageEntityStore performs. That could be interesting for future performance comparisons, though for safety and correctness I preferred not to reimplement ID generation as part of this prototype.
- Most of the performance improvement beyond avoiding the HTTP overhead comes from running multiple batches in parallel. The maintenance script itself simply reads a JSON file and creates items in a loop, but the input data can be split into several files and the script invoked multiple times. I tried processing partitioned data both with the shell's `&` background operator and with GNU parallel. The latter was slightly slower, likely due to output buffering. This should be validated on larger datasets.
Splitting the input into a small number of batches already delivers a significant improvement, but we need to measure multiple runs with different parameters to determine, for example, a balance between the number of jobs and batch size.
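The partitioning step above can be sketched as a small pre-processing helper. This is an illustrative sketch; the file naming scheme and function names are made up for the example.

```python
import json


def split_into_batches(items, batch_size):
    """Partition a list of items into fixed-size batches (last one may be smaller)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def write_batches(items, batch_size, prefix="batch"):
    """Write each batch to its own JSON file, one file per parallel job."""
    paths = []
    for n, batch in enumerate(split_into_batches(items, batch_size)):
        path = f"{prefix}-{n:04d}.json"
        with open(path, "w", encoding="utf-8") as f:
            json.dump(batch, f)
        paths.append(path)
    return paths
```

Each resulting file can then be passed to a separate invocation of the maintenance script, backgrounded with `&` or fanned out through GNU parallel, as described above.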
Compared to a new REST API endpoint, the maintenance script is easier to develop and deploy, is inherently limited to system admins, avoids HTTP overhead, and has no request size limit or rate limit. On the other hand, we would probably need to implement input deserialization and validation or try to modularize and reuse that part of the REST API. There is also no easy integration with external tools (e.g. Quickstatements) in this case.
For an MVP, I recommend:
- Tuning parameters based on more robust benchmarking.
- Logging failures per item with meaningful error messages so users can re-run only on those items/batches.
- Documenting the workflow, which might be initially complex for Wikibase admins (mounting data and running inside the container). Potentially attaching a simple bash script.
- Implementing data pre-processing to create batch files, adding CSV support.
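The per-item failure logging recommended above could take roughly this shape. A hedged sketch: `create_item` stands in for whatever store call the maintenance script actually makes, and the failure-file format is invented for the example.

```python
import json


def import_batch(items, create_item, failed_path):
    """Create items one by one, recording failures with their error messages
    so a re-run can process only the items that failed."""
    failed = []
    for item in items:
        try:
            create_item(item)
        except Exception as e:
            failed.append({"item": item, "error": str(e)})
    if failed:
        with open(failed_path, "w", encoding="utf-8") as f:
            json.dump(failed, f)
    return len(items) - len(failed), len(failed)
```

The failure file is itself valid script input (minus the error annotations), so retrying a batch is just another run over that file.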
Aug 4 2025
As part of exploring solutions, I'm prototyping a maintenance script that would avoid the overhead of interacting with the REST API. So far I've been testing bulk item creation directly with the entity store, and looking into ways to batch or parallelize the process to see if that yields better performance.
Jul 30 2025
Jun 10 2025
Feb 13 2025
Jan 29 2025
Jan 28 2025
Several tools for bulk data import into Wikibase were reviewed and categorized by their approach: API wrappers, scripts interacting with the database, and direct database access. These tools were compared on usability and performance, using benchmarks provided by their developers. The review also highlighted the lack of native API support for bulk import functionality. Most tools function as workarounds, emphasizing the limitations of the current API. The ideal solution would combine the high performance of direct database access with the Wikibase integration and features provided by the API.
These findings were shared with product for feedback. We also consulted with user research and colleagues who worked on the REST API, which led us to prioritize further research into extending the API with a bulk import method or implementing a CLI tool as a potential solution.
Oct 29 2024
Oct 15 2024
Oct 1 2024
Aug 28 2024
Aug 26 2024
Following up on the previous PR, I've found that our Visual Editor tests were only using the features as provided by the extension itself, so they could be removed. On the other hand, the Scribunto/Lua tests are testing our specific installation and configuration within the Wikibase image. I kept the suite and documented why it was necessary.
See https://github.com/wmde/wikibase-release-pipeline/pull/757.
Aug 21 2024
Aug 8 2024
This PR https://github.com/wmde/wikibase-release-pipeline/pull/743 removes some of the tests related to extensions that we no longer package (ConfirmEdit, Nuke, SyntaxHighlight). Some other tests are still there (Scribunto, VisualEditor), and I will look into what exactly they are testing and whether they can also be removed or replaced. More extensive work to map out our testing landscape, in-house and out, is planned here: https://phabricator.wikimedia.org/T34522.
Aug 6 2024
Jul 18 2024
For reference: comparing https://www.mediawiki.org/wiki/Bundled_extensions_and_skins to the extensions mentioned in variables.env, it seems that the shared extensions are exactly the ones in the task's description.
Jul 15 2024
Jun 20 2024
Jun 11 2024
Jun 3 2024
May 27 2024
May 23 2024
Apr 29 2024
Apr 4 2024
Feb 27 2024
The section "release checklist in a more wordy way" of our Introduction to the Wikibase release pipeline is updated with a detailed explanation of the current checklist. We should keep updating it as we change our process.
Feb 13 2024
Feb 6 2024
Jan 23 2024
Jan 22 2024
Dec 21 2023
As the task description suggests, the product verification sheet's federated properties tab hasn't been used for the past few releases. Moreover, it's not clear which scenarios were being tested. Instead, I reviewed our automated test suite, namely item.ts and prefatching.ts.
I set up a local Wikibase instance with federated properties and manually executed the tests. They seem to adequately cover the basic usage of this feature, including:
- Creating an item and adding federated properties to it.
- Ensuring that the item's data is retrievable in JSON format but not in unsupported formats like TTL and RDF.
- Validating that the addition and deletion of federated properties are reflected in the item's history page and Wikibase's recent changes.
Dec 5 2023
Quickstatements tests: https://github.com/wmde/wikibase-release-pipeline/pull/532
Nov 21 2023
Special item test changes: https://github.com/wmde/wikibase-release-pipeline/pull/528
Nov 20 2023
Nov 16 2023
Nov 14 2023
Oct 19 2023
Oct 5 2023
Sep 28 2023
Went with a simple approach of checking whether an image is already loaded before loading it. This means there weren't any changes to the interaction between building and testing. We should probably revisit that as part of, or depending on the outcome of, T345689.
