Current state: The "dispatch" process for Wikidata / Wikibase is kicked off from a cron job that runs a maintenance script.
Wikibase users (including Wikibase developers) do not want to have to run extra maintenance scripts.
WMF SREs do not want to manage extra cron jobs (they add complexity to cross data centre work).
- Every edit schedules a DispatchTriggerJob that is entirely generic.
- The job holds no parameters at all, so every DispatchTriggerJob is identical. This means the job queue can deduplicate them: a new job is ignored if an older one is already waiting for execution.
- We may want a configurable way to schedule fewer of these than one per edit; one per 100 edits on Wikidata production, for example, would likely be fine. There are examples of this in core.
- DispatchTriggerJob looks for wikis that meet our dispatch criteria (using the wb_changes_dispatch table as in the current maintenance script, regarding max interval etc.) and that are not locked, scheduling one DispatchClientJob per such wiki.
- DispatchClientJob would perform a "pass" for the wiki, as the existing maintenance script does, and then unlock the client wiki.
- Everything from this point on would remain the same.
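The flow above can be sketched roughly as follows. This is an illustrative Python pseudocode-style model only (the real implementation would be MediaWiki PHP jobs); the `DispatchState` class and its methods are assumptions standing in for the wb_changes_dispatch table and locking logic, while the two job class names come from this proposal.

```python
from dataclasses import dataclass, field

@dataclass
class DispatchState:
    """Tiny in-memory stand-in for the wb_changes_dispatch table (hypothetical)."""
    pending: dict = field(default_factory=dict)   # wiki -> number of pending changes
    locked: set = field(default_factory=set)
    dispatched: list = field(default_factory=list)

    def client_wikis(self):
        return list(self.pending)

    def meets_dispatch_criteria(self, wiki):
        # Placeholder for the real criteria (pending changes, max interval, etc.)
        return self.pending[wiki] > 0

    def try_lock(self, wiki):
        if wiki in self.locked:
            return False
        self.locked.add(wiki)
        return True

    def run_pass(self, wiki):
        # Send a batch of changes to the client wiki (modelled as a record here).
        self.dispatched.append((wiki, self.pending[wiki]))
        self.pending[wiki] = 0

    def unlock(self, wiki):
        self.locked.discard(wiki)


class DispatchTriggerJob:
    """Parameter-less job: all instances are identical, so the queue can
    deduplicate them (a new push is a no-op while one is still waiting)."""

    def run(self, state, queue):
        for wiki in state.client_wikis():
            if state.meets_dispatch_criteria(wiki) and state.try_lock(wiki):
                queue.append(DispatchClientJob(wiki))


class DispatchClientJob:
    """Performs one dispatch "pass" for a single client wiki, then unlocks it."""

    def __init__(self, wiki):
        self.wiki = wiki

    def run(self, state, queue):
        state.run_pass(self.wiki)
        state.unlock(self.wiki)
```

For example, with two client wikis where only one has pending changes, a single trigger job schedules exactly one client job, which dispatches the changes and releases the lock.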
This solution meets the main goal of this work, which is no longer using a maintenance script, while also avoiding a rewrite of the entire dispatching system.
For sites that only have a single client site (such as a local client setup) we could consider directly scheduling DispatchClientJobs, skipping the indirection of the DispatchTriggerJob.
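That single-client shortcut could look something like the following. This is a hypothetical sketch (function name, job representation as plain tuples, and the branch condition are all assumptions for illustration):

```python
def schedule_dispatch(queue, client_wikis):
    """Hypothetical scheduling hook run on edit.

    With exactly one client wiki there is nothing for a trigger job to
    decide, so the client job could be pushed directly; otherwise the
    generic DispatchTriggerJob is scheduled as usual.
    """
    if len(client_wikis) == 1:
        queue.append(("DispatchClientJob", client_wikis[0]))
    else:
        queue.append(("DispatchTriggerJob", None))
```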
In production on wikidata.org this solution will likely need some tuning to get the desired behaviour:
- Adequate / desired % of edits triggering the initial job
- Adequate waiting between batches of changes sent to individual clients
- Settings limiting how much work a "pass" for a wiki can do (as we may now be able to run more passes for each wiki in general)
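The first tuning knob (the percentage of edits that trigger the initial job) could be a simple probabilistic gate on the edit hook. A minimal sketch, assuming a probabilistic sampling mechanism and a `sample_ratio` setting name that are both illustrative rather than decided:

```python
import random

def maybe_schedule_trigger(queue, sample_ratio=0.01, rng=random):
    """Schedule a DispatchTriggerJob for roughly sample_ratio of edits.

    sample_ratio=0.01 corresponds to the "1 per 100 edits" example above.
    The setting name and probabilistic approach are assumptions; the real
    mechanism would live in Wikibase configuration.
    """
    if rng.random() < sample_ratio:
        queue.append("DispatchTriggerJob")
```

Because identical DispatchTriggerJobs are deduplicated anyway, the sampling only reduces job queue churn; it does not change which work eventually gets done.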
- No maintenance script / cron job needs to be run for the dispatching process to work
- De-duplication should be used where possible and needed
- Documentation should be updated (in Wikibase.git & architecture docs)
- Grafana monitoring of the dispatch process remains useful for the new solution
When this task is tackled it should be borne in mind that some refactoring will likely make sense, such as T256208: Consolidate places that read/write the 'wb_changes' table (but this is also tracked and prioritized separately).
This should be deployed gradually, which could be done in a couple of different ways:
- Per environment: beta, test, production
- Per client wiki (or group of wikis) within each environment: group1, group2, (everything except enwiki), enwiki, commonswiki
Overall performance of these jobs will be dictated by job queue processing, which is controlled by WMF SREs and Service Ops (to be confirmed).
We have a general performance requirement of "The dispatching process for Wikidata should not be slower than it currently is".
The code to be deployed from this ticket likely won't have a big impact on performance, though the configuration of job processing may, and this would need to be figured out with Service Ops.
In Wikidata production these cron jobs can be seen at https://github.com/wikimedia/puppet/blob/e1e13a59de3021afaa43c31745abbe348a93017d/modules/profile/manifests/mediawiki/maintenance/wikidata.pp
The current process is monitored on Grafana and also has alarms: