Description
We made a hotfix to reduce the DB batch size in WikiPageUpdater, but this needs to be configurable and easily deployable.
Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | aaron | T175897 Audit and improve JobQueue stability and performance (2017)
Resolved | | Ladsgroup | T173710 Job queue is increasing non-stop
Resolved | | daniel | T174422 Make dbBatchSize in WikiPageUpdater configurable
Event Timeline
Change 374505 had a related patch set uploaded (by AnotherLadsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@master] Make dbBatchSize in WikiPageUpdater configurable
Change 374505 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Make dbBatchSize in WikiPageUpdater configurable
@Ladsgroup @aaron Would $wgUpdateRowsPerQuery be appropriate here, too? Or is it important for this particular query to use a different batch size?
Most of the pure waiting in the job will be for replication (the throttling just makes the worker request finish and the slot go to another wiki). It seems worth considering just using that setting.
The batch size is already configurable; that was deployed a few weeks ago. The relevant setting is WikiPageUpdaterDbBatchSize.
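For reference, a minimal LocalSettings.php sketch of setting it. This assumes the setting is read from the Wikibase client settings array; the exact key casing may differ from what the merged patch uses:

```php
// Minimal sketch, assuming the setting lives in the Wikibase client
// settings array. Key name as mentioned above; exact casing per the
// merged patch, not verified here.
$wgWBClientSettings['WikiPageUpdaterDbBatchSize'] = 100; // example value
```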
This seems to be insufficient, though, since WikiPageUpdater triggers three kinds of jobs with very different batch size requirements. For this reason, I made https://gerrit.wikimedia.org/r/#/c/377046/ as requested by Giuseppe.
The three job types are:
- InjectRCRecordsJob - this could probably use $wgUpdateRowsPerQuery, since it just inserts rows; but currently, we run a query for each change to check for duplicates. We could a) stop doing that, b) batch that check, or c) have a separate setting so we can tweak the batch size.
- UpdateHtmlCacheJob - updates page_touched and sends purge requests to the CDN. Could also use $wgUpdateRowsPerQuery, if we consider the cost of CDN purges to be negligible.
- RefreshLinksJob - parses pages and triggers potentially large updates to various links tables. Should have a separate setting, or even no batching at all, to improve deduplication. This job being slow was the reason for the hotfix.
We could have the first two settings default to $wgUpdateRowsPerQuery, so that we go with $wgUpdateRowsPerQuery initially but retain the ability to tweak them if needed; see the sketch below.
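To illustrate the proposal, here is a LocalSettings.php-style sketch of the defaulting. The per-job keys (recentChangesBatchSize, purgeCacheBatchSize) are illustrative names, not necessarily what change 377046 ends up using:

```php
// Sketch of the proposed per-job defaulting. The per-job keys below are
// illustrative names, not necessarily those used in change 377046.

// InjectRCRecordsJob: plain row inserts, so fall back to the core setting.
if ( !isset( $wgWBClientSettings['recentChangesBatchSize'] ) ) {
	$wgWBClientSettings['recentChangesBatchSize'] = $wgUpdateRowsPerQuery;
}

// UpdateHtmlCacheJob: page_touched updates plus CDN purges, same fallback.
if ( !isset( $wgWBClientSettings['purgeCacheBatchSize'] ) ) {
	$wgWBClientSettings['purgeCacheBatchSize'] = $wgUpdateRowsPerQuery;
}

// RefreshLinksJob: expensive parsing, so keep a small, dedicated batch size.
if ( !isset( $wgWBClientSettings['WikiPageUpdaterDbBatchSize'] ) ) {
	$wgWBClientSettings['WikiPageUpdaterDbBatchSize'] = 20; // example value
}
```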
Change 377046 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/extensions/Wikibase@master] Allow batch sizes for different jobs to be defined separately.
As per https://gerrit.wikimedia.org/r/#/c/377458/, WikiPageUpdaterDbBatchSize is now set to 20. That's probably still too high for RefreshLinksJob, and way too low for UpdateHtmlCacheJob and InjectRCRecordsJob.
Afaics, this task as described in the summary is complete, so this ticket can be closed. Or shall we keep it open to discuss more fine-grained configuration, as per my proposal above?
Change 378228 had a related patch set uploaded (by Thiemo Mättig (WMDE); owner: Thiemo Mättig (WMDE)):
[mediawiki/extensions/Wikibase@master] Clean up unused code, comments and docs in WikiPageUpdaterTest
Reopening, because the patches failed to merge. CI is failing due to a problem in the Cirrus extension (T174654).
Change 377046 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Allow batch sizes for different jobs to be defined separately.
Change 378228 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Clean up unused code, comments and docs in WikiPageUpdaterTest