Make dbBatchSize in WikiPageUpdater configurable
Closed, Resolved (Public)

Description

We made a hotfix to reduce the db batch size in WikiPageUpdater, but this needs to be configurable and easy to deploy.

Event Timeline

Ladsgroup renamed this task from "Made dbBatchSize in WikiPageUpdater configurable" to "Make dbBatchSize in WikiPageUpdater configurable". Aug 29 2017, 8:59 AM

Change 374505 had a related patch set uploaded (by AnotherLadsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@master] Make dbBatchSize in WikiPageUpdater configurable

https://gerrit.wikimedia.org/r/374505

Change 374505 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Make dbBatchSize in WikiPageUpdater configurable

https://gerrit.wikimedia.org/r/374505

@Ladsgroup @aaron Would $wgUpdateRowsPerQuery be appropriate here, too? Or is it important for this particular query to use a different batch size?

Most of the pure waiting in the job will be for replication (the throttling just makes the worker request finish and the slot go to another wiki). It seems worth considering just using that setting.

The batch size is already configurable; that was deployed a few weeks ago. The relevant setting is WikiPageUpdaterDbBatchSize.

This seems to be insufficient though, since WikiPageUpdater triggers three kinds of jobs that call for very different batch sizes. For this reason, I made https://gerrit.wikimedia.org/r/#/c/377046/ as requested by Giuseppe.
The three job types are:

  • InjectRCRecordsJob - this could probably use $wgUpdateRowsPerQuery, since it just inserts rows; but currently, we run a query for each change, checking for duplicates. We could a) stop doing that, b) batch that, or c) have a separate setting, so we can tweak the batch size.
  • UpdateHtmlCacheJob - updates page_touched, and sends purge requests to the CDN. Could also use $wgUpdateRowsPerQuery, if we consider the cost of CDN purges to be negligible.
  • RefreshLinksJob - parses pages and triggers potentially large updates to various links tables. Should have a separate setting, or even no batching at all, to improve deduplication. This job being slow was the reason for the hotfix.

We could have the first two settings default to $wgUpdateRowsPerQuery. That way we go with $wgUpdateRowsPerQuery initially, but keep the ability to tweak them if needed.
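
To make the proposal concrete, here is a rough sketch of per-job batch sizes with a shared fallback. It is written in Python for brevity and is not the Wikibase code: `UPDATE_ROWS_PER_QUERY` stands in for $wgUpdateRowsPerQuery, and `JOB_BATCH_SIZES`, `batch_size_for`, `make_batches` and all numeric values are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterator, List, Optional

# Stand-in for $wgUpdateRowsPerQuery; the value is illustrative only.
UPDATE_ROWS_PER_QUERY = 100

# Hypothetical per-job overrides. None means "use the shared default".
JOB_BATCH_SIZES: Dict[str, Optional[int]] = {
    "InjectRCRecordsJob": None,  # just inserts rows, so the default may do
    "UpdateHtmlCacheJob": None,  # assumes CDN purge cost is negligible
    "RefreshLinksJob": 5,        # expensive per page, so a small batch
}


def batch_size_for(job_type: str) -> int:
    """Return the override for this job type, or the shared default."""
    override = JOB_BATCH_SIZES.get(job_type)
    return override if override is not None else UPDATE_ROWS_PER_QUERY


def make_batches(titles: List[str], job_type: str) -> Iterator[List[str]]:
    """Split the page titles into batches sized for the given job type."""
    size = batch_size_for(job_type)
    for start in range(0, len(titles), size):
        yield titles[start:start + size]
```

With these made-up numbers, 250 affected pages would become 3 batches for UpdateHtmlCacheJob but 50 batches for RefreshLinksJob. That is the point of splitting the setting: each job type can be tuned without affecting the others.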

Change 377046 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/extensions/Wikibase@master] Allow batch sizes for different jobs to be defined separately.

https://gerrit.wikimedia.org/r/377046

As per https://gerrit.wikimedia.org/r/#/c/377458/, WikiPageUpdaterDbBatchSize is now set to 20. That's probably still too high for RefreshLinksJob, and far too low for UpdateHtmlCacheJob and InjectRCRecordsJob.

Afaics, this task as described in the summary is complete, so this ticket can be closed. Or shall we keep it open to discuss more fine-grained configuration, as per my proposal above?

Change 378228 had a related patch set uploaded (by Thiemo Mättig (WMDE); owner: Thiemo Mättig (WMDE)):
[mediawiki/extensions/Wikibase@master] Clean up unused code, comments and docs in WikiPageUpdaterTest

https://gerrit.wikimedia.org/r/378228

thiemowmde reassigned this task from Ladsgroup to daniel.
thiemowmde triaged this task as High priority.
thiemowmde moved this task from Review to Done on the Wikidata-Former-Sprint-Board board.

Reopening, because the patches failed to merge. CI is failing due to a problem in the Cirrus extension (T174654).

Change 377046 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Allow batch sizes for different jobs to be defined separately.

https://gerrit.wikimedia.org/r/377046

Change 378228 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Clean up unused code, comments and docs in WikiPageUpdaterTest

https://gerrit.wikimedia.org/r/378228

daniel moved this task from Review to Done on the Wikidata-Former-Sprint-Board board.

All relevant patches are merged now.