Most of the pure waiting in the job will be for replication (the throttling just makes the worker request finish and the slot go to another wiki). It seems worth considering simply using that setting.
The batch size is already configurable; that was deployed a few weeks ago. The relevant setting is WikiPageUpdaterDbBatchSize.
This seems insufficient though, since WikiPageUpdater triggers three kinds of jobs with very different batch size requirements. For this reason, I made https://gerrit.wikimedia.org/r/#/c/377046/ as requested by Giuseppe.
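For reference, a minimal sketch of how the existing single knob would be set, assuming it is exposed through the Wikibase client settings array ($wgWBClientSettings) with a lowerCamelCase key; verify the exact array and key name against the deployed configuration:

```php
// LocalSettings.php — illustrative only; the settings array and key casing
// are assumptions, not confirmed against the deployed config.
$wgWBClientSettings['wikiPageUpdaterDbBatchSize'] = 100; // one batch size shared by all three job types
```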
The three job types are:
- InjectRCRecordsJob - this could probably use $wgUpdateRowsPerQuery, since it just inserts rows; but currently, we run a query for each change to check for duplicates. We could a) stop doing that, b) batch that check, or c) have a separate setting so we can tweak the batch size.
- UpdateHtmlCacheJob - updates page_touched and sends purge requests to the CDN. Could also use $wgUpdateRowsPerQuery, if we consider the cost of CDN purges to be negligible.
- RefreshLinksJob - parses pages and triggers potentially large updates to various links tables. Should have a separate setting, or even no batching at all, to improve deduplication. This job being slow was the reason for the hotfix.
We could have the first two settings default to $wgUpdateRowsPerQuery, so we keep the ability to tweak them if needed while effectively going with $wgUpdateRowsPerQuery initially; see the sketch below.
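To illustrate that idea (the key names below are hypothetical, not necessarily the ones introduced by the gerrit change above), the per-job-type settings could fall back to the core default like this:

```php
// Hypothetical per-job-type batch size keys — illustrative only.
// Leaving a key null means "use the default".
$wgWBClientSettings['recentChangesBatchSize'] = null; // InjectRCRecordsJob
$wgWBClientSettings['purgeCacheBatchSize']    = null; // UpdateHtmlCacheJob
$wgWBClientSettings['refreshLinksBatchSize']  = 1;    // RefreshLinksJob: little or no batching, better deduplication

// Sketch of the intended fallback (conceptually living in the updater setup code):
// unset/null values fall back to MediaWiki core's $wgUpdateRowsPerQuery.
$rcBatchSize    = $wgWBClientSettings['recentChangesBatchSize'] ?? $wgUpdateRowsPerQuery;
$purgeBatchSize = $wgWBClientSettings['purgeCacheBatchSize'] ?? $wgUpdateRowsPerQuery;
```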