
Wikibase: Batch HTMLCacheUpdateJobs across changes
Closed, Resolved (Public)

Description

Currently, we optimize for the case of a change triggering updates to many pages: we chop the set of pages into batches and push one HTMLCacheUpdateJob for each batch.
However, this does not help with the many changes that only affect a few pages. In that case, each change triggers its own HTMLCacheUpdateJob, even if it's just for a single page.

We could reduce the number of HTMLCacheUpdateJobs posted by collecting the pages to be notified over an entire batch of changes, such as the batch contained in a ChangeNotificationJob. The code should keep collecting pages to purge while it processes changes, and only then chop that list into batches according to purgeCacheBatchSize. This requires re-engineering the interaction between (Wiki)PageUpdater and ChangeHandler.

Side note: While collecting the list of pages to purge, deduplication should be applied (and evaluated via statsd).
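
The proposal above could be sketched roughly as follows. This is a minimal Python illustration of the idea, not Wikibase code: the function names, the `pages_for_change` lookup, and the sample data are all hypothetical, and only `purge_cache_batch_size` corresponds to a setting named in this task (purgeCacheBatchSize). The duplicate count is returned so that it could be reported to a metrics backend such as statsd.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical lookup: given one change, return the titles of the pages it affects.
PagesForChange = Callable[[str], Iterable[str]]


def collect_pages(
    changes: Iterable[str], pages_for_change: PagesForChange
) -> Tuple[List[str], int]:
    """Collect affected pages over a whole batch of changes.

    Returns the deduplicated titles (in first-seen order) and the number
    of duplicates that were dropped.
    """
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    duplicates = 0
    for change in changes:
        for title in pages_for_change(change):
            if title in seen:
                duplicates += 1
            else:
                seen[title] = None
    return list(seen), duplicates


def chop_into_batches(titles: List[str], batch_size: int) -> List[List[str]]:
    """Split the collected titles into batches of at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [titles[i:i + batch_size] for i in range(0, len(titles), batch_size)]


def plan_cache_update_jobs(
    changes: Iterable[str],
    pages_for_change: PagesForChange,
    purge_cache_batch_size: int,
) -> Tuple[List[List[str]], int]:
    """Return one list of titles per job to push, plus the dropped-duplicate count."""
    titles, duplicates = collect_pages(changes, pages_for_change)
    return chop_into_batches(titles, purge_cache_batch_size), duplicates


if __name__ == "__main__":
    # Made-up usage data: three changes, each affecting only one or two pages.
    usages = {"Q1": ["Berlin", "Germany"], "Q2": ["Germany"], "Q3": ["Paris"]}
    jobs, dropped = plan_cache_update_jobs(["Q1", "Q2", "Q3"], lambda c: usages[c], 2)
    print(jobs, dropped)
```

With this made-up data, three changes would result in two jobs rather than three, and one duplicate page is dropped along the way. The saving grows with the number of small changes in a batch and with the overlap between the pages they affect.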

Event Timeline

daniel created this task. Oct 23 2017, 3:15 PM
Restricted Application added a subscriber: Aklapper. Oct 23 2017, 3:15 PM
Krinkle updated the task description. Oct 23 2017, 6:31 PM
Krinkle updated the task description.
thiemowmde triaged this task as Low priority. Oct 27 2017, 12:41 PM
thiemowmde moved this task from incoming to needs discussion or investigation on the Wikidata board.

@Addshore wanted me to take a look at this.

This ticket was created when the job queue was growing overly large, but that has since been fixed by addressing the underlying problem. It would still be nice to do, but it's not high priority. OTOH, looking at the deduplication ratios, I'm pretty sure we don't deduplicate any jobs on htmlCacheUpdate: https://grafana.wikimedia.org/d/000000400/jobqueue-eventbus?panelId=14&fullscreen&orgId=1 vs. https://grafana.wikimedia.org/d/000000400/jobqueue-eventbus?panelId=2&fullscreen&orgId=1

This needs to be fixed ASAP.

Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.
Krinkle removed a subscriber: Krinkle.

@Addshore wanted me to take a look at this.

> I'm pretty sure we don't deduplicate any jobs on htmlCacheUpdate: https://grafana.wikimedia.org/d/000000400/jobqueue-eventbus?panelId=14&fullscreen&orgId=1 vs. https://grafana.wikimedia.org/d/000000400/jobqueue-eventbus?panelId=2&fullscreen&orgId=1
>
> This needs to be fixed ASAP.

It seems this has fixed itself.

Addshore closed this task as Resolved. Nov 26 2019, 9:28 AM
Addshore claimed this task.
Restricted Application added a project: User-Addshore. Nov 26 2019, 9:28 AM