Currently, we optimize for the case of a change triggering updates to many pages: we chop the set of pages into batches and push one HTMLCacheUpdateJob for each batch.
However, this does not help with the many changes that only affect a few pages. In that case, each change triggers its own HTMLCacheUpdateJob, even if it's just for a single page.
We could reduce the number of HTMLCacheUpdateJobs posted by collecting the pages to be purged over an entire batch of changes, such as the batch contained in a ChangeNotificationJob. The code should keep collecting pages to purge while it processes the changes, and only then chop the resulting list into batches according to purgeCacheBatchSize. This requires the interaction between (Wiki)PageUpdater and ChangeHandler to be re-engineered.
Side note: While collecting the list of pages to purge, the list should be deduplicated, and the effect of the deduplication should be measured via statsd (e.g. by counting how many duplicates were dropped).
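
A rough sketch of the collect-then-chop idea, in Python for illustration only (the real code would live in the PHP classes named above). All names here (collect_pages_to_purge, chop_into_batches, pages_for_change) are hypothetical and not existing APIs; batch_size stands in for purgeCacheBatchSize, and the duplicate count is the number that could be reported to statsd.

```python
from typing import Callable, Hashable, Iterable, List, Sequence, Tuple


def collect_pages_to_purge(
    changes: Iterable[Hashable],
    pages_for_change: Callable[[Hashable], Iterable[str]],
) -> Tuple[List[str], int]:
    """Collect the pages affected by a whole batch of changes.

    Returns the deduplicated pages in first-seen order, plus the number of
    duplicates dropped (a candidate value for a statsd counter).
    """
    seen = {}  # dict used as an ordered set
    duplicates = 0
    for change in changes:
        for page in pages_for_change(change):
            if page in seen:
                duplicates += 1
            else:
                seen[page] = None
    return list(seen), duplicates


def chop_into_batches(pages: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split the collected pages into chunks of at most batch_size pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(pages[i:i + batch_size]) for i in range(0, len(pages), batch_size)]


# Hypothetical input: three changes, each affecting only a few pages.
affected = {"change1": ["A", "B"], "change2": ["B", "C"], "change3": ["A"]}
pages, dropped = collect_pages_to_purge(affected, lambda c: affected[c])
batches = chop_into_batches(pages, batch_size=2)
# One cache update job would then be pushed per entry in `batches`.
```

With this input, three changes that would otherwise each trigger their own job result in two batches, and two duplicate page entries are dropped along the way.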