Cache countable statistics to prevent multiple counting on import

Description

At the moment, when $wgArticleCountMethod = 'link' (as it is on the WMF
cluster), we query the slave database before each individual revision is
imported to find out whether the page is countable at that time. This is
not sensible: (1) the slave lags behind the master, and (2) even the
master may not be up to date, since page link updates take place through
the job queue.

This change sets up a cache to hold countable values for pages where import
activity has already occurred. That way, we aren't hitting the DB on every
revision, only to get an incorrect response back.

Bug: T42009
Change-Id: I99189c82672d7790cda5036b6aa9883ce6e566b0
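
The caching idea in the commit message above can be illustrated with a minimal sketch. This is not the MediaWiki code from this change (which is PHP); it is a language-neutral illustration in Python. The names `CountableCache`, `query_db`, `is_countable`, and `set_countable` are hypothetical, and the sketch assumes that the first lookup for a page may still go to the database while later lookups during the same import are served from memory.

```python
from typing import Callable, Dict


class CountableCache:
    """Hypothetical per-import cache of 'is this page countable?' answers."""

    def __init__(self, query_db: Callable[[str], bool]) -> None:
        # query_db stands in for the expensive (and possibly stale) DB lookup.
        self._query_db = query_db
        self._cache: Dict[str, bool] = {}
        self.db_hits = 0  # counts how often we actually went to the database

    def is_countable(self, title: str) -> bool:
        # Only query the database the first time a page is seen.
        if title not in self._cache:
            self.db_hits += 1
            self._cache[title] = self._query_db(title)
        return self._cache[title]

    def set_countable(self, title: str, countable: bool) -> None:
        # After importing a revision, remember the value we computed ourselves
        # so the next revision of the same page does not need a DB query.
        self._cache[title] = countable


# Usage: three revisions of one page trigger a single database lookup.
cache = CountableCache(lambda title: False)
for _ in range(3):
    cache.is_countable("Main Page")
```

With a cache along these lines, the number of database lookups scales with the number of distinct pages touched by the import rather than with the number of revisions.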

Event Timeline

The newly added code in "finishImportPage" (lines 368 to 387) slows down imports massively: using "maintenance/importDump.php", we see 8 to 11 revisions/s imported instead of about 250 revisions/s. That's more than 20 times slower!