Improve performance of dispatchChanges::getPendingChanges
Closed, ResolvedPublic


dispatchChanges has performance issues. One major bottleneck is the getPendingChanges() function. It works by loading a block of changes, then for each change's item loading the sitelinks, and then checking whether the target wiki is mentioned in the sitelinks. This means one extra database query per change (by default, 1000 per batch). This is far too slow.
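The per-change flow described above can be sketched as follows. This is an illustrative Python model of the N+1 query pattern, not the actual Wikibase PHP code; names like get_sitelinks and the change dict layout are assumptions.

```python
# Hypothetical sketch of the slow per-change flow: one sitelink lookup
# (i.e. one database query) per change in the batch.

def get_pending_changes_naive(changes, target_wiki, get_sitelinks):
    """Filter a batch of changes for one client wiki.

    get_sitelinks(entity_id) stands in for a database query returning
    the set of wikis the entity is linked to. With a default batch size
    of 1000, this loop issues up to 1000 extra queries per batch.
    """
    pending = []
    for change in changes:
        sitelinks = get_sitelinks(change["entity_id"])  # one DB query each
        if target_wiki in sitelinks:
            pending.append(change)
    return pending
```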

One solution would be to join the wb_changes table against the wb_items_per_site table directly. However, this would no longer work once we have client-side usage tracking. Also, wb_changes uses a single field for the prefixed ID of the entity, while wb_items_per_site uses one field for the entity type and another for the numeric ID. This makes joining inefficient and inconvenient.
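The ID mismatch can be made concrete with a small sketch. It assumes the Wikidata convention that "Q" prefixes items and "P" prefixes properties; having to split IDs like this on the fly (or in SQL string functions) is part of why the join is inconvenient.

```python
import re

# Illustrative sketch: wb_changes stores a prefixed ID such as "Q42",
# while wb_items_per_site stores the pair ("item", 42). Joining the two
# tables would require this conversion for every row.

def split_prefixed_id(prefixed_id):
    """Split a prefixed entity ID into (entity_type, numeric_id).

    Only the "Q" (item) and "P" (property) prefixes are handled here;
    anything else raises ValueError.
    """
    match = re.fullmatch(r"([QP])(\d+)", prefixed_id)
    if match is None:
        raise ValueError(f"not a prefixed entity ID: {prefixed_id!r}")
    entity_type = {"Q": "item", "P": "property"}[match.group(1)]
    return entity_type, int(match.group(2))
```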

An alternative solution would be to provide a storage layer service that:
a) checks, for a given client wiki, which items from a given list are used there;
b) provides all pages on a given client wiki that use any item from a given list.
Using the first method, we could filter a given block of changes using a single query.
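Proposal (a) can be sketched like this. The service name get_used_items and the change dict layout are hypothetical; the point is that the whole batch is answered by one batched lookup (e.g. a single SELECT ... WHERE ... IN (...)) instead of one query per change.

```python
# Hypothetical sketch of proposal (a): ask the storage layer once which
# of the batch's items are used on the target wiki, then filter in memory.

def filter_changes_single_query(changes, target_wiki, get_used_items):
    """Filter a batch of changes with a single batched lookup.

    get_used_items(wiki, item_ids) stands in for the proposed service:
    it returns the subset of item_ids that are used on wiki.
    """
    item_ids = {change["entity_id"] for change in changes}
    used = get_used_items(target_wiki, item_ids)  # one DB query total
    return [c for c in changes if c["entity_id"] in used]
```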

Version: unspecified
Severity: normal



Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 1:29 AM
bzimport set Reference to bz47125.
bzimport added a subscriber: Unknown Object (MLST).

My original description of the problem is incorrect insofar as the filterChanges() function used by getPendingChanges() already queries the sitelinks table only once, not once per change.

However, it remains true that getPendingChanges() is a bottleneck. Possible improvements include caching and an optimized code flow.
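The caching idea can be sketched as a simple memoisation layer over the per-(wiki, item) usage check, so repeated changes to the same item within or across batches do not hit the database again. All names here are hypothetical, not the real Wikibase API.

```python
# Illustrative sketch of the caching improvement: remember the answer to
# "is this item used on this wiki?" so it is computed at most once.

def make_cached_usage_check(is_item_used):
    """Wrap a usage check with an in-process cache.

    is_item_used(wiki, item_id) stands in for a database query; the
    wrapper only calls it on a cache miss.
    """
    cache = {}

    def cached(wiki, item_id):
        key = (wiki, item_id)
        if key not in cache:
            cache[key] = is_item_used(wiki, item_id)  # DB query on miss only
        return cache[key]

    return cached
```

A real implementation would also need bounds and invalidation (sitelinks change over time), which this sketch omits.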

Related URL: (Gerrit Change Idc7def15a5bd113b2cf38f8140f26098848bc1a7) | change APPROVED and MERGED [by Aude] (Gerrit Change I677d5fe46fcd7cf565443aa581f69e73c28fa940) | change APPROVED and MERGED [by Aude]