Page MenuHomePhabricator

Reduce Flow DB queries on Special:Contributions
Open, MediumPublic

Description

I looked through the 861 (!) SQL queries that are issued when I visit Special:Contributions/Catrope on my localhost (39 Flow lines out of 50 total), and found the following excessive queries:

  • 118 calls (3 per rendered row?) calls to SELECT * FROM flow_tree_revision JOIN flow_revision rev ON ((tree_rev_id = rev_id)) WHERE rev_type_id = 'X' AND rev_type = 'Y' ORDER BY rev_id DESC LIMIT 100 is called once for every rendered revision and for every current revision by RevisionStorage, called indirectly from AbstractQuery::loadMetadataBatch(). This query occurs 118 times on my localhost (probably 3 queries per rendered row?).
  • 85 calls to SELECT page_namespace,page_title,page_id,page_len,page_is_redirect,page_latest,page_content_model FROM page WHERE page_id = 'N' LIMIT 1, even many times for the same page. These calls come from $result->workflow->isDeleted() (which calls Title::newFromID()) in the loop in ContributionsQuery::getResults(). The problem appears to be that while there is an existence cache for titles by name, there isn't one by ID.
  • While formatting each row (RevisionFormatter::formatApi()), RevisionActionPermissions::isAllowed() is called. This results in several queries:
    • SELECT * FROM flow_tree_revision JOIN flow_revision rev ON ((tree_rev_id = rev_id)) WHERE rev_type_id = 'X' AND rev_type = 'Y' ORDER BY rev_id DESC LIMIT 100 from `RevisionActionPermissions::getRoot()
    • Sometimes that same query but with a different type_id is also called from CollectionCache::getLastRevisionFor(), but this doesn't happen for every row
    • SELECT * FROM flow_topic_list WHERE topic_id = 'X' LIMIT 1 from AbstractCollection::getBoardWorkflow()
    • SELECT page_namespace,page_title,page_id,page_len,page_is_redirect,page_latest,page_content_model FROM page WHERE page_id = 'N' LIMIT 1 from Workflow::isDeleted()
  • RevisionFormatter::buildActions() then proceeds to call Workflow::isDeleted() a bunch more times (indirectly, , resulting in another ~33 Title::newFromId() queries

More stuff happens after that that I didn't bother to narrate. In total, I saw 337 Title::newFromId queries, 266 RevisionStorage::findInternal queries and 187 BasicDbStorage::find (flow_topic_list) queries, accounting for 790 of the 861 queries on this page. In production, this isn't as much of a problem because of the indexes, but we're talking about getting rid of those. With proper batching and in-process caching, we should be able to eliminate hundreds of queries.

Event Timeline

Catrope raised the priority of this task from to Needs Triage.
Catrope updated the task description. (Show Details)
Catrope added subscribers: Catrope, matthiasmullie.
Restricted Application added subscribers: StudiesWorld, Aklapper. · View Herald Transcript

I think most of those queries are caused by Flow permission checks.

RevisionActionPermissions::isAllowed used to only check the given revision, and that was easily implemented.
Then we also had to check the previous revision. Never really refactored that, since it's usually already in memory anyway.
Then we also needed to start checking root revision. Then look if the page is deleted. ...

All of these are still somewhat manageable on regular pages, but adds up quickly when iterating over a lot of revisions (like contributions)

At some point, permissions will need to be properly refactored (so far, it's mostly been ignored because other permission-related drastic changes for RevDel & granular suppression are coming)

From my numbers it looks like we could fix almost half of this by caching the existence of page IDs (for the Title::newFromId( $id ) !== null check in isDeleted()), and prepopulating that cache with a single batch query. 337 of the 790 queries I saw were newFromId queries, so that should have a big impact.

But maybe I'm missing the forest for the trees and there are higher-level changes we would make that would let us check deleted-ness of workflows less often?

Agree with Matthias we should save bigger changes for when we go to implement the major permission changes.

The caching idea makes sense. Maybe core has something for this. I didn't see it at a quick glance, though (the main ones are oriented around title->exists caching). If not, easy to implement in Flow.

Catrope triaged this task as High priority.Dec 19 2015, 1:33 AM
Catrope lowered the priority of this task from High to Medium.