Page MenuHomePhabricator

Optimize SpecialAllPages::showChunk for large wikis
Open, LowPublic

Description

Related incident: https://wikitech.wikimedia.org/wiki/Incident_documentation/2017-03-20_Special_AllPages

Background

T160914: Databases overflown with connections due to slow query on Special:AllPages

The query in question looks something like SELECT page_title FROM page WHERE page_namespace = '0' AND page_is_redirect = '0' AND (page_title < 'Entomacrodus_stellifer') ORDER BY page_title DESC LIMIT 344, 1

There are at least two faults with the plan the database is currently using for this query:

  • It's fetching all (or almost all) pages in the namespace, then filtering by title and is_redirect, rather than fetching pages in order using the index on namespace+title and filtering by is_redirect.
  • It's not even using ICP to do the filtering by title, which a similar query using ascending order does do.

But even with the best plan, this query would still be problematic when run on a namespace with millions of redirects and few non-redirects.

To discuss

Adding a filter on is_redirect may be possible on the largest wikis (it is currently disabled for them, but maybe it can be enabled).

Options:

  • Add an index including the flag (is is ok to add an index just for a single, infrequent query?)
  • Rewrite the query (how?)
  • Force the optimizer to do a better plan (probably not enough)
  • Anything else?

It may require more than one option.

Event Timeline

Anomie subscribed.

(is is ok to add an index just for a single, infrequent query?)

As long as it's ok with the DBAs, sure. ;)

Rewrite the query (how?)

The query here is asking "If we were to go back one page's worth of results, what's the first title on that page?". I don't know of another way to do that besides in some manner the query we're already using.

There's also the very similar query "What are the N titles before X?" that's also subject to the same issue.

Force the optimizer to do a better plan (probably not enough)

It would have prevented the recent incident in T160916, I believe. But it wouldn't prevent a similar problem in the case of a namespace with millions of redirects and few-or-no non-redirects.

As long as it's ok with the DBAs, sure. ;)

The DBAs are probably not sure if it is worth it. What it is gained and lost directly with every index not only is difficult to account for, it can also have consequences on other query plans (in the form of regressions)- something shown on T159319, which I believe is partially due to too many indexes that can be used.

If we were to go back one page's worth of results, what's the first title on that page?". I don't know of another way to do that besides in some manner the query we're already using.

I am not saying it has to be like that, but I can think many ways to do that, slightly breaking behaviour. A different story is if we can/want to break compatibility. :-)

To be fair, I would try a new index first, something along the lines of (is_redirect, namespace, title)-non necessarily on that order- on a pasive enwiki slave and see what are the different query plans obtained. No need to work on theoretical & policy stuff, if the obvious technical solution doesn't even work. I do not like discussing without having some proof of concept first.

In an ideal world, we would have a redirects table- it could even be just a pointer to the page table. That could help other operations too.

If the index would work only when filtering is done, we could even do 2 queries if that was easier.

I personally cannot give this high priority because I believe there can be other cases similar to T160916 that are still reachable by users -even by accident-, and while I would love to recover the full functionality of AllPages, I have to give higher priority to cases that will put the site down. Many people expressed their willingness to help on database optimization on the latest Mediawiki conference- I am going to ping those so we are not alone on this ticket :-)

In an ideal world, we would have a redirects table- it could even be just a pointer to the page table. That could help other operations too.

We do have a redirects table.

Nice, I didn't realize that! Still you wouldn't know how to rewrite the query in the case of is_redirect=1 :-P ? Index on rd_namespace,rd_title seems useful :-)

Actually, that wouldn't work, it is the target that is indexed, not the origin. We would need a different redirect table....

rd_from is the pointer to the page table. The rest of the columns are about the target of the redirect.

But even with the best plan, this query would still be problematic when run on a namespace with millions of redirects and few non-redirects.

It seems very unlikely that such a namespace would ever exist. Large consecutive blocks of redirects are possible (for example titles starting with a number in parantheses which is a shorthand notation used for various astronomy and chemistry stuff) but probably not in the millions.

OTOH the API also allows page_is_redirect = 1 queries which could be more problematic as millions of non-redirects but few redirects is probably the case for e.g. the User_talk namespace on wikis with automated welcome messages.

1978Gage2001 moved this task from Triage to In progress on the DBA board.

@Ladsgroup thought on this task?

I haven't seen any even slightly slow queries on the logs originating from Special:AllPages so this might have been fixed, e.g. by adding index. Clearly not high priority. I'll dig deeper a bit later.