Long API requests from ChangeProp
Open, Low, Public

Description

Right now CP has a very long backlog on the continuation topic, because the following page on Wiktionary was edited: https://en.wiktionary.org/wiki/Module%3Alanguages%2Fdata2 This page is transcluded in a huge number of other pages, so the API request to list them takes ages to complete: https://en.wiktionary.org/w/api.php?action=query&prop=transcludedin&titles=Module%3Alanguages%2Fdata2&tilimit=500

The max number of pages we can fetch in a single API request is 500, but bots are allowed 5000. CP should either become a bot to get 5000 items in a single request, or a change should be made in the MW API to allow CP to get more items in a single request.
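For reference, the continuation loop CP has to run looks roughly like this. The `tilimit` and `ticontinue` parameter names come from the MediaWiki API; the `fetch` callable is a stand-in for the real HTTP client, injected here so the pagination logic can be sketched without network access:

```python
def fetch_transcludedin(title, fetch, limit=500):
    """Collect all pages transcluding `title` by following API continuation.

    `fetch` is any callable taking a dict of query parameters and
    returning the decoded JSON response. `limit` is 500 for a normal
    client; a bot account could pass 5000, cutting the number of
    round trips by 10x.
    """
    params = {
        "action": "query",
        "prop": "transcludedin",
        "titles": title,
        "tilimit": limit,
        "format": "json",
    }
    pages = []
    while True:
        result = fetch(params)
        for page in result["query"]["pages"].values():
            pages.extend(p["title"] for p in page.get("transcludedin", []))
        cont = result.get("continue")
        if cont is None:
            return pages
        # The API hands back e.g. {"ticontinue": "...", "continue": "-||"};
        # merging it into the next request resumes where the last one stopped.
        params.update(cont)
```

For a page like Module:languages/data2 with hundreds of thousands of transclusions, this loop runs for hundreds of iterations even at the bot limit, which is why the backlog builds up.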

Longer term, we need to think about how to optimise these API requests even further. Maybe get a MySQL slave and make SQL queries from CP to MySQL directly?

Pchelolo created this task. Sep 7 2016, 6:46 PM
Restricted Application added a subscriber: Aklapper. Sep 7 2016, 6:46 PM

Longer term, we need to think about how to optimise these API requests even further. Maybe get a MySQL slave and make SQL queries from CP to MySQL directly?

We considered this at the beginning as one of the options for giving CP quick access to the relevant information, and I think we should go with it. Have a dedicated slave and create a simple interface based on a MySQL Node driver to obtain the data.
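For illustration, the kind of lookup such an interface would run against the `templatelinks` table. This is a sketch only, using SQLite in place of a real MariaDB slave; the column names follow MediaWiki's `templatelinks` schema, and 828 is the Module: namespace ID:

```python
import sqlite3

# Minimal stand-in for the production table (real schema lives in MariaDB).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE templatelinks (
        tl_from      INTEGER NOT NULL,  -- page_id of the transcluding page
        tl_namespace INTEGER NOT NULL,  -- namespace of the transcluded title
        tl_title     TEXT    NOT NULL   -- transcluded title, no namespace prefix
    )
""")
conn.executemany(
    "INSERT INTO templatelinks VALUES (?, ?, ?)",
    [
        (10, 828, "Languages/data2"),  # two pages transcluding the module
        (11, 828, "Languages/data2"),
        (12, 0, "Some_other_page"),    # unrelated link, must not match
    ],
)

# The single query CP would issue when one module is edited: all page IDs
# that transclude it, with no 500-item API batching in the way.
rows = conn.execute(
    "SELECT tl_from FROM templatelinks WHERE tl_namespace = ? AND tl_title = ?",
    (828, "Languages/data2"),
).fetchall()
```

The attraction is that this is one indexed range scan per edit instead of hundreds of paginated API round trips; the drawback, as discussed below, is giving a non-MediaWiki service direct DB access.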

Even if CP is allowed to get 5000 items in a single request, wouldn't it be slow?
MySQL replication would be fast. I would like to work on this issue.
Do we have a dedicated server to be used as a slave?

I have spoken about this to our Ops engineers. We have concluded that having a slave is in fact against the current policy: nothing apart from MediaWiki itself accesses the DB directly, so we should refrain from such an approach. Moreover, the DB is divided into 7 to 11 shards (depending on whether we'd need just the core tables or the additional ones as well), which means that if we wanted a dedicated MySQL slave, we'd actually need 7 to 11 machines for it, which is simply not feasible.

The consensus is to investigate this particular query and find out why it performs so badly.

The consensus is to investigate this particular query and find out why it performs so badly.

Filed T145079: Investigate slow transcludedin query to that effect.

Given the above, it seems that the way to go is:

The max number of pages we can fetch in a single API request is 500, but bots are allowed 5000. CP should either become a bot to get 5000 items in a single request, or a change should be made in the MW API to allow CP to get more items in a single request.

This should probably be combined with T137140: Support change propagation for private wikis as we need special rights for both cases.

mobrovac triaged this task as Low priority. Oct 6 2016, 11:56 AM

Setting priority to Low as the imminent problem with long MySQL queries has been fixed.

Pchelolo moved this task from Backlog to Later on the Services board. Oct 12 2016, 5:24 PM
Pchelolo edited projects, added Services (later); removed Services.