In T137707#2379997, @jcrespo wrote: For the API part, I would like to add that the API infrastructure (application servers and databases) is deliberately separated from non-API traffic and is better suited to mass requests than the one serving regular browser queries, so that the two cannot interfere with each other. It produces information in a clean JSON format that you can parse with any JSON decoder (or even a regex!), with little to no performance loss.
If you think that the API is not performant (either the action API or the RESTBase one), please file a bug and we will look at it.
We can discuss more or less usage of the API, but not using the API for API-like requests is definitely not OK. From https://www.mediawiki.org/wiki/API:Etiquette :
There is no hard and fast limit on read requests, but we ask that you be considerate and try not to take a site down. Most sysadmins reserve the right to unceremoniously block you if you do endanger the stability of their site.
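One way to be "considerate" in practice is to send the `maxlag` parameter and back off when the servers report lag. The following is only a minimal sketch, assuming the `maxlag` behaviour described in the MediaWiki manual (an error with code `maxlag` plus a `Retry-After` header); the helper names `with_maxlag` and `retry_delay` are hypothetical and not part of any library.

```python
DEFAULT_MAXLAG = 5  # seconds of replication lag tolerated; 5 is the commonly suggested value


def with_maxlag(params, maxlag=DEFAULT_MAXLAG):
    """Return a copy of the query parameters with maxlag added."""
    merged = dict(params)
    merged["maxlag"] = str(maxlag)
    return merged


def retry_delay(response, headers, default_delay=5.0):
    """Return the seconds to wait before retrying, or None if no retry is needed.

    `response` is the decoded JSON body and `headers` a dict of HTTP
    response headers (both supplied by whatever HTTP client the bot uses).
    """
    error = response.get("error")
    if not error or error.get("code") != "maxlag":
        return None
    try:
        return float(headers.get("Retry-After", default_delay))
    except (TypeError, ValueError):
        # Header missing or malformed: fall back to a fixed pause.
        return default_delay
```

A bot would call `with_maxlag` when building each request and sleep for `retry_delay(...)` seconds whenever it returns a number, instead of retrying immediately.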
Jul 7 2016
Jun 15 2016
In T137707#2380275, @jcrespo wrote: BTW, the API is definitely faster, one just needs to use it efficiently:
$ time curl 'https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=January|February|March|April|May|June|July|August|September' > /dev/null
real 0m0.717s
user 0m0.004s
sys 0m0.004s

$ time (curl 'https://en.wikipedia.org/w/index.php?action=raw&title=January' &&
  curl 'https://en.wikipedia.org/w/index.php?action=raw&title=February' &&
  curl 'https://en.wikipedia.org/w/index.php?action=raw&title=March' &&
  curl 'https://en.wikipedia.org/w/index.php?action=raw&title=April' &&
  curl 'https://en.wikipedia.org/w/index.php?action=raw&title=May' &&
  curl 'https://en.wikipedia.org/w/index.php?action=raw&title=June' &&
  curl 'https://en.wikipedia.org/w/index.php?action=raw&title=July' &&
  curl 'https://en.wikipedia.org/w/index.php?action=raw&title=September' ) > /dev/null
real 0m3.654s
user 0m0.024s
sys 0m0.008s
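For a bot, the batching shown in the first curl command could look roughly like the sketch below. It assumes the legacy JSON layout in which revision text sits under the "*" key, and a cap of 50 titles per request for accounts without higher limits; the function names `batch_urls` and `extract_content` are made up for illustration.

```python
import json
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"  # endpoint from the curl example
MAX_TITLES = 50  # assumed per-request cap for non-bot accounts


def batch_urls(titles, batch_size=MAX_TITLES):
    """Build one query URL per batch of titles, joined with '|'."""
    urls = []
    for i in range(0, len(titles), batch_size):
        params = {
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "format": "json",
            "titles": "|".join(titles[i:i + batch_size]),
        }
        urls.append(API_URL + "?" + urlencode(params))
    return urls


def extract_content(response_text):
    """Map page title to wikitext for every page that has a revision."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    result = {}
    for page in pages.values():
        revisions = page.get("revisions")
        if revisions:  # missing pages carry no "revisions" key
            result[page["title"]] = revisions[0].get("*")
    return result
```

Each URL from `batch_urls` replaces up to 50 separate `index.php?action=raw` fetches, and `extract_content` turns the response body back into per-page source text.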
Jun 14 2016
I don't think api.php?action=query&prop=revisions&rvprop=content can be as performant as index.php?action=raw, and the latter is the easiest way to get the source code of a page. I would appreciate it if there were a way to perform api.php?action=raw.
Also, there is no clear request rate limit for the MediaWiki API, unlike the REST API (T135240). If you want to set one, you should document it.
Most of my tasks don't generate such an "unacceptable amount of traffic". They usually send a few hundred to a few thousand requests before exiting. But they still need a way to bypass the TLS redirect.
If you don't give me a good reason why cp1008.wikimedia.org:3128 / index.php?action=raw shouldn't be used, I will start some of my jobs that don't involve mass page content fetching, such as projectstat.
Labs replicas can't do that job, as the revision tables are removed from those databases. Dumps are not updated that often.
Jun 13 2016
My bot was using /w/index.php?action=raw to fetch the content of each page/redirect at zhwiki, and then did some simple search/replace/template-addition work.
Jun 2 2016
I could reduce the concurrency by lowering the number of threads in the pool (currently 50). But what if another bot task running on the same node exceeds the rate limit?
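Besides shrinking the pool, the threads could share a limiter that spaces out their requests. Below is a minimal sketch of that idea; the class name `MinIntervalLimiter` and the interval value are assumptions, not a documented limit. It only coordinates threads inside one process, so it does not answer the question about another task on the same node.

```python
import threading
import time


class MinIntervalLimiter:
    """Let callers through at most once every `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        """Block until this caller's slot arrives; return the delay used."""
        with self._lock:
            now = self._clock()
            start = max(now, self._next_allowed)
            # Reserve the slot before releasing the lock so that
            # concurrent callers queue up behind one another.
            self._next_allowed = start + self._min_interval
        delay = start - now
        if delay > 0:
            self._sleep(delay)
        return delay
```

Each worker thread would call `limiter.wait()` right before sending a request, so the overall request rate stays bounded regardless of the pool size.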
The rate limiting is breaking my bot.
May 2 2016
As reported by User:Kanashimi, some API query output is broken as well. For example, https://zh.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content|timestamp&titles=LGBT%E7%9B%B8%E5%85%B3%E7%94%B5%E5%BD%B1%E5%88%97%E8%A1%A8&rvlimit=1&format=json&utf8 returns a spurious "w6" at the end.
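To check whether a response carries extra bytes after the JSON document, a client could decode the first JSON value and look at what is left over. This is only a diagnostic sketch using the standard library's `json.JSONDecoder.raw_decode`; the helper name `split_trailing_data` is made up, and it does not explain where the "w6" comes from.

```python
import json


def split_trailing_data(body):
    """Decode the leading JSON value and return (value, leftover_text)."""
    decoder = json.JSONDecoder()
    # raw_decode rejects leading whitespace, so strip it first.
    stripped = body.lstrip()
    value, end = decoder.raw_decode(stripped)
    return value, stripped[end:].strip()
```

A non-empty second element means the body contained something after the JSON document, which is worth logging together with the request URL.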
Apr 24 2016
I'm still seeing these problems: