
inteGraality queries feature is painfully slow, potentially pywikibot related
Closed, Resolved (Public)

Description

inteGraality has never been particularly fast, but overall this is understandable because it does a lot: N×M SPARQL queries (with N = number of groupings and M = number of properties).

The queries (aka looking glass) feature, however, does comparatively very little: use Pywikibot to find the wiki page, read the config from the template, generate hyperlinks to two SPARQL queries, done. Yet it can be excruciatingly slow (as reported by @Dominicbm among others).

I have reproduced this locally, throwing in some Jaeger-based distributed tracing. Here is a trace:

[Attachment image.png: Jaeger trace of the /queries request, 160 KB]

inteGraality spends less than a millisecond creating the site object, looking up the data_repository, and creating the pywikibot.Page object… then spends 25 minutes doing page.templatesWithParams(). Admittedly, that page has some 500+ templates − so I tried again on an 'empty' inteGraality page, with only two templates − 28 minutes.

[Attachment image.png: Jaeger trace of the run against the 'empty' page, 124 KB]

I really don’t see what I could be doing wrong here.

Event Timeline

(If anyone is interested in reproducing my tests, I threw this onto a branch: https://github.com/JeanFred/inteGraality/tree/jaeger. Running docker-compose up -d and visiting http://localhost:5000/queries?page=User:Jean-Fr%C3%A9d%C3%A9ric/Red_Pandas&property=P50&grouping=Q3359737 should be enough; the Jaeger UI will be on localhost:16686.)

Do you have some pywikibot-only test case?

Something I forgot to mention which may be relevant: I was using packaged pywikibot 2019.11.28.

The logs of that run are P10293.

Do you have some pywikibot-only test case?

At its core the app does:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import logging

import pywikibot

# Show the INFO-level messages below (otherwise they are swallowed by the
# default WARNING log level).
logging.basicConfig(level=logging.INFO)

page_title = "User:Jean-Frédéric/Red Pandas"

logging.info("Creating site object...")
site = pywikibot.Site('en', 'wikipedia')

logging.info("Getting data repo...")
repo = site.data_repository()  # the associated Wikibase repository (Wikidata)

logging.info("Getting Page...")
page = pywikibot.Page(repo, page_title)

logging.info("Getting templates...")
all_templates_with_params = page.templatesWithParams()  # this is the slow call

print(all_templates_with_params)

I ran this with the old pywikibot − P10294. With the more recent 3.0.20200111, it’s quite a bit faster − P10295.

So this is https://www.mediawiki.org/wiki/Manual:Maxlag_parameter. I guess in theory you could increase it to make it more aggressive, but I think it's better to get the lag fixed.


I was under the impression that the purpose of maxlag was to limit editing (which makes sense, as more edits just make the lag even worse); but here all I’m doing is read access. Or am I missing something here?

Mentioned in SAL (#wikimedia-cloud) [2020-02-03T16:57:19Z] <wm-bot> <jeanfred> Deploy latest from Git master: 3b2867f, 10f67fd, 2e6a669 (T244030), 279c92b, de6308e

So you mean read accesses should ignore maxlag? hmm
MediaWiki does respect maxlag even in a read.

I guess if the MediaWiki API itself does it, then there’s not much to do. My expectations are probably off, but I do find it a bit crazy to do pywikibot._code_fam_from_url('https://www.wikidata.org/wiki/Q1') and be throttled for over 20 minutes before getting back ('wikidata', Family("wikidata")) :-)

Trying to chart a path here:

  1. Waiting for the WDQS lag / Wikidata maxlag to somehow get under control − T243701: Wikidata maxlag repeatedly over 5s since Jan 20, 2020 (primarily caused by the query service)
  2. Waiting for the Wikidata API to no longer factor in WDQS lag for read-only queries − T243701#5834792
  3. Implementing some caching so that reading pages is less painful: dashboard updates would purge the config cache, but the looking glass would use it (a rough sketch follows below).

Number 3 could be quite good to do.
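
A minimal sketch of what option 3 could look like (this is not the actual inteGraality code; get_cached_config and purge_cached_config are made-up names, just to illustrate the purge-on-update idea):

# Hypothetical illustration of option 3, not actual inteGraality code:
# keep the parsed template config per page in memory, and drop the entry
# whenever a dashboard update rewrites that page.
config_cache = {}


def get_cached_config(page_title, fetch_config):
    """Return the config for page_title, calling the slow Pywikibot-based
    fetch_config(page_title) only on a cache miss."""
    if page_title not in config_cache:
        config_cache[page_title] = fetch_config(page_title)
    return config_cache[page_title]


def purge_cached_config(page_title):
    """Drop the cached entry; to be called right after a dashboard update."""
    config_cache.pop(page_title, None)

The looking glass would then only ever hit the cache (fast), while dashboard updates, which read the page anyway, would refresh or purge it.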

So this is https://www.mediawiki.org/wiki/Manual:Maxlag_parameter. I guess in theory you could increase it to make it more aggressive, but I think it's better to get the lag fixed.

I was under the impression that the purpose of maxlag was to limit editing (which makes sense, as more edits just make the lag even worse); but here all I’m doing is read access. Or am I missing something here?

maxlag is a nice way for non-interactive bots to automatically back off if the server is lagged. If this is an interactive tool that humans are waiting on, set maxlag=0 and stop caring about it.
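
For a Pywikibot-based tool that would presumably mean something like the sketch below. The assumption here (worth double-checking against the installed version) is that a falsy config.maxlag makes Pywikibot skip sending the maxlag parameter altogether:

import pywikibot
from pywikibot import config

# Assumption: with a falsy maxlag, Pywikibot does not send the maxlag
# parameter at all, so reads are never rejected because of replication lag.
# The same thing can be set as `maxlag = 0` in user-config.py.
config.maxlag = 0

site = pywikibot.Site('wikidata', 'wikidata')
page = pywikibot.Page(site, 'User:Jean-Frédéric/Red Pandas')
print(len(page.templatesWithParams()))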

@Dvorapa @Xqt Would you consider removing the limit for accessing the data, since this doesn't have an impact on the WDQS lag? This would be a huge relief for people using tools based on Pywikibot that can be blocked when the WDQS lag is high. Thanks a lot :)

@Lea_Lacroix_WMDE I'm really sorry, but I don't understand your request. Could you read the description of T242081 (there is an explanation of how Pywikibot works and why it does not work with Wikidata currently) and specify what "limit" you mean and in which way we should "remove" it? :)

@Lea_Lacroix_WMDE Or better, could you please change Manual:Maxlag parameter according to the new policy? (We can then adapt Pywikibot's behaviour according to the changes in the manual.)

@Dvorapa Hey, what we meant is not a change in the policy; it's about "read" API requests. Is it the case that Pywikibot tries to use/enforce maxlag for read (wbgetentities, search, query, etc.) API requests? I recommend stopping that, as reads won't affect the replication lag problem (they happen on the replicas, not the master).


I see, but this seems like a significant change to how Pywikibot works with the API, applying maxlag only to write requests, not reads. @Xqt?


Also, the policy/manual/API etiquette does not distinguish between read and write requests when talking about maxlag. From my point of view, if the server is flooded with write requests, flooding it with read requests while the write requests are being delayed will not help the server much.

Currently the bot works as follows (for read and write requests):

  • submit a query with the given maxlag parameter (default is 5) and check its result
  • if there is no maxlag error code, proceed in the usual way
  • if there is a maxlag error code, then do:
  1. wait for the number of seconds given by the retry_after value of the HTTP response header
  2. unfortunately this retry_after value is always 5 seconds and does not do what one expects (see T210606 and T172293); therefore read the current lag from the error info, multiply it by the number of retries (up to 15 by default), and increase the wait time as shown in the table of T210606, but not longer than 120 seconds
  3. after the maximum number of retries, throw a TimeoutError exception and stop the bot (a simplified sketch of this loop follows below)
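
A simplified model of that loop (this is not the real api.py code; request_with_maxlag and submit are illustrative names):

import time

MAX_RETRIES = 15   # default max_retries from (user-)config.py
MAX_WAIT = 120     # hard cap on a single wait, in seconds


def request_with_maxlag(submit, maxlag=5):
    """Simplified model of the retry behaviour described above.

    submit(maxlag) is assumed to return (result, None) on success, or
    (None, current_lag) when the server answered with a maxlag error.
    """
    for retry in range(1, MAX_RETRIES + 1):
        result, current_lag = submit(maxlag)
        if current_lag is None:
            return result  # no maxlag error: proceed as usual
        # retry_after is effectively always 5, so use the reported lag instead,
        # scaled by the number of retries and capped at MAX_WAIT.
        wait = min(current_lag * retry, MAX_WAIT)
        time.sleep(wait)
    raise TimeoutError('still lagged after {} retries'.format(MAX_RETRIES))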

What can be done?

  • give an adequate maxlag time on the server side, if that isn't done already
  • solve T172293 and give a better retry_after value to instruct the bot how long it should wait for the next retry
  • we can use the current lag as a better estimate of the wait time, instead of the retry_after value
  • we can use an increasing wait time based on the current lag, instead of the calculated time shown in the table above
  • double the maxlag parameter for read requests
  • with an increasing wait time we can also increase the maxlag parameter by 1 second (maybe limited to 9) for each try. For example:
retry  maxlag  retry_after  current lag  wait time
1      5       5            9            9
2      6       5            10           20
3      7       5            11           33
4      8       5            10           40
5      9       5            12           60
6      9       5            10           60
7      9       5            8            pass

…or something like that. This would decrease client requests a lot and not retry every 5 seconds, but on the other hand increase the chance of being successful at some point after a few wait cycles.
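
A small illustration of that proposal (not Pywikibot code; the lag values are simply the ones from the example table above):

def proposed_backoff(lags, start_maxlag=5, max_maxlag=9, max_wait=120):
    """Walk the proposed scheme over a sequence of observed lag values and
    return (retry, maxlag, wait) steps; the last step is marked 'pass' once
    the current lag is at or below the (increased) maxlag."""
    steps = []
    maxlag = start_maxlag
    for retry, lag in enumerate(lags, start=1):
        if lag <= maxlag:
            steps.append((retry, maxlag, 'pass'))
            break
        steps.append((retry, maxlag, min(retry * lag, max_wait)))
        maxlag = min(maxlag + 1, max_maxlag)  # raise maxlag by 1 per try, up to 9
    return steps


# Reproduces the example table above (observed lags 9, 10, 11, 10, 12, 10, 8):
for step in proposed_backoff([9, 10, 11, 10, 12, 10, 8]):
    print(step)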

I ran this with the old pywikibot − P10294. With the more recent 3.0.20200111, it’s quite a bit faster − P10295.

There is no difference in the wait-cycle behaviour of these versions; they always waited for 5 seconds due to the retry_after value returned in the HTTP response header. There is a change in the current master branch which increases the wait time slightly, but raises a TimeoutError after the number of retries given by the max_retries parameter (in [user-]config.py).

Xqt claimed this task.

Solved upstream with T242081, I guess.