Pywikibot fails to access Wikidata due to high maxlag lately
Closed, ResolvedPublic

Description

Pywikibot honors the API maxlag parameter. If the reported lag exceeds the 5 s threshold (common to all Wikimedia wikis), Pywikibot holds back requests until the lag drops. The average Wikidata API maxlag had been around 650 ms (similar to any other Wikimedia wiki), so this was never an issue until now. (A client-side sketch of the maxlag protocol follows the description.)

  1. On 20 Nov 2019, the Wikidata API maxlag jumped from an average of 650 ms to an average of 2.2 s, possibly due to T221774: Add Wikidata query service lag to Wikidata maxlag. Since then, maxlag repeatedly got close to the 5 s threshold. It was still fine, though, as the threshold was exceeded only a few times.
  2. On 20 Jan 2020, the Wikidata API maxlag jumped again, from an average of 2.2 s to an average of 4.9 s. The cause of this jump is still unknown to this date. The 5 s threshold has been exceeded every day since then, which causes several issues during peak time:
    1. users are unable to use Pywikibot with Wikidata: T242081#5853949, T244030
    2. Pywikibot tests are failing with Timeout: https://travis-ci.org/wikimedia/pywikibot/jobs/633377671#L3591
    3. Pywikibot tests are failing with EntityTypeUnknownException (which seems to come from a cache malformed by a Timeout during its creation): T242083 (https://travis-ci.org/wikimedia/pywikibot/jobs/633377685#L1496), plus some other issues (login issues)

The upstream issue is T243701.
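
For context, this is roughly how the maxlag protocol works from the client side. A minimal sketch of a hypothetical client built on the requests library (not Pywikibot's actual implementation): the client sends maxlag with every request; if replication lag exceeds that value, the API answers with error code "maxlag" and a Retry-After header, and the client waits and retries.

import time
import requests

API = 'https://www.wikidata.org/w/api.php'

def get_with_maxlag(params, maxlag=5, retries=5):
    """Query the API, backing off while the servers report high lag."""
    params = dict(params, maxlag=maxlag, format='json')
    for _ in range(retries):
        response = requests.get(API, params=params)
        data = response.json()
        if data.get('error', {}).get('code') != 'maxlag':
            return data
        # Server is lagged: honor Retry-After (defaults to 5 s) and retry.
        time.sleep(int(response.headers.get('Retry-After', 5)))
    raise TimeoutError('maximum retries attempted due to maxlag')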

Event Timeline

Travis creates a user-config.py file. We could add

- echo "maxlag = 7" >> ~/.pywikibot/user-config.py

but I have no idea yet for AppVeyor.

Better, we could increase this number for Wikidata only.

Another way would be to use the -maxlag:7 option when invoking the bot, but tests are not invoked with such a command.

Another way would be to increase the maxlag default in api's _add_defaults, but this would override the config settings and command line options, so I am not fond of this idea.

Please do not use a maxlag longer than 5 seconds, even if it is just to make tests work; it would cause issues for the database in the long run. Is it possible to make the tests more unit-test-like, with less dependency on calling the infrastructure? If not, I would rather wait until the WDQS issue is fixed, or reduce the factor of WDQS lag in maxlag. There is already a ticket for the WDQS maxlag issue.

I started a discussion in T243701. This is what we came to:

For the specific issue you are facing, I might suggest reviewing SLA expectations about the API (any of it), and timing out and erroring quickly rather than waiting in case of lag for non-interactive tests.

Yeah, there are two options: either make them fail earlier, or keep them from erroring the whole test suite because of their unresponsiveness. Or both...

A higher value than 5 seconds in the "retry_after" of the request result would also help to increase the wait time until the next request is tried.

Change 567492 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [IMPR] Timeout retries due to maxlag

https://gerrit.wikimedia.org/r/567492

With 567492 I propose raising a TimeoutError exception. Most tests will accept this exception and skip the test.
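
On the test side this could look like the following minimal sketch (a hypothetical test case, assuming the exception propagates out of the API layer):

import unittest

import pywikibot
from pywikibot.exceptions import TimeoutError

class NamespacesTestCase(unittest.TestCase):

    def test_namespaces(self):
        site = pywikibot.Site('wikidata', 'wikidata')
        try:
            namespaces = site.namespaces
        except TimeoutError as e:  # maxlag retries exhausted
            self.skipTest('maxlag timeout: {}'.format(e))
        self.assertIn(0, namespaces)  # the main namespace always exists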

Change 567492 merged by jenkins-bot:
[pywikibot/core@master] [IMPR] Timeout retries due to maxlag

https://gerrit.wikimedia.org/r/567492

15-20 tests per environment are now failing with timeout in Travis. Should we skip/xfail these tests (make them green)? Most of the time this is not an issue with Pywikibot itself.

Could we also find out whether the timeout is caused by server lag, and skip/xfail depending on that?

@Xqt?

Change 569277 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [tests] skip tests if they failed due to maxlag timeout

https://gerrit.wikimedia.org/r/569277

$ python pwb.py shell
Welcome to the Pywikibot interactive shell!
>>> s=pywikibot.Site('wikidata', 'wikidata')
>>> s
DataSite("wikidata", "wikidata")
>>> s.namespaces
Sleeping for 5.0 seconds, 2020-02-05 22:13:51
Sleeping for 5.0 seconds, 2020-02-05 22:13:56
Sleeping for 5.0 seconds, 2020-02-05 22:14:01
Sleeping for 5.0 seconds, 2020-02-05 22:14:07
Sleeping for 6.0 seconds, 2020-02-05 22:14:12
Sleeping for 7.2 seconds, 2020-02-05 22:14:19
Sleeping for 8.4 seconds, 2020-02-05 22:14:26
Sleeping for 9.6 seconds, 2020-02-05 22:14:35
Sleeping for 10.8 seconds, 2020-02-05 22:14:45
Sleeping for 12.0 seconds, 2020-02-05 22:14:56
Sleeping for 13.2 seconds, 2020-02-05 22:15:08
Sleeping for 14.4 seconds, 2020-02-05 22:15:21
Sleeping for 15.6 seconds, 2020-02-05 22:15:36
Sleeping for 16.8 seconds, 2020-02-05 22:15:52
Sleeping for 18.0 seconds, 2020-02-05 22:16:09
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/pavel/pywikibot/pywikibot/site.py", line 1019, in namespaces
    self._namespaces = NamespacesDict(self._build_namespaces())
  File "/home/pavel/pywikibot/pywikibot/site.py", line 2618, in _build_namespaces
    for nsdata in self.siteinfo.get('namespaces', cache=False).values():
  File "/home/pavel/pywikibot/pywikibot/site.py", line 1660, in get
    preloaded = self._get_general(key, expiry)
  File "/home/pavel/pywikibot/pywikibot/site.py", line 1606, in _get_general
    default_info = self._get_siteinfo(props, expiry)
  File "/home/pavel/pywikibot/pywikibot/site.py", line 1529, in _get_siteinfo
    data = request.submit()
  File "/home/pavel/pywikibot/pywikibot/data/api.py", line 2243, in submit
    self._data = super(CachedRequest, self).submit()
  File "/home/pavel/pywikibot/pywikibot/data/api.py", line 2089, in submit
    raise TimeoutError(
pywikibot.exceptions.TimeoutError: Maximum retries attempted due to maxlag without success.
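
The sleep times above follow a recognizable schedule: four 5-second waits, then 6.0, 7.2, 8.4, ... up to 18.0 s. That is consistent with wait = max(retry_after, 1.2 * attempt) seconds. The snippet below merely reproduces the logged schedule; it is an observation, not Pywikibot's actual code:

def wait_seconds(attempt, retry_after=5.0, step=1.2):
    """Hypothetical reproduction of the sleep schedule logged above."""
    return max(retry_after, step * attempt)

print([round(wait_seconds(n), 1) for n in (1, 4, 5, 15)])
# [5.0, 5.0, 6.0, 18.0]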

LOL

A known upstream issue. Can't laugh about it anymore.

I'm not laughing, I'm crying, as it means Pywikibot is sometimes unusable with the Wikidata API.

Dvorapa renamed this task from "Random tests fail with timeout" to "Pywikibot fails to access Wikidata due to high maxlag". Feb 5 2020, 11:33 PM
Dvorapa updated the task description.
Dvorapa updated the task description.
Dvorapa renamed this task from "Pywikibot fails to access Wikidata due to high maxlag" to "Pywikibot fails to access Wikidata due to high maxlag lately". Feb 5 2020, 11:44 PM
Dvorapa updated the task description.
Dvorapa updated the task description.
Dvorapa updated the task description.

Can't we merge https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/569277/ until the upstream issue is solved, or until we have a common strategy for how to use maxlag at Wikidata so that our tests pass? One way would be to increase the maxlag parameter gradually after each wait cycle (see the sketch after the patch notice below); the wait cycle already increases with the previous patch.

Change 576813 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [bugfix] Increase maxlag parameter after each wait cycle

https://gerrit.wikimedia.org/r/576813
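
What such a gradual increase could look like, as a sketch based only on the commit message above (the increment and ceiling values are made up, not taken from the patch):

def next_maxlag(cycle, start=5, increment=1, ceiling=10):
    """Relax the maxlag parameter a little on every wait cycle."""
    return min(start + cycle * increment, ceiling)

print([next_maxlag(c) for c in range(6)])  # [5, 6, 7, 8, 9, 10]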

Change 569277 merged by jenkins-bot:
[pywikibot/core@master] [tests] skip tests if they failed due to maxlag timeout

https://gerrit.wikimedia.org/r/569277

The problem is not on the Pywikibot side but rather on the Wikidata side: the Wikidata API is not able to handle the traffic.

I made several proposals for how to mitigate it on the Pywikibot side. Our tests show that bots currently cannot work on Wikidata, and if they do, I assume they do not care much about the maxlag=5 parameter.

I would personally either use maxlag as the wait time (3 s server lag -> wait 3 s, 20 s server lag -> wait 20 s), though this is against the maxlag usage recommendations from Wikimedia; or I would avoid maxlag on read API actions, as suggested by several WD developers in the task.

I would personally either use maxlag as the wait time (3 s server lag -> wait 3 s, 20 s server lag -> wait 20 s),

We had this previously, but it was changed to use retry_after instead due to T144023.
But retry_after is always 5 seconds, and IMHO this is not sufficient for throttling. Therefore I reimplemented maxlag to be taken into account for the wait cycle.
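
In other words, the wait per cycle becomes the larger of the Retry-After value and the server-reported lag. An illustrative sketch, not the merged code:

def wait_for(retry_after, lag):
    """Return the number of seconds to sleep before the next retry."""
    return max(retry_after or 5, lag)

print(wait_for(5, 3))   # 5: retry_after dominates a small lag
print(wait_for(5, 20))  # 20: a large lag extends the wait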

Please make Pywikibot retry infinitely by default unless in unit tests. Such a setting (with max_retries) will break many bots.

Please make Pywikibot retry infinitely by default unless in unit tests

You may set the max_retries parameter in your user-config.py to increase the maximum number of retries. See also T249622.
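
For example, in user-config.py (1000 is the figure suggested above, not a recommended default):

# user-config.py (excerpt)
max_retries = 1000  # maximum retries before TimeoutError is raised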

But for the Wikidata maxlag issue a higher number of retries is needed: for a lag of one hour, for example, we would need 180-720 retries (though that is an extreme situation).

Any update on this? MaxlagTimeoutError keeps appearing; is there any temporary solution?

Pywikibot should allow infinite retries on maxlag timeout (unless in unit tests). At the very least, the default number of retries should be increased to 1000.

As noted above, the max_retries parameter can be set in user-config.py. To use a different max_retries value for individual scripts, you may pass it as a global option, as the documentation says:

-<config var>:n   You may use all given numeric config variables as option and
                  modify it with command line.

Set it like:

pwb.py <your script> -max_retries:1000

This only works if <your script> calls pywikibot.handle_args(). If it does not, you can use the pwb.py wrapper to set global arguments:

pwb.py -max_retries:1000 <your script>

Probably T254435 must be solved first in some cases.

A proposal on the Pywikibot side was made with https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/576813/.

I, in my volunteer capacity and out of frustration, requested that the bot flags of two bots, each of which made 50K edits in the past 12 hours (together about 2 edits per second), be revoked. For anyone who wants to participate: https://www.wikidata.org/wiki/Wikidata:Administrators%27_noticeboard#Request_to_revoke_bot_flags_of_Edoderoobot_and_LargeDatasetBot

The purpose of checking maxlag is to slow the rate of EDITS to Wikidata. I don't understand why Pywikibot is using it as a reason not to READ data. There are surely a vast number of other applications out there that read from Wikidata (and query WDQS) without checking maxlag!
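
If maxlag were reserved for writes, the request layer could attach it conditionally. A hypothetical helper illustrating the idea (the set of write actions here is an incomplete, illustrative assumption):

WRITE_ACTIONS = {'edit', 'move', 'wbeditentity', 'wbcreateclaim'}

def add_maxlag(params, maxlag=5):
    """Attach maxlag only to requests that modify content."""
    if params.get('action') in WRITE_ACTIONS:
        params = dict(params, maxlag=maxlag)
    return params

print(add_maxlag({'action': 'query', 'meta': 'siteinfo'}))  # unchanged
print(add_maxlag({'action': 'edit', 'title': 'Q42'}))       # maxlag added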

Change 576813 abandoned by Xqt:
[pywikibot/core@master] [bugfix] Increase maxlag parameter after each wait cycle

Reason:
Obviously declined. Solved upstream.

https://gerrit.wikimedia.org/r/576813

Change 635953 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [tests] Do not skip tests due to MaxlagTimeoutError

https://gerrit.wikimedia.org/r/635953

Change 635953 merged by jenkins-bot:
[pywikibot/core@master] [tests] Do not skip tests due to MaxlagTimeoutError

https://gerrit.wikimedia.org/r/635953

Change 678580 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [tests] Revert: "Do not skip tests due to MaxlagTimeoutError"

https://gerrit.wikimedia.org/r/678580

Change 678580 merged by jenkins-bot:

[pywikibot/core@master] [tests] Revert: "Do not skip tests due to MaxlagTimeoutError"

https://gerrit.wikimedia.org/r/678580