Page MenuHomePhabricator

Pywikibot fails to access Wikidata due to high maxlag lately
Open, HighPublic

Description

Pywikibot accepts API maxlag. If maxlag is above 5 s threshold (common to all Wikimedia wikis), Pywikibot waits with requests until maxlag drops. Average Wikidata API maxlag has been 650 ms (similar to any other Wikimedia wiki), so this has never been an issue until now.

  1. On 20th Nov, 2019 Wikidata API maxlag jumped from average 650 ms to average 2.2 s. This happened possibly due to T221774: Add Wikidata query service lag to Wikidata maxlag. Since then maxlag was repeatedly getting close to 5 s threshold. It was still fine though, as the 5 s threshold has been exceeded only few times.
  2. On 20th Jan, 2020 Wikidata API maxlag jumped once again from average 2.2 s to average 4.9 s. The cause of the jump is still unknown to this date. The 5 s threshold has been exceeded every day since then. This causes several issues to occur during peak time:
    1. users are unable to use Pywikibot with Wikidata: T242081#5853949, T244030
    2. Pywikibot tests are failing with Timeout: https://travis-ci.org/wikimedia/pywikibot/jobs/633377671#L3591
    3. Pywikibot tests are failing with EntityTypeUnknownException (which seems to be a malformed cache because of Timeout on cache's creation): T242083 (https://travis-ci.org/wikimedia/pywikibot/jobs/633377685#L1496)

The Upstream issue is T243701

Details

Related Objects

StatusSubtypeAssignedTask
ResolvedDalba
ResolvedDalba
OpenNone
OpenNone
OpenNone
DeclinedNone
ResolvedXqt
DeclinedNone
ResolvedXqt
ResolvedXqt
OpenGoalNone
OpenNone
OpenNone
OpenNone
ResolvedXqt
ResolvedXqt
ResolvedXqt
DeclinedNone
InvalidNone
ResolvedUrbanecm
OpenNone
ResolvedXqt
OpenNone
Resolvedjayvdb
DeclinedNone
DuplicateNone
ResolvedDvorapa
ResolvedXqt
OpenNone
OpenNone
Resolvedhashar
ResolvedMpaa
ResolvedAndrew
OpenNone
OpenNone
ResolvedDvorapa
OpenNone
ResolvedDvorapa
ResolvedDvorapa
OpenNone
OpenNone
OpenXqt

Event Timeline

Dvorapa created this task.Jan 7 2020, 9:22 AM
Restricted Application added a project: Pywikibot. · View Herald TranscriptJan 7 2020, 9:22 AM
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. · View Herald Transcript
Xqt triaged this task as High priority.Jan 7 2020, 9:27 AM

(fails with a timeout)

Dvorapa added a comment.EditedJan 7 2020, 9:40 AM

It seems this is caused by database lag? The same as https://travis-ci.org/wikimedia/pywikibot/jobs/633377689#L1346 and https://travis-ci.org/wikimedia/pywikibot/jobs/633377688#L3644 for example? Or is it a different issue and the database lag is only a side thing?

This seems to fail randomly on tests

Dvorapa renamed this task from test_iw_bot fails on fa:کاربر:Ladsgroup to Random tests fail with timeout.Jan 21 2020, 10:32 AM
Dvorapa updated the task description. (Show Details)

Also it might have something in common with login issues?

Xqt added subscribers: Strainu, Ladsgroup.
Xqt added a subscriber: Xqt.Jan 26 2020, 7:01 PM

I guess that is the same issue discussed in this thread:

https://lists.wikimedia.org/pipermail/pywikibot/2020-January/009977.html

I remember I discussed this Wikidata lag with someone on IRC during Christmas holidays and surely the lag seems to be high (4s randomly, up to 8s sometimes) https://grafana.wikimedia.org/d/000000156/wikidata-dispatch?orgId=1&refresh=1m&fullscreen&panelId=22&from=now-30d&to=now

Xqt added a comment.Jan 27 2020, 10:40 AM

See also T221774 where the same problem occured with maxlag parameter which is 5 seconds per default.

This comment was removed by Dvorapa.
Xqt added a comment.Jan 27 2020, 4:50 PM

travis creates a user-config.py file. We could add

- echo "maxlag = 7" >> ~/.pywikibot/user-config.py

but I've no idea yet for Appveyor.

Better we could increase this number for wd only.

Another way would be to use -maxlag:7 option when invoking the bot but I don't see such command with tests.

Another way would be to increase the maxlag in api _add_defaults but this would override the config settings and options. Therefore I am not a friend of this idea.

Please do not use a longer maxlag than 5 seconds. It would make issues for the database in the long run. Even if it's just to make tests work. Is it possible to make tests more like unit testy? Less dependency on calling the infra? If not, I rather wait until the issue of WDQS is fixed or reduce the factor of wdqs for maxlag. There's a ticket for the WDQS maxlag issue already.

I started a discussion in T243701. This is what we came to:

For the specific issue you are facing, I may be suggest to review SLA expectations about the api (any of it)- and timing out and erroring quickly rather than waiting in case of lag for non-interactive tests.

Yeah, there are two options: either make them fail earlier, or make them not erroring the whole test suite because of their inresponsiveness. Or both...

This comment was removed by Dvorapa.
Xqt added a comment.EditedJan 27 2020, 5:19 PM

Please do not use a longer maxlag than 5 seconds. It would make issues for the database in the long run. Even if it's just to make tests work. Is it possible to make tests more like unit testy? Less dependency on calling the infra? If not, I rather wait until the issue of WDQS is fixed or reduce the factor of wdqs for maxlag. There's a ticket for the WDQS maxlag issue already.

A higher value other than 5 seconds in "retry_after" of the request result would also help to increase the wait time until the next request will tried.

Change 567492 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [IMPR] Timeout retries due to maxlag

https://gerrit.wikimedia.org/r/567492

Xqt added a comment.Jan 27 2020, 5:54 PM

With 567492 I propose to raise a TimeoutError exception. Most tests will accept this exception and skipping the test.

Change 567492 merged by jenkins-bot:
[pywikibot/core@master] [IMPR] Timeout retries due to maxlag

https://gerrit.wikimedia.org/r/567492

Dvorapa added a comment.EditedJan 27 2020, 8:49 PM

15-20 tests per environment failing with timeout now in Travis. Should we skip/xfail these tests (make them green)? Most of the time this is not an issue with Pywikibot.

Could we also find out whether the timeout is caused by server lag and skip/xfail depending on this?

15-20 tests per environment failing with timeout now in Travis. Should we skip/xfail these tests (make them green)? Most of the time this is not an issue with Pywikibot.
Could we also find out whether the timeout is caused by server lag and skip/xfail depending on this?

@Xqt?

Xqt claimed this task.Jan 31 2020, 5:29 PM

Change 569277 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [tests] skip tests if they failed due to maxlag timeout

https://gerrit.wikimedia.org/r/569277

$ python pwb.py shell
Welcome to the Pywikibot interactive shell!
>>> s=pywikibot.Site('wikidata', 'wikidata')
>>> s
DataSite("wikidata", "wikidata")
>>> s.namespaces
Sleeping for 5.0 seconds, 2020-02-05 22:13:51
Sleeping for 5.0 seconds, 2020-02-05 22:13:56
Sleeping for 5.0 seconds, 2020-02-05 22:14:01
Sleeping for 5.0 seconds, 2020-02-05 22:14:07
Sleeping for 6.0 seconds, 2020-02-05 22:14:12
Sleeping for 7.2 seconds, 2020-02-05 22:14:19
Sleeping for 8.4 seconds, 2020-02-05 22:14:26
Sleeping for 9.6 seconds, 2020-02-05 22:14:35
Sleeping for 10.8 seconds, 2020-02-05 22:14:45
Sleeping for 12.0 seconds, 2020-02-05 22:14:56
Sleeping for 13.2 seconds, 2020-02-05 22:15:08
Sleeping for 14.4 seconds, 2020-02-05 22:15:21
Sleeping for 15.6 seconds, 2020-02-05 22:15:36
Sleeping for 16.8 seconds, 2020-02-05 22:15:52
Sleeping for 18.0 seconds, 2020-02-05 22:16:09
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/pavel/pywikibot/pywikibot/site.py", line 1019, in namespaces
    self._namespaces = NamespacesDict(self._build_namespaces())
  File "/home/pavel/pywikibot/pywikibot/site.py", line 2618, in _build_namespaces
    for nsdata in self.siteinfo.get('namespaces', cache=False).values():
  File "/home/pavel/pywikibot/pywikibot/site.py", line 1660, in get
    preloaded = self._get_general(key, expiry)
  File "/home/pavel/pywikibot/pywikibot/site.py", line 1606, in _get_general
    default_info = self._get_siteinfo(props, expiry)
  File "/home/pavel/pywikibot/pywikibot/site.py", line 1529, in _get_siteinfo
    data = request.submit()
  File "/home/pavel/pywikibot/pywikibot/data/api.py", line 2243, in submit
    self._data = super(CachedRequest, self).submit()
  File "/home/pavel/pywikibot/pywikibot/data/api.py", line 2089, in submit
    raise TimeoutError(
pywikibot.exceptions.TimeoutError: Maximum retries attempted due to maxlag without success.

LOL

Xqt added a comment.Feb 5 2020, 10:43 PM

LOL

A known upstream issue. can’t laugh about it anymore.

I'm not laughing, I'm crying as it means Pywikibot is sometimes useless with Wikidata API

Dvorapa renamed this task from Random tests fail with timeout to Pywikibot fails to access Wikidata due to high maxlag.Feb 5 2020, 11:33 PM
Dvorapa updated the task description. (Show Details)
Dvorapa updated the task description. (Show Details)
Dvorapa updated the task description. (Show Details)Feb 5 2020, 11:35 PM
Dvorapa renamed this task from Pywikibot fails to access Wikidata due to high maxlag to Pywikibot fails to access Wikidata due to high maxlag lately.Feb 5 2020, 11:44 PM
Dvorapa updated the task description. (Show Details)
Zoranzoki21 moved this task from Backlog to Needs Review on the Pywikibot board.Feb 8 2020, 5:00 PM
Dvorapa updated the task description. (Show Details)Feb 10 2020, 1:05 PM
Dvorapa updated the task description. (Show Details)Feb 10 2020, 1:12 PM
Dvorapa updated the task description. (Show Details)Feb 10 2020, 1:25 PM
Dvorapa updated the task description. (Show Details)Feb 10 2020, 1:39 PM
Dvorapa updated the task description. (Show Details)
Dvorapa updated the task description. (Show Details)
Dvorapa updated the task description. (Show Details)Feb 10 2020, 1:42 PM
Xqt added a comment.Feb 13 2020, 9:56 AM

Can't we merge https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/569277/ until the upstream issue is solved or we have a common strategy how to use maxlag at wikidata to pass our tests. A way would be to increase maxlag parameter gradually after each wait cycle; the wait cycle is increasing already with the previous patch.

Change 576813 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [bugfix] Increase maxlag parameter after each wait cycle

https://gerrit.wikimedia.org/r/576813

Change 569277 merged by jenkins-bot:
[pywikibot/core@master] [tests] skip tests if they failed due to maxlag timeout

https://gerrit.wikimedia.org/r/569277

THE_IT added a subscriber: THE_IT.Wed, Apr 1, 6:50 AM
Dvorapa moved this task from Backlog to Reported Upstream on the Upstream board.Sun, Apr 5, 9:18 AM