Pywikibot reports maxlag retry error
Open, Needs Triage, Public

Description

Error

Connecting to Wikidata using Pywikibot fails with maxlag retry error when servers are under heavy load.

import pywikibot
site = pywikibot.Site('wikidata')

Connecting to other platforms like Commons or Wikipedia works correctly. User owns a bot account.

message
CRITICAL: Maximum retries attempted due to maxlag without success.
Impact
  • Possibly the server connection timeout threshold is too low.
  • Impossible to amend Wikidata using Pywikibot.
    • Using Pywikibot for Wikidata is temporarily impossible.
Notes

Manually editing Wikidata with the GUI works without problem.
Logging in to other platforms like Commons or Wikipedia works correctly.
This problem might be due to an overly strict load limit for accounts with a bot flag.
Maxlag can be monitored through https://grafana.wikimedia.org/d/TUJ0V-0Zk/wikidata-alerts.

Event Timeline

JJMC89 subscribed.

When I attempted to reproduce this, the failure occurred during site object instantiation, not during login.

>>> import pywikibot
>>> site = pywikibot.Site('wikidata')
Sleeping for 5.0 seconds, 2026-03-29 08:58:00
Sleeping for 5.0 seconds, 2026-03-29 08:58:05
Sleeping for 5.0 seconds, 2026-03-29 08:58:10
Sleeping for 5.0 seconds, 2026-03-29 08:58:16
Sleeping for 5.1 seconds, 2026-03-29 08:58:21
Sleeping for 6.1 seconds, 2026-03-29 08:58:26
Sleeping for 7.1 seconds, 2026-03-29 08:58:32
Sleeping for 8.1 seconds, 2026-03-29 08:58:39
Sleeping for 9.1 seconds, 2026-03-29 08:58:48
Sleeping for 10.1 seconds, 2026-03-29 08:58:56
Sleeping for 13.1 seconds, 2026-03-29 08:59:06
Sleeping for 14.3 seconds, 2026-03-29 08:59:20
Sleeping for 15.5 seconds, 2026-03-29 08:59:33
Sleeping for 16.7 seconds, 2026-03-29 08:59:49
Sleeping for 19.0 seconds, 2026-03-29 09:00:05
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jjmc89/wrk/pwb/pywikibot/__init__.py", line 266, in Site
    _sites[key] = interface(code=code, fam=fam, user=user)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jjmc89/wrk/pwb/pywikibot/site/_datasite.py", line 41, in __init__
    super().__init__(*args, **kwargs)
  File "/home/jjmc89/wrk/pwb/pywikibot/site/_apisite.py", line 136, in __init__
    self.login(cookie_only=True)
  File "/home/jjmc89/wrk/pwb/pywikibot/site/_apisite.py", line 392, in login
    if self.userinfo['name'] == self.user():
       ^^^^^^^^^^^^^
  File "/home/jjmc89/wrk/pwb/pywikibot/site/_apisite.py", line 670, in userinfo
    uidata = uirequest.submit()
             ^^^^^^^^^^^^^^^^^^
  File "/home/jjmc89/wrk/pwb/pywikibot/data/api/_requests.py", line 1223, in submit
    raise MaxlagTimeoutError(msg)
pywikibot.exceptions.MaxlagTimeoutError: Maximum retries attempted due to maxlag without success.

Pywikibot cannot do anything about the server lagging.

JJMC89 changed the subtype of this task from "Production Error" to "Task". Mar 29 2026, 4:08 PM

There is nothing we can do here. Pywikibot uses the retry-after value from the response (usually 5 seconds) and also increases the wait a little from one loop to the next. Humans have priority at times of high load, because rejecting their edits would waste their time.
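The retry behaviour described above can be sketched roughly as follows. This is a simplified illustration, not Pywikibot's actual implementation: `submit_with_retries`, `send`, and the `step` increment are hypothetical names and values chosen for the example; only the 5-second retry-after and the default of 15 retries come from this thread.

```python
import time


class MaxlagTimeout(Exception):
    """Raised when every retry was rejected because of server lag."""


def submit_with_retries(send, max_retries=15, step=1.0, sleep=time.sleep):
    """Call send() until it succeeds or max_retries waits are used up.

    send() is a hypothetical callable returning (ok, retry_after_seconds).
    """
    extra = 0.0  # additional wait that grows from cycle to cycle
    retries = 0
    while True:
        ok, retry_after = send()
        if ok:
            return True
        if retries >= max_retries:
            raise MaxlagTimeout(
                "Maximum retries attempted due to maxlag without success.")
        sleep(retry_after + extra)
        extra += step  # assumed growth rule, for illustration only
        retries += 1
```

With a server that keeps reporting lag, this sketch sleeps `max_retries` times with slowly growing waits and then gives up, which mirrors the shape of the console output above.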

What you can do is:

  • Increase config.max_retries (default 15) to allow more wait cycles before the bot gives up.
  • Use the -maxlag global option (default 5) with a higher value, or the corresponding config.maxlag setting, if your task is interactive.
  • See also https://www.mediawiki.org/wiki/Manual:Maxlag_parameter for the maxlag parameter recommendations that Pywikibot implements.
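
As a sketch, the first two suggestions could look like this in a `user-config.py`. The values 30 and 10 are arbitrary examples, not recommendations; raising `maxlag` means the bot keeps editing while the servers report more lag.

```python
# user-config.py (fragment); example values only
max_retries = 30  # default 15: more wait cycles before the bot gives up
maxlag = 10       # default 5: tolerate up to 10 seconds of reported lag
```
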
Xqt reopened this task as Open. Edited Apr 3 2026, 12:13 PM

Reopened because the maxlag has been extremely high for several days, mostly reported for wdqs1016. The minimum lag is above 6 seconds, so bot requests always fail.

image.png (893×1 px, 130 KB)

See also: T243701, T242081

The problems began on March 25th:

image.png (692×1 px, 73 KB)

Xqt renamed this task from "Pywikibot login to Wikidata generates maxlag retry error" to "Pywikibot reports maxlag retry error". Apr 3 2026, 12:48 PM

Hi,

Thanks for reaching out. Roughly speaking, we start to throttle connections (for bots that respect maxlag) when the change propagation lag between wikidata.org and the WDQS secondary store is higher than 10 minutes for a sustained amount of time. A typical reason for this lag increase is that WDQS is put under heavy read load and writes can't keep up.

WDQS has been under heavy load starting Thursday, April 2. This had a cascading effect on lag and bot throttling (https://grafana.wikimedia.org/goto/dfhtdl5py8qv4f?orgId=1). We have alerting and operational processes in place to mitigate this issue. We've been tracking load since early alerts started to fire on Thursday. Unfortunately, for the past 5 days we had several actors concurrently putting WDQS under strain, which defied the automated remediation we have in place and required manual intervention. As a result, bot throttling kicked in more than we would have liked. We also saw spikes in load and lag this morning CEST (https://grafana.wikimedia.org/goto/dfid0zv54dqm8e?orgId=1), and we keep monitoring the situation.

FWIW, if the maxlag is consistently high but some bots are still editing so fast that they keep WDQS under pressure, that is a clear violation of bot policy and those bots should be blocked.

The top editor yesterday and the day before was Mahir256 with 40K edits each day. The day before that was @Epidosis with 203K edits(!), the day before was Epìdosis again with 202K edits. I think that's causing issues. Epìdosis: Please respect maxlag.

@Ladsgroup both Epìdosis and I were using QuickStatements (he version 3.0 and I version 2.0); your complaint about tools not respecting maxlag should be directed at @Arcstur in the former case and @Magnus in the latter case.

Hi, this may be related to my import of data from GND into Wikidata via QS 3.0, which ran from April 1 to April 5 (https://w.wiki/KdP6). I thought QS 3.0 automatically respected maxlag, but maybe there is some kind of exception for users with the admin flag that needs to be removed. I have just reported this at https://meta.wikimedia.org/wiki/Talk:QuickStatements_3.0#Maxlag_issues

For the record: The problems began on March 25th or 26th (see the Grafana control panel), and this is still an issue because the minimum maxlag over the last hour has been 9 seconds. This blocks all bots that respect the Bot Policy (such as Pywikibot with its default settings).

The problems began on March 25th:

image.png (692×1 px, 73 KB)

Please (re)attach the file, so that it's visible if it's important (Phabricator/Help § File visibility).

done

The problems began on March 25th:

image.png (692×1 px, 73 KB)

The exact timestamp seems to be shortly after 2026-03-25 15:00 UTC, which apparently is pretty much exactly the moment when the northward datacenter switchover (March 2026, codfw to eqiad) took place (T413974). Can anyone please check whether there is causality?

Thanks. I asked around to see if anyone would be willing to take a look.

Any update here?

The problem persists: my bots are regularly crashing due to persistent maxlag timeouts, and community members complain that my bots need a fix (when they simply obey the maxlag policy).

As a note: I have started a new import through QS 3.0 a few hours ago - cf. https://www.wikidata.org/wiki/Property_talk:P227#Massive_import_of_data_from_GND_(May_2026) - but since the tool respects maxlag, it should not have a relevant impact.

@Epidosis Hi, what makes you say that QS 3.0 respects maxlag? I checked the source code and see no reference to maxlag.

Maxlag has been abnormally high for the past few days, and has exceeded five minutes at time of writing! Most of my queries are failing, important bots aren't working…

...tools not respecting maxlag should be directed at @Arcstur in the former case...

Pinging @ACorrea-WMB just in case you no longer check your personal Phabricator account.

The issue has got progressively worse over the past several hours, and max lag is now at around 40 minutes:

Max Lag.png (500×1 px, 28 KB)

I can't say with certainty that this is due to QS 3.0, though I do note that the majority of edits happening right now are through QS 3.0.

I wonder if someone did something; max lag seems to have returned to normal levels since around 10:00 UTC.

I fear it's increasing again.

Hello, everyone. I'll share some info here regarding QS3 so you can help me understand whether or not we are respecting it. I'll split it into parts.

  1. QS3 does not deal with maxlag directly because it does not use the Action API, only the Wikibase REST API.
  2. QS3 will sometimes hit a 429 in the REST API. When that happens, it starts an exponential backoff wait using the Python urllib library. We are frequently hitting 90 edits per minute, which is the maximum allowed for users. We are not editing more than that because that is not possible. The edits per minute can be checked in EditGroups for some batches.
  3. In January a lot of users came to me warning that batches were stalling for hours. They were regularly being stalled for 1h+. After debugging I saw that the "Retry-After" header sent with the 429 had enormous delay times, up to 6 hours. So I modified QS3 to ignore that header and use just the timeout with exponential backoff: https://github.com/wikimediabrasil/quickstatements3/issues/409
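
To make the second and third points above concrete, here is a minimal sketch of the two delay strategies being discussed. It is not QS3's code: `backoff_delay`, its parameters, and the 1-second base are hypothetical and chosen only for illustration.

```python
def backoff_delay(attempt, retry_after=None, honor_retry_after=False,
                  base=1.0, max_delay=None):
    """Return seconds to wait before retry number `attempt` (0-based) after a 429."""
    if honor_retry_after and retry_after is not None:
        delay = float(retry_after)     # trust the server's Retry-After value
    else:
        delay = base * (2 ** attempt)  # plain exponential backoff
    if max_delay is not None:
        delay = min(delay, max_delay)  # optional upper bound (assumption)
    return delay
```

With a 6-hour Retry-After (21600 seconds), honoring the header yields a 21600-second wait, while ignoring it yields 1, 2, 4, 8, ... seconds on successive attempts.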

Without a good way to measure whether maxlag is being respected, it is hard to tell. Maxlag is still a rather confusing term for me, since I'm mostly used to dealing with bare HTTP requests.
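
For readers more used to bare HTTP: in the Action API, a client opts in by sending a `maxlag=N` parameter, and when the reported lag exceeds N seconds the server answers with an error whose code is `maxlag` instead of performing the request (see Manual:Maxlag parameter, linked earlier in this thread). Below is a minimal sketch of recognising that answer; `lag_from_response` is a hypothetical helper and the sample payloads are made up for illustration.

```python
import json


def lag_from_response(body):
    """Return the reported lag in seconds if the API rejected the request
    because of maxlag, otherwise None."""
    error = json.loads(body).get("error", {})
    if error.get("code") == "maxlag":
        return error.get("lag")
    return None


# Made-up payloads for illustration.
rejected = ('{"error": {"code": "maxlag", '
            '"info": "Waiting for a host: 9 seconds lagged", "lag": 9}}')
accepted = '{"batchcomplete": ""}'
```

In these terms, a client respects maxlag if it sends the parameter and pauses before retrying whenever it receives such an error.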

About (3), I can switch QS3 back to using the Retry-After header received with 429s, but that makes batches stall for 1h+, which is unreasonable since every user has 90 edits per minute by default.

I made the change described above on Jan 24; since the graph here shows the increase starting after 2026-03-25 15:00 UTC, I wonder if they are really connected.

Maxlag dropped from 51 minutes (!) to a normal ca. 5 seconds at 10 UTC today, but shortly after 11 UTC it began growing significantly again and is now (17:30 UTC) slightly above 8 minutes. Considering the light editing on QS 3.0 today, I guess we can say QS 3.0 is probably unrelated to the issue we are having.