
Have a "now is not a good time" flag on the Wikidata api
Open, Needs Triage, Public

Description

When we run bots, we sometimes receive API responses asking us to back off for a few seconds. Our bots respect that request and pause, resuming after the suggested number of seconds. However, the issue often persists, which leads to many repeated attempts. See for example this snippet from a recent run:

2020-02-11 22:29:03.462709: maxlag. sleeping for 5.466666666666667 seconds
2020-02-11 22:29:09.111319: maxlag. sleeping for 5.466666666666667 seconds
2020-02-11 22:29:14.735034: maxlag. sleeping for 5.466666666666667 seconds
2020-02-11 22:29:20.354678: maxlag. sleeping for 5.466666666666667 seconds
2020-02-11 22:29:25.998749: maxlag. sleeping for 5.466666666666667 seconds
2020-02-11 22:29:31.629414: maxlag. sleeping for 5.466666666666667 seconds
2020-02-11 22:29:37.255963: maxlag. sleeping for 5.466666666666667 seconds
2020-02-11 22:29:42.869407: maxlag. sleeping for 5.466666666666667 seconds
2020-02-11 22:29:48.503712: maxlag. sleeping for 5.466666666666667 seconds
2020-02-11 22:29:54.129682: maxlag. sleeping for 5.466666666666667 seconds
2020-02-11 22:29:59.730619: maxlag. sleeping for 5.466666666666667 seconds
2020-02-11 22:30:05.355369: maxlag. sleeping for 5.816666666666666 seconds
2020-02-11 22:30:11.371665: maxlag. sleeping for 5.816666666666666 seconds
2020-02-11 22:30:17.402557: maxlag. sleeping for 5.816666666666666 seconds
2020-02-11 22:30:23.387781: maxlag. sleeping for 5.816666666666666 seconds
2020-02-11 22:30:29.375381: maxlag. sleeping for 5.816666666666666 seconds
2020-02-11 22:30:35.377794: maxlag. sleeping for 5.816666666666666 seconds
2020-02-11 22:30:41.355254: maxlag. sleeping for 5.816666666666666 seconds
2020-02-11 22:30:47.319589: maxlag. sleeping for 5.816666666666666 seconds
2020-02-11 22:30:53.293700: maxlag. sleeping for 5.816666666666666 seconds
2020-02-11 22:30:59.300540: maxlag. sleeping for 5.816666666666666 seconds
2020-02-11 22:31:05.272313: maxlag. sleeping for 6.233333333333333 seconds
2020-02-11 22:31:11.697159: maxlag. sleeping for 6.233333333333333 seconds
2020-02-11 22:31:18.098883: maxlag. sleeping for 6.233333333333333 seconds
2020-02-11 22:31:24.493465: maxlag. sleeping for 6.233333333333333 seconds
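
For context, this is the pattern a straightforward maxlag retry loop produces. Below is a minimal sketch of such a loop (not WikidataIntegrator's actual code; the api_get name is an illustrative placeholder), using the documented maxlag error response and the Retry-After header:

```
import datetime
import time

import requests

API = "https://www.wikidata.org/w/api.php"
SESSION = requests.Session()

def api_get(params):
    """Call the action API with maxlag=5; sleep and retry while the servers are lagged."""
    params = dict(params, format="json", maxlag=5)
    while True:
        resp = SESSION.get(API, params=params)
        data = resp.json()
        error = data.get("error", {})
        if error.get("code") != "maxlag":
            return data
        # The error reports the current replication lag; Retry-After carries
        # the server's suggested pause. Sleep and then retry the same request.
        delay = float(resp.headers.get("Retry-After", error.get("lag", 5)))
        print(f"{datetime.datetime.now()}: maxlag. sleeping for {delay} seconds")
        time.sleep(delay)
```

As the log shows, the suggested pause tracks the current lag, so when the lag does not recover the loop keeps repeating with near-constant delays.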

We currently have a limit of 25 attempts, after which an exception is raised and the bot is terminated.

I am a bit surprised that the requested back-off does not become more aggressive in incremental steps, instead staying around 5-6 seconds.

However, terminating the bot on this basis disrupts the overall workflow, so we are looking into different approaches.
One would be to increase the maximum number of attempts from 25 to 100 or even 1000.

Another would be to increase the waiting time exponentially on each iteration.
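
A minimal sketch of such a schedule; do_edit, the base delay, the growth factor, and the cap are all illustrative placeholders, not part of any existing API:

```
import time

def retry_with_backoff(do_edit, base=5, factor=2, cap=300, attempts=25):
    """Retry an edit, doubling the wait after every maxlag failure (capped)."""
    delay = base
    for attempt in range(attempts):
        if do_edit():               # True on success, False on a maxlag error
            return True
        if attempt < attempts - 1:  # no point sleeping after the final attempt
            time.sleep(min(delay, cap))
            delay *= factor
    return False                    # still lagged; let the caller decide what to do
```

With these numbers the waits would be 5, 10, 20, 40, ... seconds, hitting the 300-second cap on the seventh failure instead of hammering the API every ~5 seconds.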

But I am wondering whether the API benefits at all from these repetitive attempts.

Another approach would be not to repeat at all, but to have some "now is not a good time" flag, which our bots could consult before commencing a bot run. This way, we would not send unnecessary requests.

Is this possible?
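
For what it is worth, a bot can already approximate such a flag client-side: the action API's siteinfo module reports the current database replication lag via siprop=dbrepllag. A rough sketch (the 5-second threshold mirrors the usual maxlag value and is an assumption, not an official recommendation):

```
import requests

API = "https://www.wikidata.org/w/api.php"

def replication_lag():
    """Return the highest replica lag in seconds, via meta=siteinfo&siprop=dbrepllag."""
    resp = requests.get(API, params={
        "action": "query",
        "meta": "siteinfo",
        "siprop": "dbrepllag",
        "sishowalldb": 1,
        "format": "json",
    })
    return max(db["lag"] for db in resp.json()["query"]["dbrepllag"])

def good_time_to_run(threshold=5):
    """Treat lag below the threshold as a green light for starting a bot run."""
    return replication_lag() < threshold
```

A dedicated flag would still be more expressive, since it could also reflect server-side conditions that replication lag alone does not capture.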

Event Timeline

I once suggested you should retry at least 1000 times (https://www.wikidata.org/wiki/User_talk:Andrawaag#WikidataIntegrator_and_maxlag), but Multichill suggested T221774#5679907. It seems this is still not a good idea (see my analysis at T245144#5880941).

Pywikibot retries edits infinitely on maxlag.

Yes, I remember the 1000 suggestion, and we could certainly retry perpetually, but somehow that does not feel right. Every unsuccessful attempt is yet another request bothering the API. Would it not be better to simply stop after 25 efforts and wait until the API settles down?
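
A sketch of that idea, reusing the hypothetical do_edit callable from the earlier back-off sketch: after 25 consecutive maxlag failures the bot enters a long cool-down instead of raising an exception (both the 25 and the 10-minute cool-down are illustrative):

```
import time

def run_with_cooldown(do_edit, max_attempts=25, cooldown=600):
    """After repeated maxlag failures, pause for a long cool-down instead of aborting."""
    while True:
        for _ in range(max_attempts):
            if do_edit():       # True on success, False on a maxlag error
                return
            time.sleep(5)       # the usual short server-suggested pause
        time.sleep(cooldown)    # 25 failures in a row: stop bothering the API for a while
```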