
Increase Retry-After header for Wikidata
Open, Needs Triage, Public

Description

Hello,

In T243701, we discussed an issue of maxlag rising above 5 s for Wikidata. The cycle is: bots edit too frequently, WDQS lags, bots stop because the maxlag parameter forces them to, WDQS recovers, lag decreases, and the cycle repeats.

Pywikibot respects the value set in the Retry-After header; see Pywikibot's code (1, 2).

Increasing this value, at least for Wikidata (we would probably need a new hook for that), could make bots delay their edits for a longer time, giving WDQS more time to recover.

Opinions?

Event Timeline

Restricted Application added a project: Wikidata. Feb 13 2020, 12:33 PM
Restricted Application added a subscriber: Aklapper.

Currently multiple tools are broken, because the time it takes for maxlag to return to normal is much longer than the total time the tools keep retrying edits (see this and this).

Arguably this change works as intended? Tools should be updated to handle maxlag gracefully. The retry time should not be fixed, but should increase with each consecutive failure: 5, 10, 20, 40, 80 s, and so on. That also ensures that not all tools restart at the same time.
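The back-off scheme above can be sketched as follows. This is an illustrative snippet, not actual Pywikibot code; `backoff_delays` and `retry_edit` are hypothetical helper names:

```python
import time

def backoff_delays(base=5, factor=2, cap=300):
    """Yield an exponentially increasing sequence of retry delays:
    5, 10, 20, 40, 80, ... seconds, capped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor

def retry_edit(do_edit, max_attempts=6):
    """Retry an edit, sleeping longer after each maxlag rejection.
    `do_edit` returns True on success, False on a maxlag error."""
    delays = backoff_delays()
    for _ in range(max_attempts):
        if do_edit():
            return True
        time.sleep(next(delays))
    return False
```

The cap keeps a long outage from producing absurd sleep times while still spreading out retries.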

Though I don't think this will be really effective. Suppose, for example, that the lag stays above maxlag for 5 minutes, i.e. 300 seconds (remember that query service lag is updated every minute). Then all bots will sleep 5+10+20+40+80+160 = 315 s and restart at roughly the same time (if they make one edit every 10 seconds, they will all restart within a 10-second window).
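The synchronized-restart problem described above is usually countered by adding random jitter to the back-off. A minimal sketch (this is a standard "full jitter" technique, not something Pywikibot currently does):

```python
import random

def jittered_delay(base_delay):
    """Sleep a uniformly random time in [0, base_delay] ("full jitter"),
    so bots that backed off together do not all resume together."""
    return random.uniform(0, base_delay)
```

With jitter, bots that all computed a 160 s back-off resume spread across the whole 160-second window instead of within a few seconds of each other.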

I have been thinking about this. Originally I thought it would help, but the more I think about it, the more it feels very similar to increasing the factor. The oscillation will be longer, but the situation stays the same, because bots only back off after WDQS has already lagged too far behind due to the edit rate. We need something that slows the bots down before they reach the maxlag threshold and have to back off.
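One way to slow bots down before the threshold is to scale the per-edit delay with the current lag instead of cutting edits off abruptly at maxlag. This is an illustrative policy sketch, not an existing MediaWiki or Pywikibot feature:

```python
def adaptive_throttle(current_lag, maxlag=5.0, base_throttle=10.0):
    """Return a per-edit delay (seconds) that grows smoothly as the
    reported lag approaches the maxlag threshold, rather than a hard
    stop at the threshold. Purely illustrative."""
    if current_lag <= 0:
        return base_throttle
    # Quadratic growth: mild slowdown far from the threshold,
    # strong slowdown at and beyond it.
    pressure = current_lag / maxlag
    return base_throttle * (1.0 + pressure ** 2)
```

At zero lag a bot edits at its normal throttle; at the threshold it has already halved its edit rate, so the lag should level off instead of oscillating.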

This is one of those bugs where you should just look up the relevant chapter in a book like http://barbie.uta.edu/~jli/Resources/MapReduce&Hadoop/Distributed%20Systems%20Principles%20and%20Paradigms.pdf and look at the possible solutions.

Urbanecm renamed this task from Increase Retry-Time header for Wikidata to Increase Retry-After header for Wikidata. Feb 14 2020, 9:33 PM

I have been thinking about this. Originally I thought it would help, but the more I think about it, the more it feels very similar to increasing the factor. The oscillation will be longer, but the situation stays the same, because bots only back off after WDQS has already lagged too far behind due to the edit rate. We need something that slows the bots down before they reach the maxlag threshold and have to back off.

Maybe I'm making a mistake, but I believe that increasing the factor behaves differently than a Retry-After change. Feel free to correct me if I'm wrong. When you increase the factor, the reported lag decreases although the situation is the same (the real lag is unchanged); as a result, bots edit for a longer period of time before the lag becomes too high for them. It should also increase the backoff period, but perhaps because bots can make more edits, WDQS has more work, and as a result the issue is bigger than before (= a longer period of too-high lag).

On the other hand, with a Retry-After change, the bots complying with https://www.mediawiki.org/wiki/Manual:Maxlag_parameter (which includes PWB) sleep for a longer time, giving WDQS more time to recover. The number of new edits saved should be the same, given that lag re-increases at the same speed. I'm not sure about other bots, but PWB seems to sleep for (at least) the recommended number of seconds and then tries again. If the recommended number of seconds were higher, the bots should simply edit more slowly, IMO.
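Reading the recommended sleep time from the response can be sketched like this. Note this is a simplified illustration (Retry-After may also carry an HTTP-date instead of a number of seconds; this sketch falls back to the default in that case, and `retry_after_seconds` is a hypothetical helper name):

```python
def retry_after_seconds(response_headers, default=5):
    """Read the Retry-After header (delay-seconds form) from an API
    response, never sleeping less than `default` seconds. Malformed
    or missing values (including the HTTP-date form) use the default."""
    value = response_headers.get("Retry-After")
    try:
        return max(default, int(value))
    except (TypeError, ValueError):
        return default
```

If the server raised this value from 5 to, say, 30 during WDQS trouble, complying clients would automatically space their retries further apart.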

This is one of those bugs where you should just look up the relevant chapter in a book like http://barbie.uta.edu/~jli/Resources/MapReduce&Hadoop/Distributed%20Systems%20Principles%20and%20Paradigms.pdf and look at the possible solutions.

If you have something specific in your mind, please do feel free to share it!

We need something that slows the bots down before they reach the maxlag threshold and have to back off.

Totally agree!

This is one of those bugs where you should just look up the relevant chapter in a book like http://barbie.uta.edu/~jli/Resources/MapReduce&Hadoop/Distributed%20Systems%20Principles%20and%20Paradigms.pdf and look at the possible solutions.

I looked at the TOC of that book and couldn't find a topic related to this. I also have another book physically and couldn't find anything in it either. There is some good information in that book, which I shared in T240442: Design a continuous throttling policy for Wikidata bots.

Xqt added a subscriber: Xqt. Edited Mar 10 2020, 10:22 AM

On the other hand, with a Retry-After change, the bots complying with https://www.mediawiki.org/wiki/Manual:Maxlag_parameter (which includes PWB) sleep for a longer time, giving WDQS more time to recover. The number of new edits saved should be the same, given that lag re-increases at the same speed. I'm not sure about other bots, but PWB seems to sleep for (at least) the recommended number of seconds and then tries again. If the recommended number of seconds were higher, the bots should simply edit more slowly, IMO.

How PWB handles throttling:

  • The HTTP response header value retry_after determines the delay after a maxlag error has been triggered.
  • The retry_after value has always been 5 s in recent years, and that value does not seem sufficient. Therefore the current maxlag value is also taken into account for the wait cycle: 1/5 of it for the first try, 2/5 for the second, 4/5 for the third, 8/5 for the fourth, 16/5 for the fifth, and so on, but never less than the retry_after value.
  • There is a put_throttle for every API write access, which is 10 s by default and should never be below 5 s under most local bot policies.
  • There is a minthrottle for every API read access, which is 0 by default, i.e. there is no read throttling at all.
  • If more than one bot is working simultaneously, the times are lengthened.
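The wait-cycle arithmetic in the second bullet can be written out as follows (an illustrative one-liner matching the description above, not the actual Pywikibot implementation):

```python
def wait_time(retry_after, server_lag, attempt):
    """Wait time (seconds) for a 1-based retry attempt, per the scheme
    above: server_lag * 2**(attempt - 1) / 5, i.e. 1/5, 2/5, 4/5, 8/5,
    16/5, ... of the reported lag, but never below retry_after."""
    return max(retry_after, server_lag * 2 ** (attempt - 1) / 5)
```

So with a reported lag of 10 s and retry_after = 5 s, the first two attempts are floored at 5 s, and only from the third attempt (10 * 4/5 = 8 s) does the lag term take over.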

I guess minthrottle should be activated for read access on Wikidata too, to avoid server overload.

Note: as I said in T243701#5884926, some bots run with put_throttle=1 or 0.

Dvorapa added a comment.EditedMar 10 2020, 12:00 PM

Note: as I said in T243701#5884926, some bots run with put_throttle=1 or 0.

They should not do so continuously; this makes Pywikibot ignore any maxlag or throttle values and just rush through edits. But of course, sometimes bots have to use put_throttle=0 (or close to 0) to fix some breakage in Wikipedia articles/Wikidata items quickly. Therefore restricting all bots that have ever used put_throttle=0 (or close to 0) is unreasonable, but monitoring bot activity and restricting those who do it continuously or regularly is necessary and should somehow be carried out.

Even if bots are using pt:0, they will still follow maxlag (unless they also set maxlag=0). This may cause some problems (they start and stop brutally and make the lag oscillate), though it will not break the server. Even with the default put_throttle (10 s, or even longer), the issue may still occur if there are so many bots running that the resulting edit rate is more than the Query Service Updater can handle.
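The distinction being made above is between the per-edit throttle and the maxlag check: disabling the former does not bypass the latter. A hypothetical client loop illustrating this (not actual Pywikibot code):

```python
import time

def edit_loop(edits, submit, current_lag, put_throttle=0, maxlag=5):
    """Submit edits one by one. Even with put_throttle=0, a compliant
    client still backs off whenever the server-reported lag exceeds
    maxlag; put_throttle only adds a fixed pause between writes."""
    for edit in edits:
        while current_lag() > maxlag:   # maxlag check, independent of throttle
            time.sleep(maxlag)          # back off until the lag recovers
        submit(edit)
        if put_throttle:
            time.sleep(put_throttle)    # optional per-edit pacing
```

With many such clients, the maxlag check alone produces the abrupt start/stop pattern described above, since every client resumes as soon as the lag dips below the threshold.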

start and stop brutally and make the lag oscillate

Yes, but in my opinion this behavior is not very polite or ethical toward the servers either. And to me it also seems indirectly unfair to other bots: bots rush-saving edits at a rate of 1 per second or faster make the Query Service Updater lag much sooner and worse than bots respecting a put_throttle of 5 s or more.