Apr 9 2020
Thanks a million, this is very kind of you!
Switching on header capitalization would be absolutely fantastic.
Apr 8 2020
Sure, it's your call. I am sure you have more important things to do than this sort of hack.
Thank you very much for the investigation.
This seems related to T249526.
The fix for this issue might also be the cause for T249680, Wikidata-Toolkit not being able to log in anymore.
So, if we can track down which MediaWiki commit started to introduce "set-cookie" headers (instead of "Set-Cookie"), we could hopefully submit a patch to capitalize it.
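For what it's worth, here is a quick way to observe what the server currently sends (a sketch assuming the Python `requests` library; the `meta=tokens` call is only a convenient request that typically sets a session cookie, and is not the fix itself):

```python
# Quick check of how the server currently capitalizes the cookie header.
# Assumes the `requests` library; iterating over r.headers preserves the
# casing as received from the server.
import requests

r = requests.post(
    "https://www.wikidata.org/w/api.php",
    data={"action": "query", "meta": "tokens", "type": "login", "format": "json"},
)
for name in r.headers:
    if name.lower() == "set-cookie":
        # Prints e.g. "set-cookie" or "Set-Cookie", depending on the server.
        print(name)
```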
Apr 6 2020
The feedback was submitted, so let's consider the task closed.
Mar 28 2020
Mar 12 2020
The service is down at the moment because of T247501. We should consider hosting the service outside Toollabs, as we have had a range of similar issues in the past (unavailability due to devops bugs outside our control).
Feb 16 2020
Concerning the choice of language to make it easier to maintain / deploy in a Wikimedia context:
- PHP seems like a pretty widespread choice, and is mandatory if the API is to be implemented as a MediaWiki extension. My understanding from our meeting with @Lokal_Profil is that it would generally be helpful to other Wikimedia organizations who are familiar with this stack, even if the service is not directly integrated into MediaWiki;
- @Mvolz told me that Node.js is also used to run services at WMF (but Parsoid is moving from Node.js to PHP-only, perhaps a sign that Node.js is not a good long-term choice).
Feb 13 2020
Feb 11 2020
Jan 23 2020
Have you tried with a more recent version (such as 5.0 and above)? 3.3.11 is quite old and might not support INCRBYFLOAT, which is used to compute usage statistics for the endpoint.
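For context, the usage counter is essentially an INCRBYFLOAT call. A minimal sketch (assuming the redis-py client; the key name is made up for illustration) of how this fails on a server that does not implement the command:

```python
# Minimal sketch of the INCRBYFLOAT dependency, assuming the redis-py client.
# The key name is hypothetical; the point is that INCRBYFLOAT raises an error
# on servers that do not implement the command.
import redis

r = redis.Redis(host="localhost", port=6379)
try:
    # Atomically add a floating-point amount to the usage counter.
    r.incrbyfloat("endpoint:usage", 0.5)
except redis.exceptions.ResponseError as e:
    print("INCRBYFLOAT not supported by this Redis server:", e)
```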
Jan 18 2020
This does not have anything to do with you indeed! I was just trying to explain that I stopped trying to help solve this issue (therefore unsubscribing from this).
Jan 17 2020
It is actually possible to retrieve the current maxlag value from the API without making any edit (see @Addshore's comment above).
So, just retrieve the current maxlag value and compute your desired edit rate for this maxlag with the function plotted above. Then sleep for the appropriate amount of time between any two edits to achieve this rate. Refresh the maxlag value from the server periodically.
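To make this more concrete, here is roughly the loop I have in mind (a sketch assuming the maxlag=-1 trick to read the current lag without editing, with field names as I remember them; `my_edits`, `perform_edit` and the rate function are placeholders, not real tool code):

```python
# Sketch: a request with maxlag=-1 is always rejected, and the error body
# reports the current lag without performing any edit (field names as I
# remember them). The rate function is only a placeholder for the smoothed
# function discussed in this task.
import time
import requests

API = "https://www.wikidata.org/w/api.php"

def current_lag(session):
    r = session.get(API, params={"action": "query", "maxlag": "-1", "format": "json"})
    return float(r.json()["error"]["lag"])

def edits_per_second(lag):
    # Placeholder: slow down smoothly as the lag grows (illustrative only).
    return max(0.1, 5.0 / (1.0 + lag))

session = requests.Session()
lag = current_lag(session)
for i, edit in enumerate(my_edits):      # my_edits: hypothetical iterable of edits
    perform_edit(edit)                   # hypothetical edit function
    time.sleep(1.0 / edits_per_second(lag))
    if i % 50 == 0:                      # refresh the lag value periodically
        lag = current_lag(session)
```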
Jan 3 2020
The task has been resolved.
Dec 29 2019
OK! This connection should never be idle given that bot edits on Wikidata never stop, so I am still not sure why this happens. It might be due to the specifics of how Django handles these long-running SQL connections.
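One thing I could look into (just a guess, not a confirmed diagnosis) is that outside the request/response cycle Django does not recycle stale connections on its own, so a long-running worker has to do it explicitly, along these lines:

```python
# Sketch of a common pattern for long-running Django workers (an assumption
# about the kind of code involved here, not a confirmed diagnosis): Django only
# closes connections that exceeded CONN_MAX_AGE at request boundaries, so a
# worker loop has to trigger this itself.
from django.db import close_old_connections

def consume_edits(stream):               # `stream` and `handle` are hypothetical
    for event in stream:
        # Drop connections that exceeded CONN_MAX_AGE or errored out, so the
        # next query transparently opens a fresh one.
        close_old_connections()
        handle(event)
```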
Dec 24 2019
Dec 18 2019
Dec 15 2019
Dec 14 2019
This seems like a pretty important bug… I would not get held up by design worries: just make sure no exception is thrown!
Dec 11 2019
Thanks! I think dynamically changing the maxlag value is likely to still introduce some thresholds, whereas a continuous slowdown (retrieving the lag and computing one's edit rate based on it) should in theory reach an equilibrium point.
Thanks for the analysis! Whether this is a breaking change or not is not my concern: Petscan and other mass-editing tools based on Widar should play by the book. I can provide a simple patch which ensures maxlag=5 is applied to all Widar edits: if someone wants to do a refined version which allows specific user-triggered edits to go through without a maxlag parameter, that is great. @Magnus, what is your take on this?
If clients are able to retrieve the current lag periodically (through some MediaWiki API call? which one?), then this should not require any server-side change. Clients can continue to use maxlag=5 but also throttle themselves using the smoothed function proposed.
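Purely as an illustration of the kind of continuous function I mean (not the actual function proposed in this task, which I am not reproducing here):

```python
# Illustrative stand-in for a smoothed slowdown: full speed below ~2 s of lag,
# then a continuous exponential slowdown instead of a hard cut-off at 5 s.
import math

def edit_rate(lag_seconds, max_rate=1.0):
    """Edits per second as a continuous function of the reported lag."""
    if lag_seconds <= 2.0:
        return max_rate
    return max_rate * math.exp(-(lag_seconds - 2.0))
```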
@Bugreporter have you got details of where this behaviour is currently implemented in PetScan? In particular, how do you request the current maxlag with the MediaWiki API?
Dec 10 2019
I've had a quick look at the code to see if I could submit a patch for this myself but it is not clear to me where the edits are done - I have looked in petscan_rs and wikibase_rs to no avail. Petscan edits might be done in the browser by sending them to some Widar-like interface?
@Bugreporter yes indeed! I was off by one hour there. Thanks for your help! Feel free to add more bots which match that period.
Matching pull request: https://github.com/arthurpsmith/author-disambiguator/pull/107
If you have not changed anything in user-config.py then you should be good to go, it might have been a false positive on my side. Sorry for the noise!
I am first getting in touch with people who seem to be running bots with maxlag greater than 5 or no maxlag parameter at all, to see if they would agree to follow @Addshore's advice never to use a maxlag greater than 5.
Dec 2 2019
There has been one attempt I think but it did not go very far - and the ids do not seem to have been imported so far!
Nov 29 2019
I think it is a bit harder to extract more than 1000 records (I didn't cap it on purpose to make it manageable for the workshop).
@Jheald I don't think anyone is working on this anymore: if you are still interested in the scraped dataset, it is here: http://pintoch.ulminfo.fr/adc2c9aaba/lakes-portal.tsv
Nov 26 2019
OK! If you have a way to check what sort of maxlag values are used, that would be great!
Nov 25 2019
Actually, some tools seem to be doing something like that already, since edits are still going through despite the lag being above 5 for more than an hour now (Author Disambiguator does this, probably QuickStatements too, and Edoderoobot as well). So these tools use higher (more aggressive) maxlag values than 5.
One problem with the current policy (requesting all automated editing processes to use maxlag=5) is that it creates a binary threshold: either the query service lag is under the threshold, in which case bots edit at full speed, or it is above the threshold, in which case they should all stop editing entirely. This is likely to create an oscillating behaviour, where all bots start and stop periodically. This is probably not ideal for either the infrastructure or the users.
Nov 23 2019
We have an import here: https://tools.wmflabs.org/editgroups/b/OR/8cf42ae3c0/
Moving to "Done" although this is not complete, but we did have a lot of new translations (not just in Dutch) during the event.
Data is getting into Wikidata!
Nov 22 2019
If you have trouble reconciling, use this reconciliation service: http://ulminfo.fr:3894/en/api
For people who cannot install OpenRefine on their laptops, use: http://188.8.131.52:48379/
Nov 21 2019
So this is what we get with an exponential back-off (1.5 factor), at the moment:
22:37:27.148 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 1000 milliseconds. (19338ms)
22:37:28.729 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 1500 milliseconds. (1581ms)
22:37:33.809 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 2250 milliseconds. (5080ms)
22:37:37.931 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 3375 milliseconds. (4122ms)
22:37:42.663 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 5062 milliseconds. (4732ms)
22:37:49.437 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 7593 milliseconds. (6774ms)
22:37:58.429 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 11389 milliseconds. (8992ms)
22:38:18.217 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 6 seconds lagged. -- pausing for 17083 milliseconds. (19788ms)
22:38:36.461 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 6 seconds lagged. -- pausing for 25624 milliseconds. (18244ms)
22:39:05.013 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 6.4666666666667 seconds lagged. -- pausing for 38436 milliseconds. (28552ms)
So it looks like this means no OpenRefine edits at all with these new rules, in the current situation.
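For reference, the behaviour in the log above corresponds to a retry loop roughly like this (parameters read off the log: an initial 1000 ms pause, multiplied by 1.5 on every maxlag rejection; an illustrative sketch, not the actual Wikidata-Toolkit/OpenRefine code):

```python
# Sketch of the exponential back-off whose output is shown above
# (illustration only; `try_edit` is a hypothetical function that returns
# False when the edit is rejected because of maxlag).
import time

def edit_with_backoff(try_edit, initial_pause_ms=1000, factor=1.5, max_retries=14):
    pause_ms = initial_pause_ms
    for _ in range(max_retries):
        if try_edit():
            return True
        time.sleep(pause_ms / 1000.0)
        pause_ms *= factor
    # Giving up here is what "no OpenRefine edits at all" looks like while the
    # lag stays above the threshold.
    return False
```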
Nov 20 2019
Thanks for the notification! I would be happy to release a new version of OpenRefine with a patch applied - I can do this in the coming days. The exponential back-off suggested by @Multichill makes sense intuitively - could WMDE confirm that this is the policy they recommend? Happy to adapt the policy as required.