
Maxlag=5 for Petscan
Open, Needs Triage, Public

Description

PetScan edits go through even when maxlag is higher than 5. We should fix the code so that it complies with the maxlag policy on Wikimedia wikis by sending maxlag=5 (or a lower value) with all editing API queries.

Event Timeline

Pintoch assigned this task to Magnus. Dec 10 2019, 7:53 PM
Pintoch created this task.

Note that PetScan queries maxlag first and sleeps for some seconds if maxlag is high. The edits are nevertheless made regardless of maxlag. This made sense before dispatch lag and query service lag were added to maxlag (see T221774#5692489), but it should now be reconsidered.

I've had a quick look at the code to see if I could submit a patch for this myself, but it is not clear to me where the edits are done. I have looked in petscan_rs and wikibase_rs to no avail. Perhaps PetScan edits are done in the browser by sending them to some Widar-like interface?

@Bugreporter have you got details of where this behaviour is currently implemented in PetScan? In particular, how do you request the current maxlag with the MediaWiki API?

Addshore moved this task from incoming to in progress on the Wikidata board. Dec 11 2019, 11:41 AM

PetScan makes its edits via Widar: https://github.com/magnusmanske/petscan_rs/blob/master/html/autolist.js

Widar itself did not use maxlag at all as of 2015: https://bitbucket.org/magnusmanske/magnustools/src/2f3811e80d0b215f9b45fb9b6da2d57536c56077/public_html/php/oauth.php?at=master&fileviewer=file-view-default

The maxlag throttle was added in 2017-08 (see https://www.wikidata.org/wiki/Wikidata:Administrators%27_noticeboard/Archive/2017/08#Maxlag_parameter_not_respected):

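	// Throttle after an edit: sleep in proportion to the reported lag,
	// or fall back to a fixed per-type delay when no lag can be read.
	function sleepAfterEdit ( $type ) {
		if ( $this->auto_detect_lag ) { // Try to auto-detect lag
			// maxlag=-1 makes the API answer with a maxlag error that carries the current lag
			$url = $this->apiUrl . '?action=query&meta=siteinfo&format=json&maxlag=-1' ;
			$t = @file_get_contents ( $url ) ;
			if ( $t !== false ) {
				$j = @json_decode ( $t ) ;
				if ( isset($j->error->lag) ) {
					$lag = $j->error->lag ;
					if ( $lag > 1 ) sleep ( $lag * 3 ) ; // Sleep three times the reported lag
					return ;
				}
			}
		}

		// Fallback: fixed delay depending on the kind of action
		if ( $type == 'create' ) sleep ( $this->delay_after_create_s ) ;
		if ( $type == 'edit' ) sleep ( $this->delay_after_edit_s ) ;
		if ( $type == 'upload' ) sleep ( $this->delay_after_upload_s ) ;
	}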

So if the lag is n seconds, it will sleep 3n+1 seconds for edits or 3n+2 seconds for page creations. After that, the edit is made without a maxlag parameter. Note that if you run PetScan with a bot account, requests may run five in parallel by default; therefore, if the lag is 10 seconds, PetScan will make 5 edits every 31 seconds.
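To make that arithmetic concrete, here is a rough sketch in Python (not Widar's own PHP). It reads the lag from a maxlag error body the way the excerpt does and reproduces the 3n+1 and 3n+2 figures. The function names are hypothetical. The extra 1 s and 2 s delays are taken from the figures in this comment rather than from the excerpt, which returns right after the lag-based sleep.

```python
import json
from typing import Optional

LAG_MULTIPLIER = 3  # from the excerpt: sleep($lag * 3)


def reported_lag(body: str) -> Optional[float]:
    """Return error.lag from an API response body, or None if it is absent."""
    try:
        data = json.loads(body)
    except ValueError:
        return None
    if not isinstance(data, dict):
        return None
    error = data.get("error")
    if isinstance(error, dict) and "lag" in error:
        return float(error["lag"])
    return None


def cycle_seconds(lag: float, extra_delay: float) -> float:
    """Seconds spent sleeping per edit: 3n plus an assumed per-type delay."""
    return LAG_MULTIPLIER * lag + extra_delay


def edits_per_minute(lag: float, parallel: int = 5, extra_delay: float = 1.0) -> float:
    """Approximate throughput when `parallel` requests sleep independently."""
    return parallel * 60.0 / cycle_seconds(lag, extra_delay)


lag = reported_lag('{"error": {"code": "maxlag", "lag": 10}}')
print(cycle_seconds(lag, 1.0))  # edits: 3*10 + 1
print(cycle_seconds(lag, 2.0))  # page creations: 3*10 + 2
print(edits_per_minute(lag))    # 5 parallel requests, one edit each per cycle
```

With a lag of 10 seconds this gives a 31-second cycle for edits, that is, roughly 9.7 edits per minute across five parallel requests, none of which carries a maxlag parameter.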

Note that making edits fail when maxlag is high should be considered a breaking change. Many tools (including PetScan) assume edits never fail and never retry an edit.
Widar is used in various tools (Reasonator, Mix'n'match, Tabernacle) that only perform user-invoked edits. It does not make sense to make such edits fail. Sleeping until maxlag is lower does not seem to be a solution either, as this will result in either timeouts or too many open connections.

Thanks for the analysis! Whether this is a breaking change or not is not my concern: PetScan and other mass-editing tools based on Widar should play by the book. I can provide a simple patch which ensures maxlag=5 is applied to all Widar edits. If someone wants to write a refined version which allows specific user-triggered edits to go through without a maxlag parameter, that is great. @Magnus, what is your take on this?
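For illustration only, the shape of such a patch could look roughly like the sketch below. It is written in Python rather than Widar's PHP, and it is not the actual patch. The names `send_edit`, `post`, `user_invoked`, `max_retries` and `wait_seconds` are hypothetical. It assumes the API signals excessive lag with an error whose code is `maxlag`, and it uses a fixed wait between retries instead of reading the `Retry-After` header.

```python
import time
from typing import Any, Callable, Dict

MAXLAG_SECONDS = 5  # the value proposed in this task


class MaxlagExceeded(Exception):
    """Raised when the edit is still refused for lag after all retries."""


def send_edit(
    post: Callable[[Dict[str, Any]], Dict[str, Any]],
    params: Dict[str, Any],
    user_invoked: bool = False,
    max_retries: int = 3,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Send one edit through `post`, adding maxlag unless it is user-invoked.

    `post` is a hypothetical callable that performs the API request and
    returns the decoded JSON response as a dict.
    """
    request = dict(params)
    if not user_invoked:
        request["maxlag"] = MAXLAG_SECONDS

    for attempt in range(max_retries + 1):
        response = post(request)
        error = response.get("error") or {}
        if error.get("code") != "maxlag":
            # Success, or an unrelated error the caller has to handle.
            return response
        if attempt < max_retries:
            sleep(wait_seconds)

    raise MaxlagExceeded(f"edit refused after {max_retries} retries")
```

A batch tool would call `send_edit(post, params)` and handle `MaxlagExceeded`, while an interactive tool could pass `user_invoked=True` to keep today's behaviour for single edits. Bounding the retries is one way to limit the timeout and open-connection concerns raised above.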