
[Migrated] Number of Edits > 15,000 shows 20,000 instead
Open, Needs Triage · Public

Description

@MaxBioHazard 17:20, 24 February 2015 (UTC) wrote:

If one tries to get a list of a user's last edits using the "User contribs (user defined number)" method, and the 'number of edits' parameter is more than 15,000, the output shows 20,000 edits. I tested 15,000 (OK), 15,100 (shows 20,000), and 15,500, 15,900 and 16,000 (all the same).


Site ruwiki
OS Win 8.1

Event Timeline

Josve05a created this task. May 16 2015, 1:34 AM
Josve05a raised the priority of this task to Needs Triage.
Josve05a updated the task description.
Josve05a added subscribers: Josve05a, MBH.
Restricted Application added a subscriber: Aklapper. May 16 2015, 1:34 AM
Josve05a added a subscriber: Reedy. May 16 2015, 1:36 AM

@Reedy 23:48, 25 February 2015 (UTC) wrote:

You'll see the same behaviour with 10,100 etc. too: it gets rounded up to the nearest 5k (if you're logged in and have the apihighlimits right). If logged out, you'll get 10,500. This is essentially expected behaviour, documented in the MW API here. It is a bug, but it's not a major one. We can probably handle it better: we know which queries are "expensive" and which aren't, so we could be smarter with the limit values we pass, since we know how many results we already have and how many we'll get on each request. I'll probably work on it eventually, but this should probably be moved to Phabricator for the longer term.

@MaxBioHazard 20:53, 26 February 2015 (UTC) wrote:

But the API already allows you to request any number of edits, not only the maximum. It's a bug, because if the requested and received counts differ by up to 4,000, the list has to be cleaned up manually, which is long and time-consuming.

@Reedy 17:38, 27 February 2015 (UTC) wrote:

I didn't say it wasn't a bug, it's just not a major one. But you are confused. If you make an API request with a limit number greater than "max" (which is shorthand to just give you the maximum amount of results you're "allowed" in one request), you will just be returned the value of max:

	const LIMIT_BIG1 = 500; // Fast query, std user limit
	const LIMIT_BIG2 = 5000; // Fast query, bot/sysop limit
	const LIMIT_SML1 = 50; // Slow query, std user limit
	const LIMIT_SML2 = 500; // Slow query, bot/sysop limit

To get more than these amounts (500/5000 as user contribs is a "fast" query), you have to do multiple requests. See this example requesting 25000 results for me. Link

<warnings>
  <usercontribs xml:space="preserve">uclimit may not be over 5000 (set to 25000) for bots or sysops</usercontribs>
</warnings>

You'll get 5000 results, not the number you provided. To get more results, you then make another request, passing in the correct continuation information. So, if you're requesting 10,100 results, you'll do 2 requests that give you 5000 results each. Because 2 x 5000 = 10,000 < 10,100, it will then do a third request for "max" results (another 5000), which gives you 15,000 results. If you're not a bot/sysop and don't have the apihighlimits right via some other method, you'll get 10,500.
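The arithmetic above can be sketched roughly as follows. This is only an illustration of the behaviour described in this thread, not AWB's actual code: the function name `fetched_with_max_batches` is hypothetical, and it assumes every request asks for "max" and that the user has at least that many edits.

```python
def fetched_with_max_batches(requested: int, per_request_max: int) -> int:
    """Results received when every request asks for "max".

    Hypothetical helper modelling the behaviour described above:
    requests continue until at least `requested` results have arrived,
    and each request returns a full batch of `per_request_max`.
    """
    if requested <= 0:
        return 0
    # Ceiling division without floats: number of full batches needed.
    batches = -(-requested // per_request_max)
    return batches * per_request_max


if __name__ == "__main__":
    # 5000 per request with apihighlimits, 500 without.
    for limit in (5000, 500):
        print(limit, fetched_with_max_batches(10_100, limit))
```

With a per-request maximum of 5000 this gives 15,000 for a request of 10,100, and with 500 it gives 10,500, matching the numbers quoted above.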

Like I say, it is a bug (I don't dispute that), but it's not a major one. It's going to be full of dupes anyway (most people edit the same page numerous times), so you probably should filter for dupes to begin with anyway.

@MaxBioHazard 19:04, 1 March 2015 (UTC) wrote:

I know everything you're telling me. When I said the API allows you to request any number of edits, not only the maximum, I meant that this number can be LESS than the maximum. The request /w/api.php?action=query&list=usercontribs&format=xml&uclimit=123&ucuser=MaxBioHazard returns the last 123 edits, not 500. So AWB should split the requested number of edits into A*500(0)+B (with B < 500(0)), send A requests for the maximum, and send the last request with uclimit=B.
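A minimal sketch of the split proposed here, assuming the per-request maximum is known in advance (500 or 5000 for user contribs). The function name `plan_uclimits` is hypothetical and is not part of AWB or the MediaWiki API; it only computes the `uclimit` value for each request and leaves continuation handling out.

```python
from typing import List


def plan_uclimits(requested: int, per_request_max: int) -> List[int]:
    """Split `requested` into A full batches plus a final remainder B.

    Hypothetical helper: returns the uclimit value to send on each
    request so that the total equals `requested` exactly.
    """
    if requested <= 0:
        return []
    full_batches, remainder = divmod(requested, per_request_max)
    limits = [per_request_max] * full_batches
    if remainder:
        # Last request asks only for what is still missing (B < max).
        limits.append(remainder)
    return limits


if __name__ == "__main__":
    print(plan_uclimits(10_100, 5000))
    print(plan_uclimits(123, 500))
```

For a request of 10,100 with a maximum of 5000, this plans two requests of 5000 and one of 100, so the total matches what was asked for and nothing has to be trimmed afterwards.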