
API claims query is over 12mb but it is not
Open, Medium, Public

Description

What did I do?

Ran the following command:

python pwb.py replace -lang:commons -family:commons -regex "ISBN\s+((97(8|9))?\s?-?([0-9]\s?-?){9}([0-9Xx]))([\D$])" "{{ISBN|\1}}\6" -summary:"[[User:Revibot/Task/4|bot]]) (Replace ISBN magic links" -always -cat:"Pages using ISBN magic links" -ns:14

Actual behavior:

WARNING: API warning (result): This result was truncated because it would otherwise be larger than the limit of 12,582,912 bytes.

Expected behavior

Page edited.

(Screenshot attached: image.png, 429 KB)

The first item in the category in question is this one; I edited it manually and it worked. That page was only 910 bytes.

NOTE: I don't know which is at fault here, replace.py or the API itself, so I'm filing this under both projects. Please adjust as needed.

Event Timeline

revi added a subscriber: zhuyifei1999.

The warning here does not say that the page is too large. It means that the search query (in your case -cat:"Pages using ISBN magic links") returns too many results: the result set was too large and took too long to assemble, so the API did not give you the full list of pages but a truncated list instead.
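To make this concrete, here is a minimal sketch (my own illustration, not part of the original report) of a raw Action API query of the same shape as the one replace.py sends; when the combined result would exceed 12,582,912 bytes, the response carries the same message under its warnings key:

import requests

# Same generator query shape as in the logged request further below
# (parameters abbreviated); iiprop=metadata in particular can inflate
# the result size considerably.
params = {
    'action': 'query',
    'format': 'json',
    'generator': 'categorymembers',
    'gcmtitle': 'Category:Pages using ISBN magic links',
    'gcmtype': 'page|file',
    'gcmlimit': 'max',
    'prop': 'info|imageinfo',
    'iiprop': 'timestamp|user|comment|url|size|sha1|metadata',
    'iilimit': 'max',
}
r = requests.get('https://commons.wikimedia.org/w/api.php', params=params,
                 headers={'User-Agent': 'truncation-demo/0.1 (example)'}).json()
# If the result was truncated, this prints something like:
# {'result': {'*': 'This result was truncated because it would otherwise be
#  larger than the limit of 12,582,912 bytes.'}}
print(r.get('warnings'))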

Dvorapa renamed this task from API claims page is over 12mb but it is not to API claims query is over 12mb but it is not. May 30 2018, 8:15 PM

I cannot explain the error, but I think no Category pages will be yielded (which would be a separate bug)

In:

Body: 'gcmtitle=Category%3APages+using+ISBN+magic+links&gcmprop=ids%7Ctitle%7Csortkey&gcmtype=page%7Cfile&prop=info%7Cimageinfo%7Ccategoryinfo&inprop=protection&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Csha1%7Cmetadata&iilimit=max&generator=categorymembers&action=query&indexpageids=&continue=gcmcontinue%7C%7Cuserinfo&gcmlimit=500&meta=userinfo&uiprop=blockinfo%7Chasmsg&maxlag=5&format=json&gcmcontinue=file%7....35'
gcmtype=page%7Cfile

which decodes to gcmtype=page|file, meaning that only files and ordinary pages will be yielded, no subcats.
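A quick way to verify this from a logged request body is to decode it with the standard library; a small sketch (the body string here is abbreviated by me):

from urllib.parse import parse_qs

# Abbreviated copy of the logged request body quoted above.
body = ('gcmtitle=Category%3APages+using+ISBN+magic+links'
        '&gcmtype=page%7Cfile&generator=categorymembers&action=query')
print(parse_qs(body)['gcmtype'])  # ['page|file'] -> no 'subcat', so no Category: pages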

In the categorymembers API documentation (see the sketch after this excerpt):

cmtype

    Which type of category members to include. Ignored when cmsort=timestamp is set.
    Values (separate with | or alternative): page, subcat, file
    Default: page|subcat|file
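So only a query that includes subcat in cmtype (or gcmtype) returns the Category: pages. A minimal sketch of such a request (my own illustration), using the small test category from the next comment:

import requests

params = {
    'action': 'query',
    'format': 'json',
    'list': 'categorymembers',
    'cmtitle': 'Category:Arthrocereus',
    'cmtype': 'subcat',   # only subcategories, i.e. Category: pages
    'cmlimit': 'max',
}
r = requests.get('https://commons.wikimedia.org/w/api.php', params=params,
                 headers={'User-Agent': 'cmtype-demo/0.1 (example)'}).json()
for member in r['query']['categorymembers']:
    print(member['title'])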

Try a small one: https://commons.wikimedia.org/wiki/Category:Arthrocereus
This category has the following 5 subcategories, out of 5 total.
This category contains only the following file.

user@pc:~/python/core {master}$ python scripts/listpages.py -lang:commons -family:commons -cat:"Arthrocereus"
   1 Arthrocereus HU 330.jpg
1 page(s) found
user@pc:~/python/core {master}$ python scripts/listpages.py -lang:commons -family:commons -cat:"Arthrocereus" -ns:14
0 page(s) found

-subcats is the correct option to use in such a case.

I don't really care about the contents of the subcategories. I need the member list of the main category (the ISBN giant) so that each subcategory's description page gets edited, and nothing inside the subcategories.

Exactly; then you need to use the -subcats: option to get the Category: pages inside the main category (an in-script equivalent is sketched after the listing below).

python scripts/listpages.py -lang:commons -family:commons -subcats:"Arthrocereus" -format:5
   1 Category:Arthrocereus by country        
   2 Category:Arthrocereus glaziovii         
   3 Category:Arthrocereus melanurus         
   4 Category:Arthrocereus rondonianus       
   5 Category:Arthrocereus spinosissimus     
5 page(s) found
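For reference, a hedged in-script sketch of the same -subcats: behaviour using pywikibot's generators (I believe -subcats: maps to SubCategoriesPageGenerator; the small Arthrocereus test category is used here again, adjust it to the real target):

import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site('commons', 'commons')
cat = pywikibot.Category(site, 'Category:Arthrocereus')
# Yields the subcategories' description pages, not their members.
for page in pagegenerators.SubCategoriesPageGenerator(cat):
    print(page.title())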
Vvjjkkii renamed this task from API claims query is over 12mb but it is not to 3ybaaaaaaa. Jul 1 2018, 1:07 AM
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
Mpaa renamed this task from 3ybaaaaaaa to API claims query is over 12mb but it is not. Jul 1 2018, 6:59 PM
Mpaa lowered the priority of this task from High to Medium.
Mpaa updated the task description. (Show Details)
Mpaa added a subscriber: Aklapper.

See T253591. Please retry with a decreased query increment. By default the maximum query increment of data is retrieved, which depends on the user's group memberships. You may decrease the maximum query increment with the step parameter in your user-config.py; this parameter is also available as the global -step option if your script calls pywikibot.handle_args() for all command line options (refer to the basic.py sample script for that).
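For illustration, both routes mentioned above in one sketch (the value 50 is an arbitrary example; pick whatever increment suits your account's limits):

# In user-config.py (a Python file): lower the maximum query increment.
step = 50

# Or pass the same limit as the global option on the command line, e.g.:
#   python pwb.py replace -lang:commons -family:commons ... -step:50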