API claims query is over 12mb but it is not
Open, Medium, Public

Description

What did I do?

Did python pwb.py replace -lang:commons -family:commons -regex "ISBN\s+((97(8|9))?\s?-?([0-9]\s?-?){9}([0-9Xx]))([\D$])" "{{ISBN|\1}}\6" -summary:"[[User:Revibot/Task/4|bot]]) (Replace ISBN magic links" -always -cat:"Pages using ISBN magic links" -ns:14
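To see what this replacement does to a single piece of text, the same pattern and replacement can be tried directly with Python's re module. This is only a minimal sketch: the helper name replace_isbn_magic_links and the sample string are made up for illustration and are not taken from replace.py or from Commons.

```python
import re

# Pattern and replacement copied from the command above.
ISBN_PATTERN = re.compile(
    r"ISBN\s+((97(8|9))?\s?-?([0-9]\s?-?){9}([0-9Xx]))([\D$])"
)
ISBN_REPLACEMENT = r"{{ISBN|\1}}\6"


def replace_isbn_magic_links(text: str) -> str:
    """Wrap bare 'ISBN <number>' magic links in the {{ISBN}} template."""
    return ISBN_PATTERN.sub(ISBN_REPLACEMENT, text)


# Hypothetical sample text, for illustration only.
sample = "See ISBN 978-3-16-148410-0."
converted = replace_isbn_magic_links(sample)
print(converted)
```

Group 6 has to match one character after the number, and inside a character class `$` is a literal dollar sign, so with this pattern an ISBN at the very end of the text is left unchanged.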

Actual behavior:

WARNING: API warning (result): This result was truncated because it would otherwise be larger than the limit of 12,582,912 bytes.

Expected behavior

Page edited.

The first item in the category in question is this. I edited it manually, and it worked. In fact, that page was only 910 bytes.

NOTE: I don't know which is at fault here, replace.py or the API itself, so I'm filing this under both projects. Please adjust as needed.

Event Timeline

revi created this task. May 30 2018, 6:00 PM
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. May 30 2018, 6:00 PM
revi moved this task from Incoming to Radar on the User-revi board. May 30 2018, 6:35 PM
revi added a subscriber: zhuyifei1999.

The warning here does not say that the page is too large. It says that the result of the query (in your case -cat:"Pages using ISBN magic links") is too large: the query returned too many results and took too long, so the API gave you a trimmed list of pages instead of the full list.
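The limit in the warning applies to the size of the whole API response, not to any single page. A minimal sketch of checking a decoded JSON response for this warning follows; the response layout, the sample fragment, and the helper name truncation_warning are assumptions for illustration and are not pywikibot code.

```python
from typing import Optional

# 12,582,912 bytes is exactly 12 MiB.
RESULT_SIZE_LIMIT = 12 * 1024 * 1024


def truncation_warning(response: dict) -> Optional[str]:
    """Return the 'result' warning text if the response was truncated, else None."""
    warning = response.get("warnings", {}).get("result", {})
    # Assumed layouts: {"*": text} or {"warnings": text}, depending on formatversion.
    text = warning.get("*") or warning.get("warnings")
    if text and "truncated" in text:
        return text
    return None


# Hypothetical response fragment built around the warning quoted in this task.
sample_response = {
    "warnings": {
        "result": {
            "*": "This result was truncated because it would otherwise be "
                 "larger than the limit of 12,582,912 bytes."
        }
    },
    "query": {"pages": {}},
}

print(truncation_warning(sample_response))
```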

Dvorapa renamed this task from API claims page is over 12mb but it is not to API claims query is over 12mb but it is not. May 30 2018, 8:15 PM
Mpaa added a subscriber: Mpaa. May 30 2018, 8:18 PM

I cannot explain the error, but I think no Category pages will be yielded (which would be a separate bug)

In:

Body: 'gcmtitle=Category%3APages+using+ISBN+magic+links&gcmprop=ids%7Ctitle%7Csortkey&gcmtype=page%7Cfile&prop=info%7Cimageinfo%7Ccategoryinfo&inprop=protection&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Csha1%7Cmetadata&iilimit=max&generator=categorymembers&action=query&indexpageids=&continue=gcmcontinue%7C%7Cuserinfo&gcmlimit=500&meta=userinfo&uiprop=blockinfo%7Chasmsg&maxlag=5&format=json&gcmcontinue=file%7....35'
gcmtype=page%7Cfile

which means that only files or pages will be yielded, no subcats.
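The request body is easier to read once it is URL-decoded. A small sketch using only the standard library is shown here; gcmcontinue is left out because its value is truncated in the log, and decode_body is a hypothetical helper name.

```python
from urllib.parse import parse_qs

# Request body from the log above, without the truncated gcmcontinue value.
BODY = (
    "gcmtitle=Category%3APages+using+ISBN+magic+links"
    "&gcmprop=ids%7Ctitle%7Csortkey&gcmtype=page%7Cfile"
    "&prop=info%7Cimageinfo%7Ccategoryinfo&inprop=protection"
    "&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Csha1%7Cmetadata"
    "&iilimit=max&generator=categorymembers&action=query&indexpageids="
    "&continue=gcmcontinue%7C%7Cuserinfo&gcmlimit=500&meta=userinfo"
    "&uiprop=blockinfo%7Chasmsg&maxlag=5&format=json"
)


def decode_body(body: str) -> dict:
    """Decode a form-encoded request body into {parameter: [values]}."""
    parsed = parse_qs(body, keep_blank_values=True)
    # Multi-value API parameters are separated by "|".
    return {name: values[0].split("|") for name, values in parsed.items()}


params = decode_body(BODY)
print(params["gcmtitle"])
print(params["gcmtype"])
```

The decoded gcmtype list contains page and file but not subcat, which matches the observation above.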

From the API documentation for categorymembers:

cmtype

    Which type of category members to include. Ignored when cmsort=timestamp is set. 
    Values (separate with | or alternative): page, subcat, file
    Default: page|subcat|file
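For comparison, a request that asks only for subcategories would set cmtype=subcat. The sketch below only builds the URL and does not send it; the helper name subcat_query_url is hypothetical, and this is not how pywikibot constructs its requests.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def subcat_query_url(category_title: str, limit: int = 500) -> str:
    """Build a categorymembers query URL restricted to subcategories."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category_title,
        "cmtype": "subcat",  # only subcategories, no pages or files
        "cmlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


url = subcat_query_url("Category:Pages using ISBN magic links")
print(url)
```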
Mpaa added a comment. May 30 2018, 8:27 PM

Try a small one: https://commons.wikimedia.org/wiki/Category:Arthrocereus
This category has the following 5 subcategories, out of 5 total.
This category contains only the following file.

user@pc:~/python/core {master}$ python scripts/listpages.py -lang:commons -family:commons -cat:"Arthrocereus"
   1 Arthrocereus HU 330.jpg
1 page(s) found
user@pc:~/python/core {master}$ python scripts/listpages.py -lang:commons -family:commons -cat:"Arthrocereus" -ns:14
0 page(s) found
Mpaa added a comment. Edited May 31 2018, 6:00 PM

-subcats is the correct option to use in such a case.

revi added a comment. May 31 2018, 6:02 PM

I don't really care about the contents of the subcategories. I need the members of the main category (the ISBN giant) so that I can edit each subcategory's description page, and nothing inside the subcategories.

Mpaa added a comment. May 31 2018, 7:11 PM

Exactly, so you need to use the -subcats: option to get the Category pages inside the main category.

python scripts/listpages.py -lang:commons -family:commons -subcats:"Arthrocereus" -format:5
   1 Category:Arthrocereus by country        
   2 Category:Arthrocereus glaziovii         
   3 Category:Arthrocereus melanurus         
   4 Category:Arthrocereus rondonianus       
   5 Category:Arthrocereus spinosissimus     
5 page(s) found
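The difference between -cat combined with -ns:14 and -subcats can be pictured with a small model. This is a simplified sketch of the behaviour described in this thread, not pywikibot's actual implementation: the Member type and both helper functions are hypothetical, and the data is the Arthrocereus example from the listings above.

```python
from typing import Iterable, List, NamedTuple, Optional, Set

FILE_NS = 6       # MediaWiki File: namespace
CATEGORY_NS = 14  # MediaWiki Category: namespace


class Member(NamedTuple):
    title: str
    namespace: int
    member_type: str  # "page", "file" or "subcat"


# Members of Category:Arthrocereus as shown in the listings above.
MEMBERS = [
    Member("Arthrocereus HU 330.jpg", FILE_NS, "file"),
    Member("Category:Arthrocereus by country", CATEGORY_NS, "subcat"),
    Member("Category:Arthrocereus glaziovii", CATEGORY_NS, "subcat"),
    Member("Category:Arthrocereus melanurus", CATEGORY_NS, "subcat"),
    Member("Category:Arthrocereus rondonianus", CATEGORY_NS, "subcat"),
    Member("Category:Arthrocereus spinosissimus", CATEGORY_NS, "subcat"),
]


def cat_members(members: Iterable[Member],
                namespaces: Optional[Set[int]] = None) -> List[str]:
    """Model of -cat: fetch only pages and files, then filter by namespace."""
    fetched = [m for m in members if m.member_type in ("page", "file")]
    if namespaces is not None:
        fetched = [m for m in fetched if m.namespace in namespaces]
    return [m.title for m in fetched]


def subcat_members(members: Iterable[Member]) -> List[str]:
    """Model of -subcats: fetch only the subcategory pages."""
    return [m.title for m in members if m.member_type == "subcat"]


print(cat_members(MEMBERS))                 # like -cat
print(cat_members(MEMBERS, {CATEGORY_NS}))  # like -cat with -ns:14
print(subcat_members(MEMBERS))              # like -subcats
```

In this model the namespace filter runs after the subcategories have already been excluded, which is why adding -ns:14 to -cat leaves nothing to edit.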
Vvjjkkii renamed this task from API claims query is over 12mb but it is not to 3ybaaaaaaa. Jul 1 2018, 1:07 AM
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
Mpaa renamed this task from 3ybaaaaaaa to API claims query is over 12mb but it is not. Jul 1 2018, 6:59 PM
Mpaa lowered the priority of this task from High to Medium.
Mpaa updated the task description.
Mpaa added a subscriber: Aklapper.