
API claims query is over 12mb but it is not
Open, Medium, Public

Description

What did I do?

Ran the following command:

python pwb.py replace -lang:commons -family:commons -regex "ISBN\s+((97(8|9))?\s?-?([0-9]\s?-?){9}([0-9Xx]))([\D$])" "{{ISBN|\1}}\6" -summary:"[[User:Revibot/Task/4|bot]]) (Replace ISBN magic links" -always -cat:"Pages using ISBN magic links" -ns:14

Actual behavior:

WARNING: API warning (result): This result was truncated because it would otherwise be larger than the limit of 12,582,912 bytes.

Expected behavior

Page edited.

(Screenshot attached: image.png, 429 KB)

The first item in the category in question is this one; I edited it manually and it worked. That page was only 910 bytes.

NOTE: I don't know which is at fault here, replace.py or the API itself, so I'm filing this under both projects. Please adjust as needed.

Event Timeline

revi added a subscriber: zhuyifei1999.

The warning here does not say that the page is too large. It means that the search query (in your case -cat:"Pages using ISBN magic links") returns too many results: the result set was too large and took too long to assemble, so the API did not give you the full list of pages but a truncated list instead.
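To make this concrete, here is a minimal sketch (my own illustration, not part of the original report) of a raw Action API query of the same shape as the one replace.py sends; when the combined result would exceed 12,582,912 bytes, the response carries the same message under its warnings key:

import requests

# Same generator query shape as in the logged request further below
# (parameters abbreviated); iiprop=metadata in particular can inflate
# the result size considerably.
params = {
    'action': 'query',
    'format': 'json',
    'generator': 'categorymembers',
    'gcmtitle': 'Category:Pages using ISBN magic links',
    'gcmtype': 'page|file',
    'gcmlimit': 'max',
    'prop': 'info|imageinfo',
    'iiprop': 'timestamp|user|comment|url|size|sha1|metadata',
    'iilimit': 'max',
}
r = requests.get('https://commons.wikimedia.org/w/api.php', params=params,
                 headers={'User-Agent': 'truncation-demo/0.1 (example)'}).json()
# If the result was truncated, this prints something like:
# {'result': {'*': 'This result was truncated because it would otherwise be
#  larger than the limit of 12,582,912 bytes.'}}
print(r.get('warnings'))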

Dvorapa renamed this task from API claims page is over 12mb but it is not to API claims query is over 12mb but it is not. May 30 2018, 8:15 PM

I cannot explain the error, but I think no Category pages will be yielded (which would be a separate bug)

In:

Body: 'gcmtitle=Category%3APages+using+ISBN+magic+links&gcmprop=ids%7Ctitle%7Csortkey&gcmtype=page%7Cfile&prop=info%7Cimageinfo%7Ccategoryinfo&inprop=protection&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Csha1%7Cmetadata&iilimit=max&generator=categorymembers&action=query&indexpageids=&continue=gcmcontinue%7C%7Cuserinfo&gcmlimit=500&meta=userinfo&uiprop=blockinfo%7Chasmsg&maxlag=5&format=json&gcmcontinue=file%7....35'
gcmtype=page%7Cfile

which decodes to gcmtype=page|file, meaning that only files and ordinary pages will be yielded, no subcats.
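A quick way to verify this from a logged request body is to decode it with the standard library; a small sketch (the body string here is abbreviated by me):

from urllib.parse import parse_qs

# Abbreviated copy of the logged request body quoted above.
body = ('gcmtitle=Category%3APages+using+ISBN+magic+links'
        '&gcmtype=page%7Cfile&generator=categorymembers&action=query')
print(parse_qs(body)['gcmtype'])  # ['page|file'] -> no 'subcat', so no Category: pages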

In the categorymembers API documentation (see the sketch after this excerpt):

cmtype

    Which type of category members to include. Ignored when cmsort=timestamp is set.
    Values (separate with | or alternative): page, subcat, file
    Default: page|subcat|file
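So only a query that includes subcat in cmtype (or gcmtype) returns the Category: pages. A minimal sketch of such a request (my own illustration), using the small test category from the next comment:

import requests

params = {
    'action': 'query',
    'format': 'json',
    'list': 'categorymembers',
    'cmtitle': 'Category:Arthrocereus',
    'cmtype': 'subcat',   # only subcategories, i.e. Category: pages
    'cmlimit': 'max',
}
r = requests.get('https://commons.wikimedia.org/w/api.php', params=params,
                 headers={'User-Agent': 'cmtype-demo/0.1 (example)'}).json()
for member in r['query']['categorymembers']:
    print(member['title'])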

Try a small one: https://commons.wikimedia.org/wiki/Category:Arthrocereus
This category has the following 5 subcategories, out of 5 total.
This category contains only the following file.

user@pc:~/python/core {master}$ python scripts/listpages.py -lang:commons -family:commons -cat:"Arthrocereus"
   1 Arthrocereus HU 330.jpg
1 page(s) found
user@pc:~/python/core {master}$ python scripts/listpages.py -lang:commons -family:commons -cat:"Arthrocereus" -ns:14
0 page(s) found

-subcats is the correct option to use in such a case.

I don't really care about the contents of the subcategories. I need the member list of the main category (the ISBN giant) so that each subcategory's description page gets edited, and nothing inside the subcategories.

Exactly; then you need to use the -subcats: option to get the Category: pages inside the main category (an in-script equivalent is sketched after the listing below).

python scripts/listpages.py -lang:commons -family:commons -subcats:"Arthrocereus" -format:5
   1 Category:Arthrocereus by country        
   2 Category:Arthrocereus glaziovii         
   3 Category:Arthrocereus melanurus         
   4 Category:Arthrocereus rondonianus       
   5 Category:Arthrocereus spinosissimus     
5 page(s) found
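For reference, a hedged in-script sketch of the same -subcats: behaviour using pywikibot's generators (I believe -subcats: maps to SubCategoriesPageGenerator; the small Arthrocereus test category is used here again, adjust it to the real target):

import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site('commons', 'commons')
cat = pywikibot.Category(site, 'Category:Arthrocereus')
# Yields the subcategories' description pages, not their members.
for page in pagegenerators.SubCategoriesPageGenerator(cat):
    print(page.title())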
Vvjjkkii renamed this task from API claims query is over 12mb but it is not to 3ybaaaaaaa. Jul 1 2018, 1:07 AM
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
Mpaa renamed this task from 3ybaaaaaaa to API claims query is over 12mb but it is not. Jul 1 2018, 6:59 PM
Mpaa lowered the priority of this task from High to Medium.
Mpaa updated the task description. (Show Details)
Mpaa added a subscriber: Aklapper.

See T253591. Please retry with a decreased query increment. By default the maximum query increment of data is retrieved, which depends on the user's group memberships. You may decrease the maximum query increment with the step parameter in your user-config.py; this parameter is also available as the global -step option if your script calls pywikibot.handle_args() for all command line options (refer to the basic.py sample script for that).
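For illustration, both routes mentioned above in one sketch (the value 50 is an arbitrary example; pick whatever increment suits your account's limits):

# In user-config.py (a Python file): lower the maximum query increment.
step = 50

# Or pass the same limit as the global option on the command line, e.g.:
#   python pwb.py replace -lang:commons -family:commons ... -step:50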