I am using the CategorizedPageGenerator function on Wikimedia Commons categories to get a list of all their members (and, recursively, the members of their subcategories). Specifically, I am using it on 'Category:Media_contributed_by_the_Digital_Public_Library_of_America' with the following code:
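    import pywikibot
    from pywikibot import pagegenerators

    site = pywikibot.Site('commons', 'commons')
    cat = pywikibot.Category(site, 'Category:Media_contributed_by_the_Digital_Public_Library_of_America')
    # namespaces=6 restricts the results to the File: namespace
    for file_page in pagegenerators.CategorizedPageGenerator(cat, recurse=True, namespaces=6):
        ...  # [does stuff]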
When I run this, I get the following warning repeatedly:
WARNING: API warning (result): This result was truncated because it would otherwise be larger than the limit of 12,582,912 bytes.
I am afraid this means I can never access the full result set, and presumably anyone using page generators on page sets that include large or numerous PDF or DjVu files will likewise never be able to retrieve all the pages. Something similar was reported at T195992, but that task is a bit confusing because the reporter appears to have been trying to exclude files from the query anyway and only wanted category names. I actually do want all the files.
The discussion at T101400 is helpful: it suggests the warning is most likely caused by the large amount of data returned when iiprop=metadata is requested for, say, a PDF with a text layer, or, in the case of a Commons category, for potentially 500 such files at once (I'm assuming it requests the maximum batch size by default?).
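To put rough numbers on this: the cap in the warning is 12,582,912 bytes, which is exactly 12 MiB, so a 500-file batch only fits if the returned data averages under about 25 KB per file. The sketch below just does that division. The 100,000-byte per-file figure in it is an illustrative assumption, not something I measured, and the calculation ignores the other fields in the response.

```python
# Response size cap quoted in the API warning.
LIMIT_BYTES = 12_582_912  # = 12 * 1024 * 1024, i.e. 12 MiB


def per_item_budget(batch_size: int, limit: int = LIMIT_BYTES) -> int:
    """Average bytes each item may use before a batch of this size hits the cap."""
    return limit // batch_size


def max_batch(avg_item_bytes: int, limit: int = LIMIT_BYTES) -> int:
    """Largest batch that fits if every item is avg_item_bytes long.

    Never returns less than 1; a single oversized item is still truncated.
    """
    return max(1, limit // avg_item_bytes)


print(per_item_budget(500))  # 25165
print(max_batch(100_000))    # 125 (assumed 100 kB of metadata per file)
```

So if a typical text-layer PDF really returned around 100 kB of metadata, only about 125 of them would fit in one response.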
The problem is that, for my use case, I really only need page titles, but since Pywikibot builds full page objects from all of the returned metadata, I guess there is currently no way around this warning. Since T89971 has been open for years and appears stalled, I wonder whether this could be solved in Pywikibot itself. For example, on receiving this warning, could Pywikibot retry with successively smaller gcmlimit values (or whatever limit parameter its underlying query uses) until the response fits under the 12 MiB cap? Or could there be an option to skip image metadata when the user doesn't actually need it?
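The first suggestion could look roughly like this. It is written against a hypothetical `fetch(limit, cont)` callable rather than Pywikibot's real internals (I don't know where in `pywikibot.data.api` such a hook would belong). It assumes `formatversion=2`-style output with `query.pages` as a list, and it detects truncation by looking for the word "truncated" in the `result` warning, which is an assumption about the message text.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: fetch(limit, cont) sends one API request with the
# given batch size (e.g. gcmlimit) and continuation parameters and returns
# the decoded JSON response.
Fetch = Callable[[int, Optional[Dict[str, str]]], Dict[str, Any]]


def is_truncated(response: Dict[str, Any]) -> bool:
    """Return True if the API attached a 'result was truncated' warning."""
    result = response.get('warnings', {}).get('result', {})
    # formatversion=1 puts the message under '*', formatversion=2 under 'warnings'.
    text = result.get('warnings') or result.get('*') or ''
    return 'truncated' in text


def fetch_with_backoff(fetch: Fetch, limit: int = 500,
                       min_limit: int = 1) -> Iterator[Dict[str, Any]]:
    """Yield pages, halving the batch size whenever a response is truncated."""
    cont: Optional[Dict[str, str]] = None
    while True:
        response = fetch(limit, cont)
        # Retry the same continuation point with a smaller batch until the
        # response fits (or min_limit is reached and we accept what we got).
        while is_truncated(response) and limit > min_limit:
            limit = max(min_limit, limit // 2)
            response = fetch(limit, cont)
        # The reduced limit is kept for later batches to avoid re-truncating.
        yield from response.get('query', {}).get('pages', [])
        cont = response.get('continue')
        if not cont:
            return
```

Keeping the smaller limit for the rest of the run is a design choice: it costs more requests on categories where only a few files are large, but avoids repeatedly hitting the cap and throwing responses away.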
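For the second suggestion, or as a user-side workaround when only titles are needed, a titles-only walk could use `list=categorymembers` with `cmprop=title`, which as far as I know does not request any imageinfo at all. This is a sketch under assumptions, not how CategorizedPageGenerator works internally: it again takes a hypothetical `fetch(params)` callable, follows the API's standard `continue` mechanism, and walks subcategories breadth-first while skipping categories and files it has already seen.

```python
from collections import deque
from typing import Any, Callable, Dict, Iterator, Set

# Hypothetical transport: sends one MediaWiki API request with these
# parameters and returns the decoded JSON response.
Fetch = Callable[[Dict[str, str]], Dict[str, Any]]

CATEGORY_NS = 14


def category_file_titles(fetch: Fetch, root: str,
                         limit: int = 500) -> Iterator[str]:
    """Yield file titles in root and, recursively, its subcategories."""
    seen_cats: Set[str] = {root}
    seen_files: Set[str] = set()
    queue = deque([root])
    while queue:
        category = queue.popleft()
        # Titles only: no prop=imageinfo, so no file metadata is requested.
        params = {
            'action': 'query',
            'format': 'json',
            'formatversion': '2',
            'list': 'categorymembers',
            'cmtitle': category,
            'cmprop': 'title',        # returns each member's ns and title
            'cmtype': 'file|subcat',  # skip ordinary pages
            'cmlimit': str(limit),
        }
        while True:
            data = fetch(params)
            for member in data['query']['categorymembers']:
                title = member['title']
                if member['ns'] == CATEGORY_NS:
                    if title not in seen_cats:  # guard against category cycles
                        seen_cats.add(title)
                        queue.append(title)
                elif title not in seen_files:   # a file can sit in several categories
                    seen_files.add(title)
                    yield title
            # Standard continuation: merge the 'continue' block into the next request.
            cont = data.get('continue')
            if not cont:
                break
            params = {**params, **cont}
```

Outside of tests, `fetch` could be something like `lambda p: requests.get('https://commons.wikimedia.org/w/api.php', params=p, headers={'User-Agent': '...'}).json()` (untested here; Wikimedia asks API clients to send a descriptive User-Agent).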