Allpages for File namespace doesn't return all results
Closed, Resolved · Public · BUG REPORT

Description

Running pywikibot.Site().allpages(namespace=6) returns only a subset of the results.
When PageGenerator.result() in _generators.py runs, it fails partway through the list while trying to upcast a Page to a FilePage.

The cause seems to be that one of the files it encounters, File:Foo.txt, has an extension that is no longer allowed. Although the ValueError raised by the extension check in __init__ is suppressed, the update_page(p, pagedata, self.props) call just below then fails when it checks whether the page is a FilePage.
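The failure path described above can be modelled with a few stand-in classes. This is a simplified, hypothetical sketch: `Page`, `FilePage`, `ALLOWED_EXTENSIONS`, `update_page` and `result` below are illustrative stand-ins written for this note, not pywikibot's actual implementation. It shows how suppressing the ValueError leaves a plain `Page`, which then trips the `imageinfo` check.

```python
from contextlib import suppress

# Hypothetical set of extensions the wiki currently permits (.txt is not one).
ALLOWED_EXTENSIONS = {"png", "jpg", "gif", "svg", "pdf"}


class Page:
    """Stand-in for a generic wiki page."""

    def __init__(self, title: str) -> None:
        self.title = title

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.title!r})"


class FilePage(Page):
    """Stand-in for a file page; rejects titles with disallowed extensions."""

    def __init__(self, page: Page) -> None:
        ext = page.title.rsplit(".", 1)[-1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            raise ValueError(f"{page.title} does not have a valid extension")
        super().__init__(page.title)
        self.imageinfo = None


def update_page(page: Page, pagedata: dict) -> None:
    """Stand-in: attach imageinfo, which only makes sense on a FilePage."""
    if "imageinfo" in pagedata:
        if not isinstance(page, FilePage):
            raise RuntimeError(
                f'"imageinfo" found but {page} is not a FilePage object')
        page.imageinfo = pagedata["imageinfo"]


def result(pagedata: dict) -> Page:
    """Stand-in for the upcast-then-update flow described in the report."""
    page = Page(pagedata["title"])
    if pagedata["ns"] == 6:  # File namespace
        # The ValueError from the extension check is swallowed here...
        with suppress(ValueError):
            page = FilePage(page)
    # ...so a rejected file stays a plain Page and fails here instead.
    update_page(page, pagedata)
    return page
```

With these stand-ins, `result({"title": "File:Foo.txt", "ns": 6, "imageinfo": [{}]})` raises the RuntimeError, while a `.png` title comes back as a `FilePage` with its `imageinfo` attached.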

pywikibot 9.4.0

Event Timeline

@Prod: can you provide a command line which produces this issue?

I'm not sure how to turn this into a command-line script, but running pywikibot.Site().allpages(namespace=6) against strategywiki.org while logged in triggers the error. It should return 5000 results per query, but instead it fails after around 173.
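Short of a full command-line script, a small helper can report how many items a generator yields before it raises, and which item came last, which pins down where the listing breaks (around 173 here). `count_until_error` is a hypothetical helper, not part of pywikibot. With pywikibot installed and a configured user-config.py, it could be called as `count_until_error(pywikibot.Site().allpages(namespace=6))`.

```python
from typing import Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def count_until_error(
    items: Iterable[T],
) -> Tuple[int, Optional[T], Optional[Exception]]:
    """Consume *items*; return (count, last item seen, exception or None).

    Hypothetical debugging helper: the count and last item show roughly
    where iteration broke off, and the exception says why.
    """
    count = 0
    last: Optional[T] = None
    try:
        for item in items:
            count += 1
            last = item
    except Exception as exc:  # report the failure instead of crashing
        return count, last, exc
    return count, last, None
```

The last item returned is the final entry processed successfully, so the problematic file is most likely the one listed right after it.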

I'm not sure about generating a command line script, but the code pywikibot.Site().allpages(namespace=6) generates the error from strategywiki.org when logged in. It should return 5000 results (per query), but instead it errors around 173.

I'm fairly sure files with invalid extensions are ignored. Unfortunately, it seems that all files after them are ignored too, instead of only the invalid ones being skipped (or returned as plain Page objects).
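The symptom that every file after the bad one is lost, not just the bad one, is consistent with how Python generators behave: once an exception propagates out of a generator, the generator is finished, and catching the exception in the caller does not resume it. A minimal, self-contained illustration (`list_files` and the titles are made up for this example):

```python
from typing import Iterator, List


def list_files(titles: List[str]) -> Iterator[str]:
    """Yield titles, failing on a .txt file (illustrative only)."""
    for title in titles:
        if title.endswith(".txt"):
            raise RuntimeError(f"cannot process {title}")
        yield title


gen = list_files(["File:A.png", "File:Foo.txt", "File:B.png", "File:C.jpg"])
seen = []
for _ in range(4):
    try:
        seen.append(next(gen))
    except RuntimeError:
        continue  # the caller recovers from the error...
    except StopIteration:
        break  # ...but the generator is already exhausted

print(seen)  # ['File:A.png']
```

So a fix needs to keep the exception from being raised inside the generator itself; wrapping the loop in the calling script cannot recover File:B.png or File:C.jpg.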

Maybe the reason is that update_page() can raise:

raise RuntimeError(
    f'"imageinfo" found but {page} is not a FilePage object')

The file extension is invalid, but imageinfo is still returned for it.

Xqt triaged this task as High priority. (Nov 18 2024, 4:46 PM)
Xqt changed the task status from Open to In Progress. (Nov 19 2024, 8:34 AM)
Xqt claimed this task.

Change #1092833 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [bugfix] Upcast to FilePage in PageGenerator.result()

https://gerrit.wikimedia.org/r/1092833
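One way to read the patch title is that result() should still produce a FilePage when the extension check fails but the API returned imageinfo, so that the imageinfo can be attached instead of raising. The sketch below illustrates that idea with hypothetical stand-ins; the `ignore_extension` keyword belongs to this sketch's stand-in `FilePage`, and the merged change may implement the upcast differently.

```python
# Hypothetical set of extensions the wiki currently permits (.txt is not one).
ALLOWED_EXTENSIONS = {"png", "jpg", "gif", "svg", "pdf"}


class Page:
    """Stand-in for a generic wiki page."""

    def __init__(self, title: str) -> None:
        self.title = title


class FilePage(Page):
    """Stand-in file page whose extension check can be bypassed."""

    def __init__(self, page: Page, *, ignore_extension: bool = False) -> None:
        ext = page.title.rsplit(".", 1)[-1].lower()
        if not ignore_extension and ext not in ALLOWED_EXTENSIONS:
            raise ValueError(f"{page.title} does not have a valid extension")
        super().__init__(page.title)
        self.imageinfo = None


def result(pagedata: dict) -> Page:
    """Upcast File-namespace pages, falling back when imageinfo exists."""
    page = Page(pagedata["title"])
    if pagedata["ns"] == 6:  # File namespace
        try:
            page = FilePage(page)
        except ValueError:
            if "imageinfo" in pagedata:
                # The API returned file info, so treat it as a file anyway.
                page = FilePage(page, ignore_extension=True)
    if "imageinfo" in pagedata:
        if not isinstance(page, FilePage):
            raise RuntimeError(
                f'"imageinfo" found but {page.title} is not a FilePage')
        page.imageinfo = pagedata["imageinfo"]
    return page
```

In this variant a disallowed extension without imageinfo still yields a plain Page, while File:Foo.txt with imageinfo becomes a FilePage, so iteration continues past it.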

I tested this change and my script completed, thanks! I didn't check how the files with invalid extensions were handled, but that's a minor concern.

Change #1092833 merged by jenkins-bot:

[pywikibot/core@master] [bugfix] Upcast to FilePage in PageGenerator.result()

https://gerrit.wikimedia.org/r/1092833