
Imageinfo are not fully set by pagegenerators
Closed, Resolved. Public

Description

Steps to reproduce:

import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site()
page = next(pagegenerators.AllpagesPageGenerator(namespace='File'))
print(page.latest_file_info.mime)

I tested it on our private MediaWiki installation (1.39.6), but it should be reproducible on any family/language. I'm using pywikibot 9.0.0 from PyPI.

Experienced output:

Traceback (most recent call last):
  File "/home/test.py", line 223, in <module>
    print(page.latest_file_info.mime)
AttributeError: 'FileInfo' object has no attribute 'mime'
CRITICAL: Exiting due to uncaught exception AttributeError: 'FileInfo' object has no attribute 'mime'

Expected output:
The MIME type of the file is printed.

Analysis:
When file pages are loaded through a generator, the attribute FilePage._file_revisions is already set, so loadimageinfo is never triggered for the FilePage. The attribute is set because the PageGenerator object loads the imageinfo prop, but it does not request all the properties that loadimageinfo does. As a result, some properties, e.g. the MIME type, are never queried.
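The mechanism described above can be illustrated with a small stand-alone model. This is only a sketch: the classes, the fake API function and the property tuples below are hypothetical stand-ins written for illustration and do not reproduce pywikibot's actual implementation.

```python
class FileInfo:
    """Hypothetical stand-in: exposes only the fields the API returned."""

    def __init__(self, fields):
        self.__dict__.update(fields)


# Assumed property sets for illustration; the first mirrors a full load,
# the second the reduced set requested by the generator.
FULL_PROPS = ('timestamp', 'user', 'comment', 'url', 'size', 'sha1', 'mime')
GENERATOR_PROPS = ('timestamp', 'user', 'comment', 'url', 'size', 'sha1')


def fake_api_query(props):
    """Pretend API call: returns a value for each requested property only."""
    return {prop: f'<{prop}>' for prop in props}


class FilePage:
    """Hypothetical model of a page with a cached file info."""

    def __init__(self):
        self._file_revisions = None  # None means "not loaded yet"

    def preload_from_generator(self):
        # The generator fills the cache with the reduced property set.
        self._file_revisions = FileInfo(fake_api_query(GENERATOR_PROPS))

    @property
    def latest_file_info(self):
        # The full load only runs when the cache is still empty.
        if self._file_revisions is None:
            self._file_revisions = FileInfo(fake_api_query(FULL_PROPS))
        return self._file_revisions


direct = FilePage()
preloaded = FilePage()
preloaded.preload_from_generator()

print(hasattr(direct.latest_file_info, 'mime'))     # True
print(hasattr(preloaded.latest_file_info, 'mime'))  # False
```

In this model the preloaded page never reaches the full load, which matches the AttributeError reported above.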

Suggested fix:
Ensure that whenever imageinfo data is loaded, the same set of properties is requested.
I.e., in __init__ of class PageGenerator (file data/api/_generators.py, lines 707 and 708), change

append_params(parameters, 'iiprop', 'timestamp|user|comment|url|size|sha1')

to

append_params(parameters, 'iiprop', 'timestamp|user|comment|url|size|sha1|mime|mediatype|archivename|bitdepth')

If I apply this fix, my initial code snippet (and my application) work as expected.
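One way to keep both code paths in sync is to build the iiprop value from a single shared definition. The following is a minimal sketch of that idea only; IIPROP, iiprop_value, generator_params and loadimageinfo_params are hypothetical names and not part of pywikibot.

```python
# Hypothetical shared constant; the property names are those proposed above.
IIPROP = (
    'timestamp', 'user', 'comment', 'url', 'size', 'sha1',
    'mime', 'mediatype', 'archivename', 'bitdepth',
)


def iiprop_value(props=IIPROP):
    """Join property names into the pipe-separated form used by the API."""
    return '|'.join(props)


def generator_params():
    """Hypothetical parameters for the batched generator request."""
    return {'prop': 'imageinfo', 'iiprop': iiprop_value()}


def loadimageinfo_params():
    """Hypothetical parameters for the single-page request."""
    return {'prop': 'imageinfo', 'iiprop': iiprop_value()}


print(generator_params()['iiprop'] == loadimageinfo_params()['iiprop'])  # True
```

Because both request builders read the same tuple, adding a property in one place cannot leave the other behind.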

Current workaround:
You can add the required fields manually to the properties requested by the generator. This works for me:

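import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site()
gen = pagegenerators.AllpagesPageGenerator(namespace='File')
# Extend the generator's request before the first batch is fetched
gen.request['iiprop'].append('mime')
page = next(gen)
print(page.latest_file_info.mime)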
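If several fields are needed, the same workaround can be wrapped in a small helper. This is a sketch under the assumption that gen.request['iiprop'] behaves like a list, as the append call in the workaround suggests; add_iiprops is a hypothetical helper and fake_gen is a stand-in object used so the example runs without a wiki connection.

```python
from types import SimpleNamespace


def add_iiprops(gen, extra):
    """Append each property in `extra` to gen.request['iiprop'] at most once."""
    current = gen.request['iiprop']
    for prop in extra:
        if prop not in current:
            current.append(prop)
    return gen


# Stand-in for a real generator object (assumption: same attribute layout).
fake_gen = SimpleNamespace(
    request={'iiprop': ['timestamp', 'user', 'comment', 'url', 'size', 'sha1']}
)

add_iiprops(fake_gen, ['mime', 'mediatype', 'mime'])
print(fake_gen.request['iiprop'])
```

The helper has to be called before the first item is taken from the generator, since the properties are part of the request that fetches the batch.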

Cheers, thanks for the good work!

Event Timeline

Xqt triaged this task as High priority. Mar 24 2024, 12:44 PM
Xqt subscribed.

As a workaround, you can access metadata first:

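import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site()
page = next(pagegenerators.AllpagesPageGenerator(namespace='File'))
# Accessing metadata first makes the missing file info available
_ = page.latest_file_info.metadata
print(page.latest_file_info.mime)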

Change #1013713 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [bugfix] Use the same iiprop in PageGenerator as in APISite.loadimageinfo()

https://gerrit.wikimedia.org/r/1013713

Thanks @Xqt for the fix!

Sidenote: While your suggested workaround does not mess with the internals of the class as mine does, it defeats the whole point of generators, i.e. that pages are requested in batches rather than one request per page. Calling metadata on each page separately thus causes much longer delays in my application and is not useful for me. Looking forward to the fix making it to the release channel :-)
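The cost difference can be estimated with simple arithmetic. The sketch below assumes one extra request per page when metadata is accessed individually; the page count and batch size are made-up illustration values, not measurements, and request_count is a hypothetical helper.

```python
import math


def request_count(pages, batch_size, per_page_reload):
    """Estimate the number of API requests needed for `pages` file pages."""
    batched = math.ceil(pages / batch_size)
    # Assumption: each per-page reload costs one additional request.
    extra = pages if per_page_reload else 0
    return batched + extra


print(request_count(1000, 50, per_page_reload=False))  # 20
print(request_count(1000, 50, per_page_reload=True))   # 1020
```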

Thanks @Xqt for the fix!

Sidenote: …

You are right, but the workaround was intended to be usable without changing the framework.

Change #1013713 merged by jenkins-bot:

[pywikibot/core@master] [bugfix] Use the same iiprop in PageGenerator as in APISite.loadimageinfo()

https://gerrit.wikimedia.org/r/1013713