
File revision contains no timestamp
Closed, Resolved (Public)

Description

Iterating over a Pywikibot generator of the pages referring to [[:c:Template:Monument istoric]] started crashing when it reached the file https://commons.wikimedia.org/wiki/File:CATEDRALA_EVANGHELICA_SIBIU.jpg

The contents of the file_revision variable as parsed by PWB is: {'descriptionurl': 'https://commons.wikimedia.org/wiki/File:CATEDRALA_EVANGHELICA_SIBIU.jpg', 'filemissing': '', 'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=16462988'}

The backtrace I get is:

Traceback (most recent call last):
  File "pwb.py", line 321, in <module>
    if not main():
  File "pwb.py", line 316, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "pwb.py", line 101, in run_python_file
    main_mod.__dict__)
  File "./wikiro/robots/python/pywikipedia/monumente/parse_monument_article.py", line 708, in <module>
    main()
  File "./wikiro/robots/python/pywikipedia/monumente/parse_monument_article.py", line 635, in main
    for page in pregenerator:
  File "/home/andrei/pywikibot-core/pywikibot/pagegenerators.py", line 2219, in PreloadingGenerator
    for page in generator:
  File "/home/andrei/pywikibot-core/pywikibot/pagegenerators.py", line 1738, in NamespaceFilterPageGenerator
    for page in generator:
  File "/home/andrei/pywikibot-core/pywikibot/tools/__init__.py", line 1159, in filter_unique
    for item in iterable:
  File "/home/andrei/pywikibot-core/pywikibot/data/api.py", line 2824, in __iter__
    for result in self._extract_results(resultdata):
  File "/home/andrei/pywikibot-core/pywikibot/data/api.py", line 2773, in _extract_results
    result = self.result(item)
  File "/home/andrei/pywikibot-core/pywikibot/data/api.py", line 2925, in result
    update_page(p, pagedata, self.props)
  File "/home/andrei/pywikibot-core/pywikibot/data/api.py", line 3328, in update_page
    page._load_file_revisions(pagedict['imageinfo'])
  File "/home/andrei/pywikibot-core/pywikibot/page.py", line 2483, in _load_file_revisions
    file_revision = FileInfo(file_rev)
  File "/home/andrei/pywikibot-core/pywikibot/page.py", line 5696, in __init__
    self.timestamp = pywikibot.Timestamp.fromISOformat(self.timestamp)
AttributeError: 'FileInfo' object has no attribute 'timestamp'
CRITICAL: Exiting due to uncaught exception <class 'AttributeError'>
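
The failure can be reproduced offline with the dict shown above. The sketch below uses a hypothetical stand-in class, `FileInfoSketch`, which only mimics what the traceback suggests `FileInfo.__init__` does: copy the API keys onto the instance, then convert `self.timestamp`. It is not the actual Pywikibot code, and `datetime.strptime` is used here in place of `pywikibot.Timestamp.fromISOformat`.

```python
from datetime import datetime


class FileInfoSketch:
    """Hypothetical stand-in for pywikibot's FileInfo (an assumption)."""

    def __init__(self, file_revision):
        # Copy every key returned by the API onto the instance.
        self.__dict__.update(file_revision)
        # Raises AttributeError when the API returned no 'timestamp' key.
        self.timestamp = datetime.strptime(self.timestamp,
                                           '%Y-%m-%dT%H:%M:%SZ')


# The revision data exactly as reported in this task.
broken_revision = {
    'descriptionurl': 'https://commons.wikimedia.org/wiki/File:CATEDRALA_EVANGHELICA_SIBIU.jpg',
    'filemissing': '',
    'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=16462988',
}

try:
    FileInfoSketch(broken_revision)
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)
```

The attribute lookup fails before any timestamp parsing happens, which matches the last frame of the backtrace.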

This is caused by the odd first version of the file, which has no preview and was likely uncovered by the fix in T233392. A fix in PWB should probably ignore revisions containing the 'filemissing' flag, which is undocumented (see T109125) but seems to indicate a problem with the image.
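
A minimal sketch of the proposed workaround, assuming the imageinfo data arrives as a list of dicts as in the report above. The helper name `usable_file_revisions` is hypothetical and the change eventually made in Pywikibot may look different.

```python
def usable_file_revisions(imageinfo):
    """Return only the revisions that are not flagged 'filemissing'.

    Hypothetical helper for illustration, not part of Pywikibot.
    """
    # The flag arrives as an empty string, so test for the key itself
    # rather than for a truthy value.
    return [rev for rev in imageinfo if 'filemissing' not in rev]


imageinfo = [
    # Illustrative healthy revision; the timestamp and URL are made up.
    {'timestamp': '2019-09-01T12:00:00Z',
     'url': 'https://example.org/file.jpg'},
    # The problematic revision as reported in this task.
    {'descriptionurl': 'https://commons.wikimedia.org/wiki/File:CATEDRALA_EVANGHELICA_SIBIU.jpg',
     'filemissing': '',
     'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=16462988'},
]

kept = usable_file_revisions(imageinfo)
print(len(kept))
```

Checking for the key instead of its value matters here, because `'filemissing': ''` would be skipped by a plain truthiness test.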

However, I believe the MediaWiki team should also look into why that version is missing an image. @Aklapper, could you please add the relevant people in CC?

Event Timeline

Ah, I think I found the original issue: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/533482/ (T221812)

Will claim the task and try to provide a fix in PWB.

Change 539357 had a related patch set uploaded (by Strainu; owner: Strainu):
[pywikibot/core@master] FilePage: Ignore revision with 'filemissing' field

https://gerrit.wikimedia.org/r/539357

Xqt triaged this task as High priority. Sep 26 2019, 4:17 PM

Change 539357 merged by jenkins-bot:
[pywikibot/core@master] FilePage: Ignore revision with 'filemissing' field

https://gerrit.wikimedia.org/r/539357