
Pywikibot throws an error when mediainfo of a file doesn't exist
Closed, Resolved, Public

Description

This Phabricator task covers two separate bugs in how Pywikibot handles the mediainfo of Commons files: one when the mediainfo does not exist at all, and one when it exists but has no statements.

Bug 1 is that missing mediainfo is not handled, and there is no method for creating empty mediainfo.

Code

import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site('commons', 'commons')
gen = pagegenerators.RecentChangesPageGenerator(
    site=site,
    namespaces=[6],  # File namespace
    changetype='new',
    total=100,
)

# Seek to the first page without mediainfo.
for page in gen:
    if 'mediainfo' not in page.latest_revision.slots:
        item = page.data_item()
        # get() fails because there is no mediainfo.
        item.get()

Result

Traceback (most recent call last):
  File "/Users/kimmovirtanen/wikitech/core/tests/file_tests.py", line 400, in test_file_exist_but_without_item
    item.get()
  File "/Users/kimmovirtanen/wikitech/core/pywikibot/page/_wikibase.py", line 427, in get
    data = self.file.latest_revision.slots['mediainfo']['*']
KeyError: 'mediainfo'

Fix

Handle the missing key by raising NoWikibaseEntityError, and add get_data_for_new_entity() for users who want to create a new item (mediainfo) for the file.
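The fix can be pictured with a minimal standalone sketch. It is not the merged patch: it only assumes, as the traceback above shows, that `latest_revision.slots` maps slot names to dicts whose `'*'` key holds the mediainfo JSON text. `MissingMediaInfoError`, `load_mediainfo_json()` and `empty_mediainfo()` are hypothetical names; in Pywikibot the error is NoWikibaseEntityError and the empty entity comes from get_data_for_new_entity(), whose exact return structure may differ from the dict used here.

```python
import json


class MissingMediaInfoError(Exception):
    """Hypothetical stand-in for pywikibot's NoWikibaseEntityError."""


def load_mediainfo_json(slots: dict) -> dict:
    """Parse the mediainfo slot, raising a descriptive error if it is absent.

    `slots` mimics page.latest_revision.slots: slot name -> dict whose '*'
    key holds the slot content as a JSON string.
    """
    if 'mediainfo' not in slots:
        # Report a missing entity instead of letting a bare KeyError escape.
        raise MissingMediaInfoError('file has no mediainfo slot')
    return json.loads(slots['mediainfo']['*'])


def empty_mediainfo() -> dict:
    """Hypothetical skeleton for a new, empty entity."""
    return {'labels': {}, 'descriptions': {}, 'statements': {}}


# Usage: a file page whose revision only has the 'main' (wikitext) slot.
slots = {'main': {'*': '== Summary =='}}
try:
    data = load_mediainfo_json(slots)
except MissingMediaInfoError:
    # The caller chooses to start a new entity, as get_data_for_new_entity() allows.
    data = empty_mediainfo()
```

Raising a dedicated error lets callers distinguish "this file has no structured data yet" from genuine bugs, which a bare KeyError cannot do.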

Bug 2 (T222159) is that an empty statements value is a list instead of a dictionary.
Code

import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site('commons', 'commons')
gen = pagegenerators.RandomPageGenerator(total=1000, site=site, namespaces=[6])  # Namespace 6 is the File namespace

# Seek to the first page with mediainfo.
for page in gen:
    if 'mediainfo' in page.latest_revision.slots:
        item = page.data_item()
        # get() fails on the first item whose mediainfo has no statements.
        data = item.get()

Result

Traceback (most recent call last):
  File "/Users/kimmovirtanen/pywikibot/latestfiles.py", line 23, in <module>
    data=item.get()
  File "/Users/kimmovirtanen/pywikibot/venv/lib/python3.10/site-packages/pywikibot/page/_wikibase.py", line 446, in get
    return super().get(force=force)
  File "/Users/kimmovirtanen/pywikibot/venv/lib/python3.10/site-packages/pywikibot/page/_wikibase.py", line 275, in get
    value = cls.fromJSON(self._content.get(key, {}), self.repo)
  File "/Users/kimmovirtanen/pywikibot/venv/lib/python3.10/site-packages/pywikibot/page/_collections.py", line 213, in fromJSON
    for key, claims in data.items():
AttributeError: 'list' object has no attribute 'items'
CRITICAL: Exiting due to uncaught exception AttributeError: 'list' object has no attribute 'items'

Fix

The Pywikibot fix detects the incorrect list and converts it to a dictionary when the data is loaded.
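A minimal sketch of that normalization, assuming the raw entity JSON is a plain dict whose `'statements'` value is an empty list when there are no statements. `normalize_statements()` is a hypothetical helper, not the function used in Pywikibot. Only the empty-list case is converted; a non-empty list is not described in the task, so the sketch raises instead of guessing.

```python
def normalize_statements(entity: dict) -> dict:
    """Turn an empty-list 'statements' value into an empty dict (in place)."""
    statements = entity.get('statements')
    if isinstance(statements, list):
        if statements:
            # Not the case described in T222159; refuse to guess a mapping.
            raise ValueError('unexpected non-empty statements list')
        entity['statements'] = {}
    return entity


# Illustrative raw JSON for a file whose mediainfo has no statements.
raw = {'labels': {}, 'statements': []}
data = normalize_statements(raw)

# Code like fromJSON() iterates with .items(); this no longer raises
# AttributeError because 'statements' is now a dict.
for prop, claims in data['statements'].items():
    print(prop, len(claims))
```

Doing the conversion once at load time means every consumer of the data can keep assuming a dictionary.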

How to test that it works

import pywikibot
site = pywikibot.Site('commons', 'commons')
page = pywikibot.FilePage(site, 'Image:Montemurro1857.png')
item = page.data_item()
data = item.get()
print(data)

Event Timeline

Zache updated the task description.
Zache renamed this task from "Pywikibot fails if mediainfo doesn't exists for the file" to "Pywikibot throws an error when mediainfo of a file doesn't exist". (Aug 27 2023, 10:20 AM)

Change 952553 had a related patch set uploaded (by Zache-tool; author: Zache-tool):

[pywikibot/core@master] bugfix for T345038. Test if mediainfo exists. Added get_data_for_new_entity() for creating empty mediainfo.

https://gerrit.wikimedia.org/r/952553

Change 952553 had a related patch set uploaded (by Zache-tool; author: Zache-tool):

[pywikibot/core@master] [bugfix] Handle missing SDC mediainfo

https://gerrit.wikimedia.org/r/952553

Change 952553 merged by jenkins-bot:

[pywikibot/core@master] [bugfix] Handle missing SDC mediainfo

https://gerrit.wikimedia.org/r/952553

For future reference: if SDC data doesn't exist, the following sequence throws an exception:

wditem = page.data_item()
sdcdata = wditem.get()

With the fixed version of pywikibot, it is now possible to add SDC data if it is missing:

wditem = page.data_item()
sdcdata = wditem.get_data_for_new_entity()

# .. add data ..

wditem.get() still throws an exception unless get_data_for_new_entity() has been called first, so code needs to check for that case and decide when to add new data.
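One way to do that check, sketched under the assumption that the same `'mediainfo' in page.latest_revision.slots` test used in the reproducers above is a reliable signal: call get() only when the slot exists, otherwise start from get_data_for_new_entity(). `load_or_init_sdc()` is a hypothetical helper, and the stub classes exist only so the sketch runs offline; in real code `page` would be a pywikibot FilePage.

```python
from types import SimpleNamespace


def load_or_init_sdc(page):
    """Return (data, is_new) for a file page's structured data.

    Only calls get() when a mediainfo slot exists; otherwise starts a new,
    empty entity via get_data_for_new_entity().
    """
    wditem = page.data_item()
    if 'mediainfo' in page.latest_revision.slots:
        return wditem.get(), False
    return wditem.get_data_for_new_entity(), True


# Offline demo with hypothetical stand-in objects (not pywikibot classes).
class _StubItem:
    def get(self):
        return {'statements': {'P180': []}}

    def get_data_for_new_entity(self):
        return {'statements': {}}


def _stub_page(slot_names):
    slots = {name: {'*': '{}'} for name in slot_names}
    return SimpleNamespace(data_item=_StubItem,
                           latest_revision=SimpleNamespace(slots=slots))


existing = load_or_init_sdc(_stub_page(['main', 'mediainfo']))
missing = load_or_init_sdc(_stub_page(['main']))
```

Catching the exception raised by get() would also work; checking the slot first keeps the "is this new?" decision explicit in the returned flag.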