
Pywikibot cannot fetch some wikidata items.
Open, Low · Public · BUG REPORT

Description

Pywikibot fails to fetch some wikidata items.

Steps to Reproduce:

test code

import pywikibot

def test1():
    site = pywikibot.Site('en', 'wikipedia')
    page = pywikibot.Page(site, 'Western Sahara')
    item = pywikibot.ItemPage.fromPage(page)

    item.get()

def test2():
    site = pywikibot.Site('en', 'wikipedia')
    repo = site.data_repository()
    item = pywikibot.ItemPage(repo, 'Q6250')

    item.get()

    print(item)

test1()
test2()

test1() and test2() both fail.

pwb version.py

Pywikibot: [https] r-pywikibot-core.git (3023cea, g12571, 2020/07/02, 17:58:26, n/a)
Release version: 3.1.dev0
requests version: 2.21.0
  cacerts: /mnt/nfs/labstore-secondary-tools-project/chobot/pwb/lib/python3.5/site-packages/certifi/cacert.pem
    certificate test: ok
Python: 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516]
Toolforge hostname: tools-sgebastion-07
PYWIKIBOT_DIR: .pywikibot

Actual Results:

Traceback (most recent call last):
  File "/data/project/chobot/src/pywikibot-core.new/pwb.py", line 379, in <module>
    if not main():
  File "/data/project/chobot/src/pywikibot-core.new/pwb.py", line 374, in main
    file_package)
  File "/data/project/chobot/src/pywikibot-core.new/pwb.py", line 106, in run_python_file
    main_mod.__dict__)
  File "wikidata_test.py", line 22, in <module>
    test2()
  File "wikidata_test.py", line 18, in test2
    item.get()
  File "/mnt/nfs/labstore-secondary-tools-project/chobot/src/pywikibot-core.new/pywikibot/page/__init__.py", line 4915, in get
    data = super(ItemPage, self).get(force, *args, **kwargs)
  File "/mnt/nfs/labstore-secondary-tools-project/chobot/src/pywikibot-core.new/pywikibot/page/__init__.py", line 4514, in get
    data = WikibaseEntity.get(self, force=force)
  File "/mnt/nfs/labstore-secondary-tools-project/chobot/src/pywikibot-core.new/pywikibot/page/__init__.py", line 4283, in get
    value = cls.fromJSON(self._content.get(key, {}), self.repo)
  File "/mnt/nfs/labstore-secondary-tools-project/chobot/src/pywikibot-core.new/pywikibot/page/__init__.py", line 3852, in fromJSON
    this[key] = [Claim.fromJSON(repo, claim) for claim in claims]
  File "/mnt/nfs/labstore-secondary-tools-project/chobot/src/pywikibot-core.new/pywikibot/page/__init__.py", line 3852, in <listcomp>
    this[key] = [Claim.fromJSON(repo, claim) for claim in claims]
  File "/mnt/nfs/labstore-secondary-tools-project/chobot/src/pywikibot-core.new/pywikibot/page/__init__.py", line 5445, in fromJSON
    claim.type, lambda value, site: value)(value, site)
  File "/mnt/nfs/labstore-secondary-tools-project/chobot/src/pywikibot-core.new/pywikibot/__init__.py", line 1051, in fromWikibase
    return cls(page, site)
  File "/mnt/nfs/labstore-secondary-tools-project/chobot/src/pywikibot-core.new/pywikibot/__init__.py", line 1022, in __init__
    specifics['ending'], specifics['label'])
  File "/mnt/nfs/labstore-secondary-tools-project/chobot/src/pywikibot-core.new/pywikibot/__init__.py", line 989, in _validate
    raise ValueError('Page must exist.')
ValueError: Page must exist.
CRITICAL: Exiting due to uncaught exception <class 'ValueError'>

Expected Results:

[[wikidata:Q6250]]

Event Timeline

ChongDae created this task. Jul 3 2020, 2:27 AM
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Jul 3 2020, 2:27 AM
ChongDae updated the task description. Jul 3 2020, 2:52 AM
Xqt triaged this task as High priority. Jul 3 2020, 5:03 AM
Xqt added a subscriber: Xqt.

A more informative traceback, obtained after adding some hints to the code:

>>> import pwb, pywikibot
>>> site = pywikibot.Site('en', 'wikipedia')
>>> repo = site.data_repository()
>>> item = pywikibot.ItemPage(repo, 'Q6250')
>>> item
ItemPage('Q6250')
>>> item.exists()
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    item.exists()
  File "C:\pwb\GIT\core\pywikibot\page\__init__.py", line 4473, in exists
    self.get(get_redirect=True)
  File "C:\pwb\GIT\core\pywikibot\page\__init__.py", line 4915, in get
    data = super(ItemPage, self).get(force, *args, **kwargs)
  File "C:\pwb\GIT\core\pywikibot\page\__init__.py", line 4514, in get
    data = WikibaseEntity.get(self, force=force)
  File "C:\pwb\GIT\core\pywikibot\page\__init__.py", line 4283, in get
    value = cls.fromJSON(self._content.get(key, {}), self.repo)
  File "C:\pwb\GIT\core\pywikibot\page\__init__.py", line 3852, in fromJSON
    this[key] = [Claim.fromJSON(repo, claim) for claim in claims]
  File "C:\pwb\GIT\core\pywikibot\page\__init__.py", line 3852, in <listcomp>
    this[key] = [Claim.fromJSON(repo, claim) for claim in claims]
  File "C:\pwb\GIT\core\pywikibot\page\__init__.py", line 5444, in fromJSON
    claim.target = cls.TARGET_CONVERTER.get(
  File "C:\pwb\GIT\core\pywikibot\__init__.py", line 1053, in fromWikibase
    return cls(page, site)
  File "C:\pwb\GIT\core\pywikibot\__init__.py", line 1023, in __init__
    _WbDataPage._validate(page, specifics['data_site'],
  File "C:\pwb\GIT\core\pywikibot\__init__.py", line 991, in _validate
    raise ValueError('Page {} must exist.'.format(page))
ValueError: Page [[commons:Data:Western Sahara.map]] must exist.
>>>

Change 609280 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [IMPR] Show additional hints with ValueErrors in _WbDataPage._validate

https://gerrit.wikimedia.org/r/609280

Xqt lowered the priority of this task from High to Low. Edited Jul 3 2020, 5:18 AM

Other items have similar problems.

e.g.

Q16645 - United States Minor Outlying Islands
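
Until the validation is relaxed, a script that walks many items can isolate the failures on the caller side. The sketch below is a hypothetical helper, not part of Pywikibot: it assumes only that each object exposes a get() method that may raise ValueError, as the tracebacks above show for ItemPage.get(). FakeItem is a stand-in so that the example runs without network access.

```python
def load_items(items):
    """Call get() on each item and return (loaded, failed).

    ``failed`` is a list of (item, error message) pairs for every item
    whose get() raised ValueError.
    """
    loaded, failed = [], []
    for item in items:
        try:
            item.get()
        except ValueError as error:
            # Record the failure and continue with the next item.
            failed.append((item, str(error)))
        else:
            loaded.append(item)
    return loaded, failed


class FakeItem:
    """Hypothetical stand-in for pywikibot.ItemPage, for demonstration only."""

    def __init__(self, item_id, broken=False):
        self.item_id = item_id
        self.broken = broken

    def get(self):
        # Mimic the failure reported in this task for "broken" items.
        if self.broken:
            raise ValueError('Page must exist.')
        return {}
```

With real ItemPage objects, the failed list would then name the items (such as Q6250 or Q16645) whose claims point at missing data pages.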

Xqt added a subscriber: Multichill. Jul 3 2020, 5:59 AM

Change 609280 merged by jenkins-bot:
[pywikibot/core@master] [IMPR] Show additional hints with ValueErrors in _WbDataPage._validate

https://gerrit.wikimedia.org/r/609280

Xqt added a comment. Edited Jul 3 2020, 6:15 AM

Other items have similar problems.

OK, we could skip the existence check for these properties and just print a warning. I have no idea whether this would have other impacts.

@Lokal_Profil: As you added this existence check to _WbDataPage._validate(): is it really necessary, or would a warning be enough?

I think a warning should be enough, and it would allow a bot script to be designed to fix these issues if a page was moved or deleted.
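
The idea of downgrading the check to a warning can be sketched as follows. This is a standalone illustration, not the actual body of _WbDataPage._validate(): the function name check_data_page, its parameters, and the strict switch are assumptions made for the example.

```python
import warnings


def check_data_page(title, exists, strict=False):
    """Report a missing data page; raise only when ``strict`` is true.

    Returns True if the page exists, False otherwise.
    """
    if exists:
        return True
    message = 'Page {} must exist.'.format(title)
    if strict:
        # Behaviour described in this task: refuse to continue.
        raise ValueError(message)
    # Proposed behaviour: tell the user, but let the item load.
    warnings.warn(message, UserWarning)
    return False
```

With strict left at False, a bot could still load the item and afterwards repair or remove the dangling claim.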

This is a duplicate of my T249692; which way should we merge?

Xqt added a comment. Jul 3 2020, 8:40 AM

This is a duplicate of my T249692; which way should we merge?

I merged it here by accident. On the other hand, the patch to the ValueError message is related to this task.

The problem here is this validation happens both when deserializing and in custom code. When we finally refactor things (T186200), we may think about avoiding it during deserialization but keeping it in Claim.setTarget. Note that we don't validate e.g. existence of items or files, so we could also drop this validation and document that it should be the user who validates the input.
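
One way to picture this split is a toy model in which deserialization stores the target unchecked while the setter used by custom code validates it. The class below is a simplified stand-in, not Pywikibot's Claim: the names ToyClaim, from_json, set_target, and the existing_pages set (standing in for a real existence lookup) are assumptions made for illustration.

```python
class ToyClaim:
    """Simplified stand-in for a claim whose target is a page title."""

    def __init__(self, existing_pages):
        # Set of titles treated as existing; replaces a real API lookup.
        self.existing_pages = existing_pages
        self.target = None

    @classmethod
    def from_json(cls, data, existing_pages):
        """Deserialize stored data without validating the target."""
        claim = cls(existing_pages)
        claim.target = data['value']
        return claim

    def set_target(self, title):
        """Set a target from user code; reject titles that do not exist."""
        if title not in self.existing_pages:
            raise ValueError('Page {} must exist.'.format(title))
        self.target = title
```

In this model an item with a dangling target still loads, while new edits that would introduce such a target are rejected.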