
editEntity fails if there is an existing coordinates claim with no precision value
Closed, Resolved, Public

Description

I kept encountering this while running my bot, and it took me a while to track down the cause. If you edit a Wikidata item locally (e.g. with setTarget on an arbitrary claim of the item) and then call item.editEntity(summary='edited arbitrary claim'), you get the following error whenever the item already has a coordinates claim with no precision specified:

[messages: [{'name': 'wikibase-validator-missing-field', 'parameters': ['precision'], 'html': {'*': 'Missing required field "precision"'}}]

Example items with such a coordinates claim:

Q49208
Q465071
Q317032
Q374058

Event Timeline

matej_suchanek triaged this task as High priority. (Edited Feb 28 2020, 10:18 AM)
matej_suchanek added a subscriber: matej_suchanek.

Yes, this is annoying. There are multiple issues:

  • Wikidata apparently still stores the invalid data (but does not allow re-submitting it)
  • Pywikibot does not reliably decide which claims were changed

Let's debug it on a random example from my logs: https://www.wikidata.org/wiki/Q486235

When ItemPage.toJSON is called with diffto, the following code runs:

claims = {}
for prop in self.claims:
    if len(self.claims[prop]) > 0:
        claims[prop] = [claim.toJSON() for claim in self.claims[prop]]

if diffto and 'claims' in diffto:
    temp = defaultdict(list)
    claim_ids = set()

    diffto_claims = diffto['claims']

    for prop in claims:
        for claim in claims[prop]:
            if (prop not in diffto_claims
                    or claim not in diffto_claims[prop]):  # <- this is the key
                temp[prop].append(claim)

            if 'id' in claim:
                claim_ids.add(claim['id'])
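The comparison above can be reproduced outside Pywikibot to see why unchanged coordinates end up in the payload. This is a simplified, standalone sketch of the quoted loop: the function name changed_claims is hypothetical and the claim_ids bookkeeping is omitted. The two dicts are trimmed from the Q486235 coordinate serializations quoted below and differ only in the 'hash' key.

```python
from collections import defaultdict
from typing import Any, Dict, List

Claims = Dict[str, List[Dict[str, Any]]]


def changed_claims(claims: Claims, diffto_claims: Claims) -> Claims:
    """Mimic the quoted loop: keep every claim whose JSON is not found verbatim."""
    temp: Claims = defaultdict(list)
    for prop, serialized in claims.items():
        for claim in serialized:
            # `in` on a list of dicts uses ==, so any extra or missing key
            # (such as 'hash') makes an unchanged claim look modified.
            if prop not in diffto_claims or claim not in diffto_claims[prop]:
                temp[prop].append(claim)
    return dict(temp)


value = {'latitude': 27.72334, 'longitude': 109.18851, 'altitude': None,
         'globe': 'http://www.wikidata.org/entity/Q2', 'precision': None}
local = {'mainsnak': {'snaktype': 'value', 'property': 'P625',
                      'datatype': 'globe-coordinate',
                      'datavalue': {'value': dict(value),
                                    'type': 'globecoordinate'}},
         'type': 'statement', 'rank': 'normal',
         'id': 'q486235$1280901A-091B-4374-9E3E-66CF41212194'}
# The loaded copy is identical except for the snak hash added by the API.
loaded = {**local, 'mainsnak': {**local['mainsnak'],
                                'hash': '01e88adb61cc5b89e157d42d972804b49e55877b'}}

print(sorted(changed_claims({'P625': [local]}, {'P625': [loaded]})))
# ['P625']  (the untouched coordinates claim is reported as changed)
```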

What does the check claim not in diffto_claims[prop] do? It checks whether the JSON of each claim is present in the data we are diffing against (the original entity content as it was loaded). This way it can detect whether the claim was modified locally (e.g. by calling setTarget); a claim that hasn't been changed locally does not need to be submitted. So why are coordinates submitted even if we didn't change them?

{'mainsnak': {'snaktype': 'value', 'property': 'P625', 'datatype': 'globe-coordinate', 'datavalue': {'value': {'latitude': 27.72334, 'longitude': 109.18851, 'altitude': None, 'globe': 'http://www.wikidata.org/entity/Q2', 'precision': None}, 'type': 'globecoordinate'}}, 'type': 'statement', 'id': 'q486235$1280901A-091B-4374-9E3E-66CF41212194', 'rank': 'normal'}

{'mainsnak': {'snaktype': 'value', 'property': 'P625', 'hash': '01e88adb61cc5b89e157d42d972804b49e55877b', 'datavalue': {'value': {'latitude': 27.72334, 'longitude': 109.18851, 'altitude': None, 'precision': None, 'globe': 'http://www.wikidata.org/entity/Q2'}, 'type': 'globecoordinate'}, 'datatype': 'globe-coordinate'}, 'type': 'statement', 'id': 'q486235$1280901A-091B-4374-9E3E-66CF41212194', 'rank': 'normal'}

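The first dict is the current serialization (from claim.toJSON()) and the second is the original one from diffto; they always differ. Here the only difference is the 'hash' key on the main snak. But for other datatypes, e.g. wikibase-item...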

{'mainsnak': {'snaktype': 'value', 'property': 'P17', 'datatype': 'wikibase-item', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 148}, 'type': 'wikibase-entityid'}}, 'type': 'statement', 'id': 'q486235$8B49EBB5-04AD-480A-A1F1-764B5018D076', 'rank': 'normal', 'references': [{'snaks': {'P143': [{'snaktype': 'value', 'property': 'P143', 'datatype': 'wikibase-item', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 30239}, 'type': 'wikibase-entityid'}}]}, 'snaks-order': ['P143'], 'hash': '0ee3b3ba1c958f4c3dcba7ed8091fe4b57311348'}]}

{'mainsnak': {'snaktype': 'value', 'property': 'P17', 'hash': '30e172796b0726589e92b001c327f5d55fa0782e', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 148, 'id': 'Q148'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item'}, 'type': 'statement', 'id': 'q486235$8B49EBB5-04AD-480A-A1F1-764B5018D076', 'rank': 'normal', 'references': [{'hash': '0ee3b3ba1c958f4c3dcba7ed8091fe4b57311348', 'snaks': {'P143': [{'snaktype': 'value', 'property': 'P143', 'hash': 'cb49f6fa327b245e4a5aaf48c44b3f503bcd4265', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 30239, 'id': 'Q30239'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item'}]}, 'snaks-order': ['P143']}]}

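... there are even more differences: the original also carries hashes on the main snak and on each reference snak, plus an 'id' key (e.g. 'Q148') in every item value. So although comparing serializations is very efficient, it is currently broken and also not future-proof. It also means bots send excessive amounts of data even for tiny changes when the operator lets Pywikibot decide which claims to submit.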
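One way to make the serialization comparison tolerant of these differences would be to normalize both sides first, dropping server-added keys before comparing. The sketch below only illustrates that idea; it is not necessarily what change 575548 does. The helper name normalize is hypothetical, and the set of ignored keys ('hash' anywhere, and 'id' inside wikibase-entityid values) is an assumption drawn only from the two examples above. Having to maintain such a list by hand is exactly why this approach is not future-proof.

```python
from typing import Any


def normalize(node: Any) -> Any:
    """Return a copy of a claim serialization without server-added keys.

    Assumption: 'hash' keys (on snaks and references) and the 'id' key inside
    wikibase-entityid values are the only fields added by the API.
    """
    if isinstance(node, list):
        return [normalize(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {key: normalize(val) for key, val in node.items() if key != 'hash'}
    # Drop the redundant 'id' (e.g. 'Q148') only inside entity values,
    # never the statement-level 'id'.
    if out.get('type') == 'wikibase-entityid' and isinstance(out.get('value'), dict):
        out['value'] = {k: v for k, v in out['value'].items() if k != 'id'}
    return out


def item_snak(prop, numeric_id, **extra):
    """Build a trimmed wikibase-item value snak; `extra` adds server-side keys."""
    value = {'entity-type': 'item', 'numeric-id': numeric_id}
    if 'qid' in extra:
        value['id'] = extra.pop('qid')
    return {'snaktype': 'value', 'property': prop, 'datatype': 'wikibase-item',
            'datavalue': {'value': value, 'type': 'wikibase-entityid'}, **extra}


ref_hash = '0ee3b3ba1c958f4c3dcba7ed8091fe4b57311348'
claim_id = 'q486235$8B49EBB5-04AD-480A-A1F1-764B5018D076'
# Trimmed P17 claim from Q486235: local serialization vs. loaded original.
local_p17 = {'mainsnak': item_snak('P17', 148), 'type': 'statement',
             'id': claim_id, 'rank': 'normal',
             'references': [{'snaks': {'P143': [item_snak('P143', 30239)]},
                             'snaks-order': ['P143'], 'hash': ref_hash}]}
loaded_p17 = {'mainsnak': item_snak('P17', 148, qid='Q148',
                                    hash='30e172796b0726589e92b001c327f5d55fa0782e'),
              'type': 'statement', 'id': claim_id, 'rank': 'normal',
              'references': [{'hash': ref_hash, 'snaks': {'P143': [
                  item_snak('P143', 30239, qid='Q30239',
                            hash='cb49f6fa327b245e4a5aaf48c44b3f503bcd4265')]},
                  'snaks-order': ['P143']}]}

print(local_p17 == loaded_p17, normalize(local_p17) == normalize(loaded_p17))
# False True
```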

A promising solution is to stop comparing serialized claims and use T76615: Claim equality operator instead (it doesn't compare references, but T186200#4267477 suggests a way around that).
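Comparing claim objects instead of their JSON can be modelled with a small value type. This sketch uses a hypothetical, minimal SimpleClaim dataclass whose equality covers property, target and rank and, like the operator described in T76615, ignores references. References are then compared in a separate, order-insensitive step. SimpleClaim, its fields and the is_changed helper are illustrative assumptions, not Pywikibot's Claim API, and the second step is not a reproduction of the workaround in T186200#4267477.

```python
from dataclasses import dataclass, field
from typing import Any, FrozenSet, List, Tuple


@dataclass(frozen=True)
class SimpleClaim:
    """Hypothetical, minimal claim model (not pywikibot.Claim)."""
    prop: str
    target: Any
    rank: str = 'normal'
    # Each reference is modelled as a frozenset of (property, target) pairs;
    # compare=False keeps references out of ==, mirroring T76615.
    references: Tuple[FrozenSet[Tuple[str, Any]], ...] = field(
        default=(), compare=False)

    def same_references(self, other: 'SimpleClaim') -> bool:
        # Order-insensitive comparison of the two reference collections.
        return set(self.references) == set(other.references)


def is_changed(claim: SimpleClaim, original: List[SimpleClaim]) -> bool:
    """Unchanged if some original claim is equal and has the same references."""
    return not any(claim == old and claim.same_references(old) for old in original)


coord = SimpleClaim('P625', (27.72334, 109.18851, 'http://www.wikidata.org/entity/Q2'))
country = SimpleClaim('P17', 'Q148', references=(frozenset({('P143', 'Q30239')}),))
original = [coord, country]

# Local edit: change only the country target (Q30 is an arbitrary example).
edited = [coord, SimpleClaim('P17', 'Q30', references=country.references)]
print([c.prop for c in edited if is_changed(c, original)])
# ['P17']  (the untouched coordinates are no longer submitted)
```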

Change 575548 had a related patch set uploaded (by Matěj Suchánek; owner: Matěj Suchánek):
[pywikibot/core@master] [IMPR] Fix search for changed claims when saving entity

https://gerrit.wikimedia.org/r/575548

Change 575548 merged by jenkins-bot:
[pywikibot/core@master] [IMPR] Fix search for changed claims when saving entity

https://gerrit.wikimedia.org/r/575548

Xqt claimed this task.
Xqt reassigned this task from Xqt to matej_suchanek.
Xqt added a subscriber: Xqt.