Wikidata allows invalid URIs to be entered as units
Open, High, Public

Description

Wikidata allows URIs that do not identify valid entities to be entered as units. For example, in this change:
https://www.wikidata.org/w/index.php?title=Q420481&diff=prev&oldid=494773425

http://www.wikidata.org/entity/1 is entered as the unit.

Event Timeline

Restricted Application added a subscriber: Aklapper. Jun 10 2017, 6:24 AM
Lydia_Pintscher triaged this task as High priority. Jun 11 2017, 4:28 PM
Lydia_Pintscher added subscribers: daniel, thiemowmde.
Smalyshev updated the task description. Jun 11 2017, 7:39 PM

Got another series of the same, e.g. https://www.wikidata.org/w/index.php?title=Q961&diff=501476468&oldid=492032927 and many other edits by ShinePhantom. Looks like some broken script may be out there. @Ladsgroup, does pywikibot have checks for this? If not, maybe we need to add some.

thiemowmde added a subscriber: Lydia_Pintscher.

The relevant validation currently done in ValidatorBuilders.php is a substring match for http://www.wikidata.org/entity/. This already disallows all …/wiki/ URLs. Namespace, entity type, and entity ID are currently not validated.
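
To illustrate why the URIs reported in this task get through, here is a minimal Python sketch of a substring-only check. It is not the code from ValidatorBuilders.php (which is PHP); the function name `passes_substring_check` is hypothetical, and `Q11573` is used only as an example of a well-formed item ID.

```python
# Hypothetical stand-in for the check described above; not the Wikibase code.
ENTITY_URI_BASE = "http://www.wikidata.org/entity/"


def passes_substring_check(unit_uri: str) -> bool:
    """Accept any URI that contains the entity base URI, ignoring what follows it."""
    return ENTITY_URI_BASE in unit_uri


if __name__ == "__main__":
    for uri in (
        "http://www.wikidata.org/entity/Q11573",     # well-formed item URI
        "http://www.wikidata.org/entity/1",          # reported in this task
        "http://www.wikidata.org/entity/undefined",  # reported in this task
        "https://www.wikidata.org/wiki/Q11573",      # page URL, not a concept URI
    ):
        print(uri, passes_substring_check(uri))
```

Under this sketch only the /wiki/ URL is rejected; the part after the base URI is never inspected, so `1` and `undefined` are accepted like any real ID.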

It should not be that hard to create a validator that only accepts a single entity type (or a set of entity types), checks the namespace (note that items can be in the main namespace, or in an "Item:" namespace), parses the entity ID, and makes sure it matches the entity type. Service classes for all these individual checks should already exist (probably EntityNamespaceLookup and an EntityIdParser).
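
A rough Python sketch of such a validator follows. It is an illustration only: the real implementation would be PHP and would presumably use the existing Wikibase services rather than regular expressions. The names `CONCEPT_BASE_URI`, `ID_PATTERNS`, and `is_valid_entity_uri` are hypothetical, the ID patterns are assumptions, and the namespace check and any lookup of whether the entity actually exists are left out.

```python
import re

# Assumed concept base URI, taken from the examples in this task.
CONCEPT_BASE_URI = "http://www.wikidata.org/entity/"

# Assumed ID syntax per entity type (e.g. "Q42" for items, "P31" for properties).
ID_PATTERNS = {
    "item": re.compile(r"Q[1-9][0-9]*"),
    "property": re.compile(r"P[1-9][0-9]*"),
}


def is_valid_entity_uri(uri: str, allowed_types=("item",)) -> bool:
    """Check the base URI, parse the trailing ID, and match it against the allowed types."""
    if not uri.startswith(CONCEPT_BASE_URI):
        return False
    entity_id = uri[len(CONCEPT_BASE_URI):]
    # The ID must match the syntax of at least one allowed entity type in full.
    return any(ID_PATTERNS[t].fullmatch(entity_id) for t in allowed_types)
```

With the default `allowed_types`, `http://www.wikidata.org/entity/1` and `http://www.wikidata.org/entity/undefined` are both rejected, and a property URI is rejected unless `"property"` is explicitly allowed.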

Note that calendar model and globe URIs are missing the exact same validation. It's probably a good idea to have a single ticket for all three.

Restricted Application added a subscriber: PokestarFan. Jul 24 2017, 2:21 PM

(I missed this ping for some reason)

@Ladsgroup, does pywikibot have checks for this? If not, maybe we need to add some.

I don't think so. We might be able to add some, but IMO there should be validators inside Wikibase too.

Another example here:

https://www.wikidata.org/w/index.php?title=Q4679732&diff=552439126&oldid=552439107

http://www.wikidata.org/entity/undefined is entered as the unit.

Looks like this one was imported with HarvestTemplates (run by @Pasleim). Maybe he or @matej_suchanek can see how to prevent this kind of invalid import in the future.

abian added a subscriber: abian. Sep 23 2018, 11:30 AM

Mentioned again in this discussion; it appears to be the same issue.

Note that per the data model specification, the unit can be any URI (or rather, any IRI):

The unit specifies a physical quantity that the number refers to. It is represented as an IRI rather than as a String, since a string like "m" might represent different units in different contexts. The value should be meaningful independently of the declaration information for its Property (from which more details about units could possibly be obtained), hence the unit is a full IRI. In practice, this IRI might be the IRI referring to an ItemDescription representing the desired unit, or be taken from a standard vocabulary for units, like QUDT.

So any validation that restricts this to Wikidata entities should not be hard coded, since other Wikibase installations may choose to rely on a different vocabulary for units.
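
One way to picture a non-hard-coded check is a validator built from per-installation settings. The Python sketch below is only an illustration under stated assumptions: `make_unit_validator`, its parameters, and the default ID pattern are hypothetical and are not existing Wikibase settings, and the QUDT-style IRI in the usage lines is just an example value.

```python
import re
from typing import Callable, Optional
from urllib.parse import urlsplit


def make_unit_validator(
    entity_base_uri: Optional[str] = None,
    id_pattern: str = r"Q[1-9][0-9]*",  # assumed item ID syntax
) -> Callable[[str], bool]:
    """Build a unit validator from (hypothetical) per-installation settings."""

    def validate(unit: str) -> bool:
        if entity_base_uri is None:
            # No restriction configured: accept anything that has a scheme,
            # in line with the data model allowing any IRI as the unit.
            return bool(urlsplit(unit).scheme)
        if not unit.startswith(entity_base_uri):
            return False
        # Restricted mode: the remainder must be a well-formed entity ID.
        return re.fullmatch(id_pattern, unit[len(entity_base_uri):]) is not None

    return validate


# An installation that restricts units to its own items ...
wikidata_units = make_unit_validator("http://www.wikidata.org/entity/")
# ... and one that accepts units from any vocabulary.
any_vocabulary = make_unit_validator()
```

Here `wikidata_units` rejects `http://www.wikidata.org/entity/undefined`, while `any_vocabulary` accepts an IRI such as `http://qudt.org/vocab/unit/M` and rejects a bare string like `m`.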