Duplicate normalized triples for values
Closed, Resolved, Public

Description

Querying this:

SELECT * WHERE { wdv:803274590d9be02196e329f69e494ec0 ?x ?y }

Produced two normalized triples in the result:

wdv:973b1e4a8a9d66746ee0501e5c1ccb80 and wdv:370634d61155304ed8adc9b0be5172cd

This is wrong: a value should have only one normalized value. We should find out where the duplicate comes from and remove the bad values, including for any other values that have duplicate normalized values.

Event Timeline

Smalyshev created this task. Jan 8 2019, 7:41 PM
Restricted Application added a project: Wikidata. Jan 8 2019, 7:41 PM
Restricted Application added a subscriber: Aklapper.

wdv:973b1e4a8a9d66746ee0501e5c1ccb80 has 48.768000000000001 as its quantity amount. This looks like one of the "excessive precision" cases.
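
To illustrate how excessive precision could lead to a second value node, here is a minimal Python sketch. It assumes a hypothetical `value_hash` helper that takes an MD5 of the serialized amount string; this is a stand-in for illustration, not the actual Wikibase hashing algorithm. The point is only that two serializations of the same floating-point number give different hashes when the hash is computed over the string.

```python
import hashlib


def value_hash(amount: str) -> str:
    """Hypothetical stand-in for a value-node hash: MD5 of the serialized amount."""
    return hashlib.md5(amount.encode("utf-8")).hexdigest()


short_form = "48.768"
long_form = "48.768000000000001"  # the amount reported on this task

# Both strings parse to the same IEEE 754 double...
same_number = float(short_form) == float(long_form)
# ...but a hash over the serialized string differs.
same_hash = value_hash(short_form) == value_hash(long_form)

print(same_number, same_hash)
```

If the amount is serialized with 17 significant digits in one code path and in its shortest form in another, a string-based hash would see two distinct values.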

Addshore moved this task from incoming to monitoring on the Wikidata board. Jan 11 2019, 10:09 AM

This seems to be a consequence of T167759: Reference hash is not stable. A lot of duplicates can be seen in:

SELECT * WHERE { 
  ?x wikibase:quantityNormalized ?v1 .
  ?x wikibase:quantityNormalized ?v2 .
  ?v1 wikibase:quantityAmount ?q1 .
  ?v2 wikibase:quantityAmount ?q2 .
  FILTER(?v1 != ?v2) .
  FILTER(?q1 = ?q2) .
} LIMIT 200

It produces a lot of values that have the same amounts but different hashes. I am not sure what to do about it or why it happens.

@Addshore @WMDE-leszek @thiemowmde do you have any idea why hashes are changing again? Can we do anything about it? It produces broken data now, and I don't see any way to fix it on the WDQS side.
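
For checking exported data offline, the same duplicate test as in the query above can be sketched in Python. The rows below are made-up sample data with placeholder node names, and `duplicate_normalized` is a hypothetical helper, not part of WDQS or Wikibase.

```python
from collections import defaultdict
from decimal import Decimal
from itertools import combinations

# Made-up sample rows: (value node, normalized node, amount of the normalized node)
rows = [
    ("wdv:value1", "wdv:normA", Decimal("48.768")),
    ("wdv:value1", "wdv:normB", Decimal("48.768")),
    ("wdv:value2", "wdv:normC", Decimal("1.5")),
]


def duplicate_normalized(rows):
    """Return (value, norm1, norm2) for distinct normalized nodes with equal amounts."""
    by_value = defaultdict(list)
    for value, normalized, amount in rows:
        by_value[value].append((normalized, amount))

    dupes = []
    for value, targets in by_value.items():
        # Each unordered pair is reported once.
        for (n1, q1), (n2, q2) in combinations(targets, 2):
            if n1 != n2 and q1 == q2:
                dupes.append((value, n1, n2))
    return dupes


result = duplicate_normalized(rows)
print(result)
```

Unlike the SPARQL query, which returns each pair in both orders, this sketch reports each pair once.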

Smalyshev triaged this task as Medium priority. Jan 12 2019, 9:59 AM

I've cleaned up some of the bad values (this looks like it may be fallout from the float precision issue), but there are still a number of values with duplicate normalized values (as above): 17398, to be exact.

Smalyshev moved this task from Backlog to Doing on the User-Smalyshev board. Jan 14 2019, 6:03 PM
Smalyshev moved this task from Doing to Done on the User-Smalyshev board. Jan 15 2019, 8:37 AM

I've cleaned up the bad values, but it is still unclear how they came into being and how to prevent this from happening again.

Smalyshev closed this task as Resolved. Jan 23 2019, 11:49 PM
Smalyshev claimed this task.

Closing as resolved for now, since there's not much more to do here; most likely T167759 needs to be solved to prevent recurrences.