Duplicated statement ids in some wikidata entities
Closed, Duplicate | Public

Description

Identified in https://www.wikidata.org/wiki/Wikidata:Report_a_technical_problem#Non-unique_statement_id_in_Q85046372:

According to Wikidata documentation: [stmt_id is] An arbitrary identifier for the Statement, which is unique across the repository.
But on the Wikidata page for Secondary limb lymphedema (Q85046372), the page source shows that Q85046372$70E829CD-2D80-48D1-BB71-8EE2B5C22051 is referenced twice, each time with different underlying data.
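
A minimal sketch of how the same check could be run against an entity's JSON instead of the page source. It assumes the standard Wikibase JSON layout, where `claims` maps each property ID to a list of statements that each carry an `id`. The function name, the sample entity, the property IDs `P1` and `P2`, and the all-zero statement ID are placeholders for illustration, not data taken from Q85046372.

```python
from collections import Counter


def duplicated_statement_ids(entity):
    """Return the statement IDs that occur more than once in one entity's JSON."""
    ids = [
        statement["id"]
        for statements in entity.get("claims", {}).values()
        for statement in statements
    ]
    return sorted(sid for sid, count in Counter(ids).items() if count > 1)


# Hypothetical, heavily trimmed entity: only the fields read above are present,
# and P1/P2 are placeholder property IDs.
sample = {
    "id": "Q85046372",
    "claims": {
        "P1": [{"id": "Q85046372$70E829CD-2D80-48D1-BB71-8EE2B5C22051"}],
        "P2": [
            {"id": "Q85046372$70E829CD-2D80-48D1-BB71-8EE2B5C22051"},
            {"id": "Q85046372$00000000-0000-0000-0000-000000000000"},
        ],
    },
}

print(duplicated_statement_ids(sample))
```

An empty result only shows that the JSON view of the entity carries no repeated ID; it does not rule out the problem in other serializations.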

Unlike T356161, the duplicated statement IDs are always found within the same entity. Searching the RDF dumps in Hadoop for more cases turns up 2 instances of this issue:

  • Q34433114 with Q34433114-684DE268-387D-4E42-8BD8-394C5C36D10C
  • Q85046372 with Q85046372-70E829CD-2D80-48D1-BB71-8EE2B5C22051

Note that the list above might be incomplete because of the deduplication we perform while importing the dumps into Hadoop.
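
The query actually run in Hadoop is not shown in this task. As a rough illustration only, the sketch below flags statement nodes that are linked from more than one (entity, predicate) pair in a stream of RDF triples. It assumes triples arrive as plain `(subject, predicate, object)` string tuples and that statement nodes use the `http://www.wikidata.org/entity/statement/` prefix. The function name and the sample triples, including the placeholder properties `P1` and `P2`, are hypothetical.

```python
from collections import defaultdict

# Assumed URI prefix of statement nodes in the Wikidata RDF dumps.
STATEMENT_PREFIX = "http://www.wikidata.org/entity/statement/"


def conflated_statement_nodes(triples):
    """Map each statement node to its (entity, predicate) links when it has several."""
    links = defaultdict(set)
    for subject, predicate, obj in triples:
        if obj.startswith(STATEMENT_PREFIX):
            links[obj].add((subject, predicate))
    return {node: sorted(found) for node, found in links.items() if len(found) > 1}


# Hypothetical triples: P1 and P2 are placeholder properties.
node = STATEMENT_PREFIX + "Q85046372-70E829CD-2D80-48D1-BB71-8EE2B5C22051"
entity = "http://www.wikidata.org/entity/Q85046372"
sample_triples = [
    (entity, "http://www.wikidata.org/prop/P1", node),
    (entity, "http://www.wikidata.org/prop/P2", node),
    (entity, "http://www.wikidata.org/prop/P2", node),  # exact repeat, collapsed by the set
]

for found_node, found_links in conflated_statement_nodes(sample_triples).items():
    print(found_node, found_links)
```

Because the links are collected into a set, two statements that share an ID under the same property produce identical link triples and go unnoticed, which is the same kind of blind spot that deduplication on import creates.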

This makes the RDF representation of these entities incorrect: two distinct statements share one ID and are therefore conflated into one.

See also: T371464: Investigation: uniqueness of statement IDs within an entity