Blazegraph and MariaDB contain different sitelinks at Wikidata
Closed, Resolved | Public | 3 Estimated Story Points

Description

I looked for true duplicates at Wikidata using the query

SELECT DISTINCT ?item1 ?item2 ?sitelink WHERE {
  ?sitelink schema:about ?item1, ?item2.
  ?sitelink schema:isPartOf <https://sv.wikipedia.org/>.
  filter(?item1 != ?item2)
}

replacing 'sv' with other language codes. I found a few cases where the sitelinks shown in the Wikidata GUI differ from those returned by the SPARQL endpoint.
According to the SPARQL query, Q19929406 has the sitelink https://sv.wikipedia.org/wiki/Ljudniv%C3%A5, but the item page for Q19929406 does not list it. Similarly, Q6766777 has https://en.wikipedia.org/wiki/Mark_Bonner in Blazegraph, but not in the Wikidata GUI.
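
To repeat the check for several wikis, the query above can be generated per language code. The following is only a minimal sketch: `build_duplicate_query` and `QUERY_TEMPLATE` are hypothetical names introduced for illustration, and the pattern used to validate language codes is an assumption that covers simple lowercase codes such as "sv" or "zh-min-nan".

```python
import re

# Hypothetical template: the query from the description, with the wiki
# language code left as a placeholder. Doubled braces are literal braces.
QUERY_TEMPLATE = """SELECT DISTINCT ?item1 ?item2 ?sitelink WHERE {{
  ?sitelink schema:about ?item1, ?item2.
  ?sitelink schema:isPartOf <https://{lang}.wikipedia.org/>.
  filter(?item1 != ?item2)
}}"""


def build_duplicate_query(lang: str) -> str:
    """Return the duplicate-sitelink query for one Wikipedia language code."""
    # Assumption: codes are lowercase letters, optionally hyphen-separated.
    # Rejecting anything else keeps the value from breaking out of the IRI.
    if not re.fullmatch(r"[a-z]+(-[a-z]+)*", lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    return QUERY_TEMPLATE.format(lang=lang)


if __name__ == "__main__":
    for code in ("sv", "en"):
        print(build_duplicate_query(code))
        print()
```

The generated text can then be pasted into the query service or sent to the SPARQL endpoint with whatever client is at hand.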

Probably there are other cases like that. It seems the wdqs-updater for some reason has not caught those changes.

Event Timeline

Thanks for the report, this is very helpful.
The two updates you mention here were missed by the new updater, but both were correctly identified as problematic and will be resolved once we have the reconciliation strategy in place (work tracked in T279541).

For the record, here are the notes regarding these two missed updates:

| edit time | item | wikibase truth | old updater wdqs1010 | wdqs eqiad wdqs1009 | wdqs codfw wdqs2008 | in revision-create topic | in mutation topic | in fetch-failure |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2021-10-11T17:11:00 | Q6766777 | 1510811138 | 1510811138 | 1392684585 | 1510811138 | yes | codfw only | none (only in raw for eqiad T294361) |
| 2021-10-17T6:19:36 | Q19929406 | 1512982605 | deleted | 1512453868 | 1512453868 | yes | no | eqiad&codfw |
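
Notes like these can also be checked mechanically. The sketch below is only an illustration and not how the updater or the reconciliation works: it assumes that the revision ID reported by Wikibase and the one held by each query service cluster are directly comparable, and `stale_hosts` and `ROWS` are hypothetical names. The old updater value is left out because "deleted" is not a revision ID.

```python
from typing import Dict, List

# Revision IDs copied from the notes above: the Wikibase revision, then the
# revision each query service cluster holds (keys are assumed labels).
ROWS = {
    "Q6766777": (1510811138, {"eqiad": 1392684585, "codfw": 1510811138}),
    "Q19929406": (1512982605, {"eqiad": 1512453868, "codfw": 1512453868}),
}


def stale_hosts(truth: int, served: Dict[str, int]) -> List[str]:
    """Return, sorted by name, the clusters whose revision differs from Wikibase."""
    return sorted(host for host, rev in served.items() if rev != truth)


if __name__ == "__main__":
    for item, (truth, served) in ROWS.items():
        print(item, stale_hosts(truth, served))
    # Q6766777 ['eqiad']
    # Q19929406 ['codfw', 'eqiad']
```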

I'm moving this to waiting while T279541 is being worked out so that we have a place to record future inconsistencies.

The reconciliation process is now running and should auto-correct missed updates within a couple of hours of the edit.
I also fixed the inconsistencies listed here and in other related tickets. Please let me know if you still find errors.