
Deleted item still gets shown in WDQS query results
Closed, ResolvedPublic

Description

Q104773323 has been deleted on Wikidata, but keeps popping up when querying for P2397. I already tried restoring and redeleting the item, but it didn't help.
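One way to check whether a deleted entity still has triples in WDQS is an ASK query against the SPARQL endpoint. The following is a minimal Python sketch, not the tooling used by the team. It assumes the public endpoint at https://query.wikidata.org/sparql; the helper names `build_ask_query` and `still_in_wdqs` and the user agent string are made up for illustration.

```python
import json
import re
import urllib.parse
import urllib.request

# Public WDQS SPARQL endpoint (an assumption; adjust for other deployments).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_ask_query(entity_id: str) -> str:
    """Return an ASK query that is true if the entity is the subject of any triple."""
    # Accept item (Q), lexeme (L) and property (P) IDs only.
    if not re.fullmatch(r"[QLP][1-9][0-9]*", entity_id):
        raise ValueError(f"not a valid entity ID: {entity_id!r}")
    return f"ASK {{ <http://www.wikidata.org/entity/{entity_id}> ?p ?o }}"


def still_in_wdqs(entity_id: str) -> bool:
    """Hypothetical helper: ask WDQS whether the entity still has triples."""
    params = urllib.parse.urlencode(
        {"query": build_ask_query(entity_id), "format": "json"}
    )
    request = urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={"User-Agent": "stale-entity-check/0.1 (example script)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        # SPARQL JSON results for ASK carry the answer in the "boolean" field.
        return bool(json.load(response)["boolean"])
```

For an item that has been deleted on Wikidata, such as Q104773323, a `True` result from `still_in_wdqs` would suggest that the delete was missed by the server that answered the query.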

Event Timeline

Restricted Application added a subscriber: Aklapper.

I think this kind of inconsistency was related to the problems reported in T267175.
Thanks for the report. I manually resynced, but please let me know via this ticket or T267175 if you see other stale entities.

Re-opening, as there still seems to be a problem related to deletes, and the fix done in T267175 was not effective.

All servers seem to have missed the deletion of Q104982840.
Last edit for revision 1348777513 was done on 2021-01-26T18:33:39Z, delete was done at 2021-01-27T07:48:16Z.
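For reference, the gap between the last edit and the delete can be computed from the two timestamps above. This is a small sketch using only the Python standard library; the helper name `parse_utc` is made up for illustration.

```python
from datetime import datetime, timezone


def parse_utc(timestamp: str) -> datetime:
    """Parse a timestamp of the form 2021-01-26T18:33:39Z as UTC."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


last_edit = parse_utc("2021-01-26T18:33:39Z")  # last edit, revision 1348777513
deleted = parse_utc("2021-01-27T07:48:16Z")    # deletion of Q104982840

gap = deleted - last_edit
print(gap)  # 13:14:37
```

The delete came a little over 13 hours after the last edit.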

Note that the server running the new updater has properly handled the delete.

Change 659323 had a related patch set uploaded (by DCausse; owner: DCausse):
[wikidata/query/rdf@master] Log non revision-create-events

https://gerrit.wikimedia.org/r/659323

Another deletion that still shows up: Q105098078
It was deleted January 29th at 8:55 AM CET (= 7:55 AM UTC)

I should note that this has been happening for a while with a number of lexemes as well (such as L401588).

Change 659323 merged by jenkins-bot:
[wikidata/query/rdf@master] Log non revision-create-events

https://gerrit.wikimedia.org/r/659323

Data reload should start tomorrow (T267927) and should fix all the currently missed deletes. The underlying issue still has not been identified, but the new updater (T244590) does not have the same problem. Since it should be ready soon, it makes sense not to invest more time in fixing the old updater. Side note: deletes are rare (132 in 2 days), so the risk of major data issues is low.

Gehel claimed this task.

Closing this, as the only work remaining is the data reload tracked in T267927.

FTR, deletes will be re-synced regularly using an ad-hoc script available at https://people.wikimedia.org/~dcausse/wdqs_manual_deletes/.
This will continue until the new updater is shipped to production, or until the root cause in the current updater is found and fixed.

@dcausse would you re-run the script this week? We deleted a series of items on Wikidata last weekend.

Deleted Q-IDs (e.g. Q107166925) might still exist in VIAF entries, for example:

https://viaf.org/viaf/80232010/

How can VIAF be informed about deleted or changed Q-IDs? (The new ID is Q107986623 for the given sample.)

Thanks a lot!

@Esc3300, @d.causse is currently out of office and will not be able to run the script this week. The Search team is currently working on finalizing the new Streaming Updater as a longer-term solution, with rollout planned for end of Sept.