
Deletion of Lexemes appears to leak triples related to its forms and senses
Open, Medium, Public, 3 Estimated Story Points

Description

In T302189 it was reported:

This report of grammatical features is wrong because it includes deleted data. Like with the previous queries I mentioned, I'm unable to fix it because that takes it from running in under a second to timing out.

This query returns a form which was deleted 11 months ago.

(Here's 100 forms which need cleaning up)

This suggests that the forms of deleted lexemes are leaked in the triple store.
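One way to confirm this kind of leak can be sketched under a simplified model: treat the store as a set of (subject, predicate, object) tuples, and flag any form or sense that still has triples of its own but is no longer the object of an ontolex:lexicalForm or ontolex:sense triple. The tuple model, the prefixed names, the example IDs (wd:L1, wd:L2, ...) and the function name find_orphans are hypothetical. The sketch also assumes that forms and senses are typed ontolex:Form and ontolex:LexicalSense, as in the Wikibase RDF mapping. Against WDQS itself, the same check would be written as a SPARQL query rather than in Python.

```python
# Hypothetical in-memory model of the triple store: each triple is a
# (subject, predicate, object) tuple of prefixed names. This only
# illustrates the check; it is not how WDQS stores data.
Triple = tuple[str, str, str]

RDF_TYPE = "rdf:type"
LEXICAL_FORM = "ontolex:lexicalForm"
SENSE = "ontolex:sense"
# Assumed rdf:type values for forms and senses (ontolex vocabulary).
CHILD_TYPES = {"ontolex:Form", "ontolex:LexicalSense"}


def find_orphans(triples: set[Triple]) -> set[str]:
    """Return forms/senses that no lexeme links to via lexicalForm or sense."""
    # Everything some lexeme still points at.
    linked = {o for (_, p, o) in triples if p in (LEXICAL_FORM, SENSE)}
    # Everything that is typed as a form or a sense.
    children = {s for (s, p, o) in triples if p == RDF_TYPE and o in CHILD_TYPES}
    return children - linked


store = {
    # L1 still exists and links to its form.
    ("wd:L1", RDF_TYPE, "ontolex:LexicalEntry"),
    ("wd:L1", LEXICAL_FORM, "wd:L1-F1"),
    ("wd:L1-F1", RDF_TYPE, "ontolex:Form"),
    # L2 was deleted, but its form and sense triples were left behind.
    ("wd:L2-F1", RDF_TYPE, "ontolex:Form"),
    ("wd:L2-F1", "wikibase:grammaticalFeature", "wd:Q110786"),
    ("wd:L2-S1", RDF_TYPE, "ontolex:LexicalSense"),
}
print(sorted(find_orphans(store)))  # ['wd:L2-F1', 'wd:L2-S1']
```

The leftover wikibase:grammaticalFeature triple on wd:L2-F1 is the kind of data that would still show up in a grammatical-features report even though its lexeme is gone.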

Triples whose subject is a form attached to a deleted Lexeme via ontolex:lexicalForm should be deleted as well.
Senses (attached via ontolex:sense) may be leaked too.
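The expected cascade can be sketched with the same kind of simplified in-memory model: a set of (subject, predicate, object) tuples. The function name delete_lexeme and the example IDs are hypothetical, and this is not the code from the updater patch. The key point is ordering: the form and sense IDs are only reachable through the lexeme's own ontolex:lexicalForm and ontolex:sense triples, so they have to be collected before those triples are dropped. Deleting only the triples whose subject is the lexeme would leave exactly the orphaned form and sense triples described above.

```python
# Hypothetical in-memory model: triples are (subject, predicate, object)
# tuples of prefixed names. Illustrative only, not the WDQS updater code.
Triple = tuple[str, str, str]

LEXICAL_FORM = "ontolex:lexicalForm"
SENSE = "ontolex:sense"


def delete_lexeme(triples: set[Triple], lexeme: str) -> set[Triple]:
    """Return a copy of `triples` without the lexeme, its forms and its senses.

    Forms and senses are discovered through the lexeme's own
    ontolex:lexicalForm / ontolex:sense triples, so they are collected
    before anything is removed.
    """
    subjects = {lexeme}
    subjects |= {
        o for (s, p, o) in triples if s == lexeme and p in (LEXICAL_FORM, SENSE)
    }
    # Drop every triple whose subject is the lexeme or one of its children.
    return {t for t in triples if t[0] not in subjects}


store = {
    ("wd:L2", "rdf:type", "ontolex:LexicalEntry"),
    ("wd:L2", LEXICAL_FORM, "wd:L2-F1"),
    ("wd:L2", SENSE, "wd:L2-S1"),
    ("wd:L2-F1", "rdf:type", "ontolex:Form"),
    ("wd:L2-S1", "rdf:type", "ontolex:LexicalSense"),
    ("wd:L3", "rdf:type", "ontolex:LexicalEntry"),
}
print(delete_lexeme(store, "wd:L2"))
# {('wd:L3', 'rdf:type', 'ontolex:LexicalEntry')}
```

Returning a new set rather than mutating in place keeps the lookup of form and sense IDs independent of the removal step, which is the ordering the sketch is meant to highlight.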

AC:

  • When a Lexeme is deleted, all of its forms and senses should be removed from WDQS.

Event Timeline

Gehel triaged this task as High priority. Jan 9 2023, 4:23 PM
Gehel moved this task from Incoming to Tech Debt on the Wikidata-Query-Service board.
dr0ptp4kt lowered the priority of this task from High to Medium. Oct 1 2024, 3:26 PM
dr0ptp4kt set the point value for this task to 3.

The first step is to investigate whether we are missing events (which does happen) or whether there is a bug in the updater code.

Change #1104714 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] Properly remove forms and senses when delete lexemes

https://gerrit.wikimedia.org/r/1104714

Change #1104714 merged by jenkins-bot:

[wikidata/query/rdf@master] Properly remove forms and senses when deleting lexemes

https://gerrit.wikimedia.org/r/1104714

Change #1112251 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/deploy@master] deploy version 0.3.152

https://gerrit.wikimedia.org/r/1112251