Five deleted Wikidata items pertaining to Wikimedia category pages still present in the Query Service
Closed, Resolved | Public | 8 Estimated Story Points | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:

682 RDF triples where the subject is one of the five Qids Q10813441, Q32994683, Q55929561, Q109548562, and Q111436860 are returned.

What should have happened instead?:

No triples should be returned (Q32994683 was deleted on 3 May, Q109548562 on 6 June, and the other items named above were deleted in between).
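The report does not include the query that was run, so the following is only a minimal sketch of how the check described above could be expressed. It builds a SPARQL query that counts every triple whose subject is one of the five Qids. The function name `build_triple_count_query` is hypothetical, and the sketch assumes the `wd:` prefix is predefined by the endpoint, as it is on the Wikidata Query Service.

```python
import re

# The five Qids named in this report.
DELETED_ITEMS = ["Q10813441", "Q32994683", "Q55929561", "Q109548562", "Q111436860"]


def build_triple_count_query(qids):
    """Return a SPARQL query counting triples whose subject is one of qids.

    Hypothetical helper for illustration; not the reporter's actual query.
    """
    for qid in qids:
        # Reject anything that is not a plain item ID before interpolating it.
        if not re.fullmatch(r"Q[1-9]\d*", qid):
            raise ValueError(f"not a valid Qid: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT (COUNT(*) AS ?triples) WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  ?item ?p ?o .\n"
        "}"
    )


if __name__ == "__main__":
    print(build_triple_count_query(DELETED_ITEMS))
```

For items that were deleted and correctly propagated, a query of this shape would be expected to report a count of 0.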

Event Timeline

Gehel set the point value for this task to 8. (Aug 7 2023, 3:31 PM)
item          deletion date
Q10813441     2023-06-06T14:21:49
Q32994683     2023-05-03T16:04:27
Q55929561     2023-05-31T20:24:58
Q109548562    2023-06-06T14:22:29
Q111436860    2023-05-04T06:54:56

None of these deletions appear to have been treated by the WDQS updater.
Looking further, I can't find any trace of the corresponding delete events in event.mediawiki_page_delete or event.mediawiki_page_suppress:

select * from mediawiki_page_delete where year=2023 and month in (5,6,7,8) and page_title in ('Q55929561', 'Q10813441', 'Q32994683', 'Q55929561', 'Q109548562', 'Q111436860');
OK
comment	database	meta	page_id	page_is_redirect	page_namespace	page_title	parsedcomment	performer	rev_count	rev_id	chronology_id	_schema	is_wmf_domain	normalized_host	datacenter	year	month	day	hour
Time taken: 1.557 seconds
select * from mediawiki_page_suppress where year=2023 and month in (5,6,7,8) and page_title in ('Q55929561', 'Q10813441', 'Q32994683', 'Q55929561', 'Q109548562', 'Q111436860');
OK
_schema	meta	database	performer	page_id	page_title	page_namespace	page_is_redirect	rev_id	chronology_id	comment	parsedcomment	rev_count	is_wmf_domain	normalized_host	datacenter	year	month	day	hour
Time taken: 0.311 seconds

This possibly means that MediaWiki never told the WDQS updater that these items were deleted.

Searching for these items in event.mediawiki_page_change_v1 did not yield any results either. It seems to me that the root cause is a problem somewhere between MW and kafka-main (EventBus and/or eventgate?).

Just a random drive-by note, since I'm not the one playing with this, but it might be interesting to instrument EventBus a little bit. For example, from the deferred job that publishes to Kafka, we could log a basic key for each event that we publish. It should be possible to aggregate these logs and compare them against what we see in Kafka to figure out what we missed, perhaps even facilitate retries.
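A minimal sketch of the comparison step suggested above, assuming the publish-side logs and the Kafka topic can each be reduced to a list of event keys. The key shape `(database, page_title, event_type)`, the function name `find_missing_events`, and the sample data are all hypothetical; nothing in this task says EventBus logs such keys today.

```python
def find_missing_events(logged_keys, kafka_keys):
    """Return keys logged at publish time that never appeared in Kafka.

    Keys are returned in log order, each at most once. Hypothetical helper:
    the key format is an assumption, not an existing EventBus log format.
    """
    seen_in_kafka = set(kafka_keys)
    already_reported = set()
    missing = []
    for key in logged_keys:
        if key not in seen_in_kafka and key not in already_reported:
            missing.append(key)
            already_reported.add(key)
    return missing


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    logged = [
        ("wikidatawiki", "Q1", "delete"),
        ("wikidatawiki", "Q2", "delete"),
        ("wikidatawiki", "Q3", "delete"),
    ]
    in_kafka = [
        ("wikidatawiki", "Q1", "delete"),
        ("wikidatawiki", "Q3", "delete"),
    ]
    for key in find_missing_events(logged, in_kafka):
        print("missing from Kafka:", key)
```

The returned list is what a retry or reconciliation job would need as input. Whether such a key is unique enough in practice (for example, a page deleted twice) is left open here.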

I'm going to work on improving the tooling for reconciling missed deletes, but I won't be working on the root cause. I agree with @Milimetric here and we need to get a better sense of the quality of the EventBus/EventGate system: 5 (*identified*) missed events over 1 month on a relatively low-volume stream seems concerning.

Change 955783 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] Allow reconciling a deletion without knowing its revision

https://gerrit.wikimedia.org/r/955783

Change 955783 merged by jenkins-bot:

[wikidata/query/rdf@master] Allow reconciling a deletion without knowing its revision

https://gerrit.wikimedia.org/r/955783

Ottomata subscribed.

I agree with @Milimetric here and we need to get a better sense of the quality of the EventBus/EventGate system

T345195: Identify indicators to inform an SLO for event emission and intake

Reconciled these items manually. Improving the reliability of the event system will be tracked in other tasks such as T345195.