
WDQS updater missed some updates
Open, Needs Triage, Public

Description

Reported at https://www.wikidata.org/wiki/Wikidata:Report_a_technical_problem/WDQS_and_Search#Stale_values_in_SparQL_query_result

  • Q968274 revision 2131311442 at 2024-04-17T13:18:54
  • Q4314307 revision 2130626175 at 2024-04-16T13:20:18
  • Q4349600 revision 2130628297 at 2024-04-16T13:23:52
  • Q51670636 revision 2131311281 at 2024-04-17T13:18:30

None of these are found in the event.mediawiki_revision_create hive table.
I can't find them in the eqiad.mediawiki.revision-create topic either.
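
For anyone wanting to repeat the Hive check for all four revisions at once, a lookup along these lines might work. This is only a sketch, not the query that was actually run: the helper name build_lookup_query is hypothetical, and it assumes each event lands in the day partition matching its revision timestamp (UTC), which may not hold for edits close to midnight.

```python
from datetime import datetime

# Revisions reported as missing (IDs and timestamps copied from the list above).
MISSING = [
    (2131311442, "2024-04-17T13:18:54"),
    (2130626175, "2024-04-16T13:20:18"),
    (2130628297, "2024-04-16T13:23:52"),
    (2131311281, "2024-04-17T13:18:30"),
]


def build_lookup_query(revisions, table="event.mediawiki_revision_create"):
    """Build a Hive query limited to the day partitions of the given revisions.

    Assumption: an event is stored in the year/month/day partition that
    matches its revision timestamp.
    """
    # Collect the distinct calendar days so only those partitions are scanned.
    days = sorted({datetime.fromisoformat(ts).date() for _, ts in revisions})
    partitions = " OR ".join(
        f"(year = {d.year} AND month = {d.month} AND day = {d.day})" for d in days
    )
    rev_ids = ", ".join(str(rev_id) for rev_id, _ in revisions)
    return (
        f"SELECT rev_id, page_title, rev_timestamp FROM {table} "
        f"WHERE ({partitions}) AND rev_id IN ({rev_ids})"
    )


if __name__ == "__main__":
    print(build_lookup_query(MISSING))
```

An empty result from the generated query would be consistent with the events never having reached the table. Widening the day filter by one day on each side would guard against the partition assumption being wrong.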

I can't find traces of the other three, but searching logstash for "Unable to deliver all events: 503: Service Unavailable" shows huge spikes of failures (sometimes more than 20k in one hour):

image.png (499×2 px, 95 KB)

It is possible that MediaWiki or event-gate failed to properly submit these revision-create events.
Related tasks

  • T249745: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable"
  • T120242: Eventually-Consistent MediaWiki state change events | MediaWiki events as source of truth

Event Timeline

Gehel subscribed.

This seems in the scope of Data Platform Engineering.

Another instance of this issue was reported on wiki:

@dcausse (WMF): fwiw, I have 6 items updated on the 19 & 20 June - https://w.wiki/ASz6 - for which WDQS has not been updated ... on the production WDQS, not test. Only one of them was edited within the June 19 between 03:00 and 15:30 UTC window, afaics. It's not a problem for me, more of a FYI. --Tagishsimon (talk) 16:01, 21 June 2024 (UTC)

I could only identify one out of the 6 items mentioned in this message:

  • Q17641641 at revision 2183202302 on 2024-06-18T21:17:06

The hive table does not have this data:

 hive (event)> select * from mediawiki_revision_create where year = 2024 and month = 6 and day = 18 and rev_id = 2183202302;
OK
comment	database	meta	page_id	page_is_redirect	page_namespace	page_title	parsedcomment	performer	rev_content_changed	rev_content_format	rev_content_model	rev_id	rev_len	rev_minor_edit	rev_parent_id	rev_sha1	rev_timestamp	chronology_id	_schema	rev_is_revert	rev_revert_details	is_wmf_domain	normalized_host	rev_slots	dt	datacenter	year	month	day	hour
Time taken: 0.303 seconds