The WDQS streaming updater captures the problems it encounters in three different streams.
These streams are then processed hourly by a reconciliation mechanism, which can emit "reconcile" events to attempt to fix the problems.
The issue is that if the problem streams are populated by different jobs (the production job and a test job), reconcile events might be processed by the production job even though the original problems were encountered in the test job.
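A minimal sketch of the ambiguity (all names and the tagging scheme here are hypothetical, not the actual updater configuration): when the reconciliation source is derived from the datacenter alone, problems reported by two different jobs into the same datacenter's streams become indistinguishable.

```python
def reconciliation_source(event, use_pipeline_field=False):
    """Derive the tag used to route a reconcile event back to a job.

    Hypothetical illustration: the real tag format in the updater differs.
    """
    if use_pipeline_field and "pipeline" in event:
        # Proposed behaviour: the originating job identifies itself.
        return f"wdqs_{event['pipeline']}@{event['datacenter']}"
    # Current behaviour: only the datacenter is known.
    return f"wdqs@{event['datacenter']}"

# A problem reported by the production job and one reported by a test job,
# both landing in eqiad-prefixed streams.
prod_problem = {"datacenter": "eqiad", "pipeline": "production"}
test_problem = {"datacenter": "eqiad", "pipeline": "test"}

# Datacenter-only tagging yields the same tag for both, so the production
# job ends up consuming reconcile events caused by the test job.
assert reconciliation_source(prod_problem) == reconciliation_source(test_problem)

# With a per-job field the tags differ and each job only sees its own.
assert reconciliation_source(prod_problem, True) != reconciliation_source(test_problem, True)
```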
This problem was uncovered after a failure of mirrormaker between kafka-main@eqiad and kafka-jumbo:
- the test pipeline running in dse-k8s saw several thousand late events while mirrormaker was going up and down
- these late events were pushed as errors to eqiad.rdf-streaming-updater.lapsed-action in kafka-jumbo
- the reconciliation mechanism processed these events and shipped new reconcile events to eventgate-main with the tag wdqs_source_tag_prefix@eqiad
- the production job running in wikikube@eqiad processed the events tagged as wdqs_source_tag_prefix@eqiad
- the consumers running close to blazegraph had thousands of reconcile events to process (a very slow operation), which caused the wdqs machines to lag behind
When reporting a problem, the updater should tag these events more precisely, so that reconcile events can be tagged appropriately without relying on the datacenter to determine the provenance of the original error.
AC:
- the updater job has a new --pipeline argument that accepts a string
- the side output events have a new field named 'pipeline' or 'emitter_id'
- the reconcile process can map this new field to a corresponding reconciliation_source
- the updater job has a way to stop emitting side outputs (for testing purposes)
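The acceptance criteria above can be sketched as follows (the field name, the mapping table, and the drop-on-unknown behaviour are assumptions for illustration, not the actual reconcile process): side-output events carry the value passed via --pipeline, and the reconcile process maps that value to a reconciliation_source instead of inferring one from the datacenter.

```python
# Hypothetical mapping from the new per-job field to a reconciliation_source.
# Real pipeline names and source names would come from configuration.
PIPELINE_TO_SOURCE = {
    "production": "wdqs_main_reconcile",
    "test": "wdqs_test_reconcile",
}

def route_problem(event):
    """Return the reconciliation_source tag for a side-output event,
    or None if the event comes from an unknown pipeline (dropped rather
    than guessed from the datacenter)."""
    source = PIPELINE_TO_SOURCE.get(event.get("pipeline"))
    if source is None:
        return None
    return f"{source}@{event['datacenter']}"

# Events from the test job are routed to the test source only; events
# without a recognised pipeline are dropped.
assert route_problem({"pipeline": "test", "datacenter": "eqiad"}) == "wdqs_test_reconcile@eqiad"
assert route_problem({"pipeline": "staging", "datacenter": "eqiad"}) is None
```

Dropping events from unmapped pipelines also gives test deployments a cheap way to opt out of reconciliation entirely, complementing the last acceptance criterion.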