
The WDQS streaming updater should have a way to disable or tag side output events
Closed, Resolved | Public | 5 Estimated Story Points

Description

The WDQS streaming updater captures the problems it encounters into 3 different streams.
These streams are then processed by a reconciliation mechanism that runs hourly and can emit "reconcile" events to attempt to fix the problem.
The issue is that if the problems are reported by different jobs (the production job and a test job), reconcile events might be processed by the production job even though the original problems were actually encountered in the test job.
This problem was uncovered after a failure of mirrormaker between kafka-main@eqiad and kafka-jumbo:

  • the test pipeline running in dse-k8s saw multiple thousands of late events while mirrormaker was going up and down
  • these late events were pushed as errors to eqiad.rdf-streaming-updater.lapsed-action in kafka-jumbo
  • the reconciliation mechanism processed these events and shipped new reconcile events to eventgate-main with the tag wdqs_source_tag_prefix@eqiad
  • the production job running in wikikube@eqiad processed the events tagged as wdqs_source_tag_prefix@eqiad
  • the consumers running close to blazegraph had thousands of reconcile events to process (very slow), which caused the wdqs machines to lag behind

When reporting a problem, the updater should tag these events more precisely so that reconcile events can be tagged appropriately, rather than relying on the datacenter to determine the provenance of the original error.
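
The ambiguity can be illustrated with a minimal Python sketch. It is not the actual reconciliation code: the function names, the event shape, and the emitter id values are hypothetical, and only the topic name and the wdqs_source_tag_prefix@eqiad tag come from the incident above. It assumes the datacenter is read from the topic prefix.

```python
from typing import Optional

# Hypothetical emitter ids. Only the production job is mapped, so problems
# reported by any other job produce no reconcile event.
RECONCILIATION_SOURCES = {
    "production@eqiad": "wdqs_source_tag_prefix@eqiad",
}

TOPIC = "eqiad.rdf-streaming-updater.lapsed-action"


def source_from_topic(topic: str) -> str:
    """Datacenter-only tagging: every job writing to this topic gets the same tag."""
    datacenter = topic.split(".", 1)[0]
    return f"wdqs_source_tag_prefix@{datacenter}"


def source_from_emitter(event: dict) -> Optional[str]:
    """Tagging by emitter: unknown or missing emitters map to None."""
    return RECONCILIATION_SOURCES.get(event.get("emitter_id"))


# Both jobs report late events to the same topic (hypothetical event shape).
prod_event = {"topic": TOPIC, "emitter_id": "production@eqiad"}
test_event = {"topic": TOPIC, "emitter_id": "test@dse-k8s"}
```

With datacenter-only tagging, both events resolve to the same source, so the production job picks up reconcile events for problems it never saw. With an explicit emitter id, the test job's problems resolve to no source and can be skipped.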

AC:

  • the updater job has a new --pipeline argument that accepts a string
  • the side output events have a new field named 'pipeline' or 'emitter_id'
  • the reconcile process can map this new field to a corresponding reconciliation_source
  • the updater job has a way to stop emitting side outputs (for testing purposes)
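
As a rough illustration of the updater-side criteria, the sketch below parses a --pipeline argument, stamps it onto each side output event as emitter_id, and lets a switch suppress side outputs entirely. The real updater is a Flink job, so this Python is only a model of the behaviour: the --no-side-outputs flag name, the helper functions, and the event fields other than emitter_id are assumptions, not the options added by the patches below.

```python
import argparse
from typing import Callable, List, Optional


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="updater-sketch")
    # --pipeline is named in the acceptance criteria.
    parser.add_argument("--pipeline", required=True,
                        help="string identifying the job that emits side outputs")
    # Hypothetical flag name for the "stop emitting side outputs" criterion.
    parser.add_argument("--no-side-outputs", action="store_true",
                        help="do not produce side output events (for testing)")
    return parser.parse_args(argv)


def make_emitter(args: argparse.Namespace,
                 sink: Callable[[dict], None]) -> Callable[[dict], Optional[dict]]:
    """Return a function that tags a problem with emitter_id and sends it to sink."""
    def emit(problem: dict) -> Optional[dict]:
        if args.no_side_outputs:
            return None  # side outputs disabled: nothing reaches the sink
        event = {**problem, "emitter_id": args.pipeline}
        sink(event)
        return event
    return emit
```

A test job could then be started with --pipeline set to its own identifier, or with side outputs switched off, so that nothing it reports is attributed to production.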

Event Timeline

Change 961858 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] Add an option to disable producing errors to kafka

https://gerrit.wikimedia.org/r/961858

Change 961858 merged by jenkins-bot:

[wikidata/query/rdf@master] Add an option to disable producing errors to kafka

https://gerrit.wikimedia.org/r/961858

Change 961985 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] rdf-streaming-updater: do not produce side outputs to kafka

https://gerrit.wikimedia.org/r/961985

Change 961985 merged by Bking:

[operations/deployment-charts@master] rdf-streaming-updater: do not produce side outputs to kafka

https://gerrit.wikimedia.org/r/961985

Change 962051 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] rdf-streaming-updater: set allowNonRestoredState

https://gerrit.wikimedia.org/r/962051

Change 962051 merged by jenkins-bot:

[operations/deployment-charts@master] rdf-streaming-updater: set allowNonRestoredState

https://gerrit.wikimedia.org/r/962051

Change 962694 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] rdf-streaming-updater: bump image version to flink-1.16.1-rdf-0.3.133

https://gerrit.wikimedia.org/r/962694

Change 963006 had a related patch set uploaded (by DCausse; author: DCausse):

[schemas/event/secondary@master] rdf_streaming_updater: add emitter_id to side outputs

https://gerrit.wikimedia.org/r/963006

dcausse set the point value for this task to 5.
dcausse moved this task from Incoming to In Progress on the Discovery-Search (Current work) board.

Change 962694 merged by jenkins-bot:

[operations/deployment-charts@master] rdf-streaming-updater: bump image version to flink-1.16.1-rdf-0.3.133

https://gerrit.wikimedia.org/r/962694

Change 964050 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] [WIP] Add support for emitter_id in error streams

https://gerrit.wikimedia.org/r/964050

Change 963006 merged by jenkins-bot:

[schemas/event/secondary@master] rdf_streaming_updater: add emitter_id to side outputs

https://gerrit.wikimedia.org/r/963006

Change 964050 merged by jenkins-bot:

[wikidata/query/rdf@master] Add support for emitter_id in error streams

https://gerrit.wikimedia.org/r/964050