
Migrate streaming updater event schema to the standard schema repository
Closed, Resolved | Public | 3 Estimated Story Points

Description

The schema that SUP uses to communicate between the producer and consumer is still marked as in development and is referenced in the schema config as development/cirrussearch/update_pipeline/update. Is it stable enough now that we can move it into the schema repository and make it work like the other events? This would, for example, have ensured that the new private variants of the streams were auto-created by canary events rather than created manually.

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Allow reading from "legacy" update streams | repos/search-platform/cirrus-streaming-updater!170 | dcausse | T375821-allow-reading-legacy-update-streams | main
Produce errors to the fetch_error.v1 stream | repos/search-platform/cirrus-streaming-updater!169 | dcausse | T375821-produce-to-fetch-error-v1 | main
Add --page-weighted-tags-change-legacy-stream | repos/search-platform/cirrus-streaming-updater!168 | dcausse | T375821-add-page-weighted-tags-change-legacy-stream | main
cirrussearch: promote SUP schemas to stable | repos/data-engineering/schemas-event-primary!9 | dcausse | cirrussearch-promote-sup-schemas-to-stable | master

Event Timeline

dr0ptp4kt triaged this task as Medium priority. Sep 30 2024, 3:08 PM
dr0ptp4kt moved this task from needs triage to Current work on the Discovery-Search board.
Gehel set the point value for this task to 3. Oct 14 2024, 3:48 PM

So two things have to happen:

  1. Rename the schema:
    • Copy it from development to mediawiki so that old events can still be validated (the development copy can only be removed once the Kafka retention period is over)
    • Adapt schema_title in ext-EventStreamConfig.php accordingly
  2. Rename the stream. This may require two phases:
    1. Phase A:
      • Add the new topics (without the rc0 suffix) to the stream config cirrussearch.update_pipeline.update.rc0 in ext-EventStreamConfig.php
      • Configure the SUP producer to write to the new topic
    2. Phase B (once no more messages are written to the old topic and the Kafka retention period is over):
      • Strip the rc0 suffix from the stream name and remove the topics with the rc0 suffix in ext-EventStreamConfig.php
      • Configure the SUP producer and consumer to use the stream without the rc0 suffix

I'm not sure we can safely change the schema_title of an existing stream, so we might have to create separate streams and adapt the pipeline to consume from multiple update streams.
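
The fallback described above (separate streams, with the consumer reading from several of them) could be rolled out in steps where the consumer is widened before the producer is switched, so that no event is written to a stream nobody reads. The sketch below only models that ordering in Python. The phase names, the `PipelineStreams` type, and the `.v1` suffix on the update stream are illustrative assumptions, not the actual SUP configuration.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: the legacy stream ends in ".rc0" (as in the plan above) and the
# stable one ends in ".v1" (hypothetical name chosen for this illustration).
LEGACY_SUFFIX = ".rc0"
STABLE_SUFFIX = ".v1"

# Hypothetical phase names, listed in the order they would be deployed.
PHASES = ("initial", "dual_consume", "produce_stable", "stable_only")


@dataclass(frozen=True)
class PipelineStreams:
    produce_to: str
    consume_from: Tuple[str, ...]


def streams_for_phase(base: str, phase: str) -> PipelineStreams:
    """Return which stream the producer writes to and which the consumer reads."""
    legacy = base + LEGACY_SUFFIX
    stable = base + STABLE_SUFFIX
    if phase == "initial":
        return PipelineStreams(legacy, (legacy,))
    if phase == "dual_consume":
        # Consumer reads both before anything is written to the new stream.
        return PipelineStreams(legacy, (stable, legacy))
    if phase == "produce_stable":
        # Producer switches; legacy is still read until it is drained.
        return PipelineStreams(stable, (stable, legacy))
    if phase == "stable_only":
        return PipelineStreams(stable, (stable,))
    raise ValueError(f"unknown phase: {phase}")


if __name__ == "__main__":
    for phase in PHASES:
        print(phase, streams_for_phase("cirrussearch.update_pipeline.update", phase))
```

In every phase the stream being produced to is also in the consumer's list, which is the property that makes each step safe to deploy on its own.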

Change #1114955 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/mediawiki-config@master] cirrus: add v1 stream for the search update pipeline

https://gerrit.wikimedia.org/r/1114955

Change #1114956 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/mediawiki-config@master] cirrus: drop rc0 streams

https://gerrit.wikimedia.org/r/1114956

Change #1114955 merged by jenkins-bot:

[operations/mediawiki-config@master] cirrus: add v1 stream for the search update pipeline

https://gerrit.wikimedia.org/r/1114955

Mentioned in SAL (#wikimedia-operations) [2025-03-04T08:13:29Z] <dcausse@deploy2002> Started scap sync-world: Backport for [[gerrit:1114955|cirrus: add v1 stream for the search update pipeline (T375821)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-04T08:29:16Z] <dcausse@deploy2002> dcausse: Backport for [[gerrit:1114955|cirrus: add v1 stream for the search update pipeline (T375821)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-04T08:54:47Z] <dcausse@deploy2002> Finished scap sync-world: Backport for [[gerrit:1114955|cirrus: add v1 stream for the search update pipeline (T375821)]] (duration: 41m 17s)

Mentioned in SAL (#wikimedia-operations) [2025-03-04T08:55:34Z] <dcausse> restarting eventgate-main to pickup to new streams (T375821)

Change #1124483 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] cirrus-streaming-updater: add explicit consumer/producer streams

https://gerrit.wikimedia.org/r/1124483

Change #1124484 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] cirrus-streaming-updater: consume from new v1 & legacy rc0 streams

https://gerrit.wikimedia.org/r/1124484

Change #1124485 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] cirrus-streaming-updater: produce to v1 update streams

https://gerrit.wikimedia.org/r/1124485

Change #1124486 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] cirrus-streaming-updater: stop consuming from legacy streams

https://gerrit.wikimedia.org/r/1124486

Change #1124483 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus-streaming-updater: add explicit consumer/producer streams

https://gerrit.wikimedia.org/r/1124483

Change #1124742 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Produce weighted_tags to mediawiki.cirrussearch.page_weighted_tags_change.v1

https://gerrit.wikimedia.org/r/1124742

Moving to blocked, waiting for the two subtasks to be addressed.

Change #1124484 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus-streaming-updater: consume from new v1 & legacy rc0 streams

https://gerrit.wikimedia.org/r/1124484

Change #1129889 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/alerts@master] cirrus: update alerts based on rc0 topics

https://gerrit.wikimedia.org/r/1129889

Change #1124485 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus-streaming-updater: produce to v1 update streams

https://gerrit.wikimedia.org/r/1124485

Change #1130529 had a related patch set uploaded (by DCausse; author: DCausse):

[machinelearning/liftwing/inference-services@main] search weighted_tags: allow producing to the "v1" stream

https://gerrit.wikimedia.org/r/1130529

Change #1130530 had a related patch set uploaded (by DCausse; author: DCausse):

[machinelearning/liftwing/inference-services@main] search weighted_tags: drop BC for rc0 weighted_tag stream

https://gerrit.wikimedia.org/r/1130530

Change #1129889 merged by jenkins-bot:

[operations/alerts@master] cirrus: update alerts based on rc0 topics

https://gerrit.wikimedia.org/r/1129889

Change #1130529 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] search weighted_tags: allow producing to the "v1" stream

https://gerrit.wikimedia.org/r/1130529

Change #1124742 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Produce weighted_tags to mediawiki.cirrussearch.page_weighted_tags_change.v1

https://gerrit.wikimedia.org/r/1124742

All rc0 streams are now empty, so we should be able to stop consuming from these legacy streams.
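
One way to read "empty" here is that the consumer has no unread messages left on any legacy partition. The Python sketch below expresses that check over plain dictionaries. The function names, the topic name, and the offsets are all made up for illustration, and in practice the offsets would come from Kafka tooling rather than being hard-coded.

```python
from typing import Dict, Tuple

# (topic, partition) -> offset
Offsets = Dict[Tuple[str, int], int]


def lagging_partitions(end_offsets: Offsets, committed: Offsets) -> Offsets:
    """Return the remaining lag for every partition that still has unread messages."""
    return {
        tp: end - committed.get(tp, 0)
        for tp, end in end_offsets.items()
        if end > committed.get(tp, 0)
    }


def legacy_streams_drained(end_offsets: Offsets, committed: Offsets) -> bool:
    """True when no legacy partition has messages past the committed offset."""
    return not lagging_partitions(end_offsets, committed)


if __name__ == "__main__":
    # Hypothetical topic name and offsets, for illustration only.
    topic = "example.cirrussearch.update_pipeline.update.rc0"
    end = {(topic, 0): 120, (topic, 1): 98}
    done = {(topic, 0): 120, (topic, 1): 98}
    behind = {(topic, 0): 120, (topic, 1): 90}
    print(legacy_streams_drained(end, done))
    print(lagging_partitions(end, behind))
```

A partition with no committed offset is treated as starting from zero, so an unread partition counts as lagging rather than drained.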

Change #1124486 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus-streaming-updater: stop consuming from legacy streams

https://gerrit.wikimedia.org/r/1124486

Change #1130530 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] search weighted_tags: drop BC for rc0 weighted_tag stream

https://gerrit.wikimedia.org/r/1130530

Change #1132804 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: update article-country image

https://gerrit.wikimedia.org/r/1132804

Change #1132804 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update article-country image

https://gerrit.wikimedia.org/r/1132804

Change #1180506 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] ml-services: stop using weighted_tags.rc0 stream

https://gerrit.wikimedia.org/r/1180506

Change #1180506 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: stop using weighted_tags.rc0 stream

https://gerrit.wikimedia.org/r/1180506

Change #1114956 merged by jenkins-bot:

[operations/mediawiki-config@master] cirrus: drop rc0 streams

https://gerrit.wikimedia.org/r/1114956