The schema that SUP uses to communicate between its producer and consumer is still marked as in development and is referenced in the schema config as development/cirrussearch/update_pipeline/update. Is it stable enough now that we can move it into the schema repository and make it work like the other events? This would, for example, have ensured that the new private variants of the streams were auto-created by canary events instead of having to be created manually.
Description
Details
| Title | Reference | Author | Source Branch | Dest Branch |
|---|---|---|---|---|
| Allow reading from "legacy" update streams | repos/search-platform/cirrus-streaming-updater!170 | dcausse | T375821-allow-reading-legacy-update-streams | main |
| Produce errors to the fetch_error.v1 stream | repos/search-platform/cirrus-streaming-updater!169 | dcausse | T375821-produce-to-fetch-error-v1 | main |
| Add --page-weighted-tags-change-legacy-stream | repos/search-platform/cirrus-streaming-updater!168 | dcausse | T375821-add-page-weighted-tags-change-legacy-stream | main |
| cirrussearch: promote SUP schemas to stable | repos/data-engineering/schemas-event-primary!9 | dcausse | cirrussearch-promote-sup-schemas-to-stable | master |
Event Timeline
So two things have to happen:
- Rename the schema:
  - Copy it from development to mediawiki so that old events can still be validated (the development copy can only be removed after the Kafka retention period is over).
  - Adapt schema_title in ext-EventStreamConfig.php accordingly.
- Rename the stream. This may require two phases:
  - Phase A:
    - Add the new topics (without the rc0 suffix) to the stream config cirrussearch.update_pipeline.update.rc0 in ext-EventStreamConfig.php.
    - Configure the SUP producer to write to the new topic.
  - Phase B (once no more messages are written to the old topic and the Kafka retention period is over):
    - Strip the rc0 suffix from the stream name and remove the topics with the rc0 suffix in ext-EventStreamConfig.php.
    - Configure the SUP producer and consumer to use the stream without the rc0 suffix.
I'm not sure we can safely change the schema_title of an existing stream, so we might have to create separate streams and adapt the pipeline to consume from multiple update streams.
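The ordering constraint behind the "separate streams" option can be sketched as a small model: the consumer has to read from both streams before the producer switches, and the legacy stream can only be dropped once nothing writes to it any more. This is only an illustration, not code from SUP. The rc0 stream name is the one quoted above, while the v1 name, the `Phase` type, and `validate_rollout` are hypothetical.

```python
from dataclasses import dataclass

# The rc0 name is taken from the task; the v1 name is an assumed placeholder.
LEGACY_STREAM = "cirrussearch.update_pipeline.update.rc0"
NEW_STREAM = "cirrussearch.update_pipeline.update.v1"  # hypothetical


@dataclass(frozen=True)
class Phase:
    """One deployment step: where the producer writes, what the consumer reads."""
    name: str
    produce_to: str
    consume_from: frozenset


# A rollout in which every step is safe to deploy on its own.
ROLLOUT = [
    Phase("initial", LEGACY_STREAM, frozenset({LEGACY_STREAM})),
    Phase("consume both", LEGACY_STREAM, frozenset({LEGACY_STREAM, NEW_STREAM})),
    Phase("produce to new", NEW_STREAM, frozenset({LEGACY_STREAM, NEW_STREAM})),
    Phase("drop legacy", NEW_STREAM, frozenset({NEW_STREAM})),
]


def validate_rollout(phases):
    """Return a list of problems; an empty list means no step strands events."""
    problems = []
    # The stream being written to must always have a reader.
    for phase in phases:
        if phase.produce_to not in phase.consume_from:
            problems.append(f"{phase.name}: nothing consumes {phase.produce_to}")
    # A stream must not be dropped in the same step that stops writing to it.
    for prev, cur in zip(phases, phases[1:]):
        dropped = prev.consume_from - cur.consume_from
        if prev.produce_to in dropped:
            problems.append(
                f"{cur.name}: drops {prev.produce_to} while it is still being written"
            )
    return problems


if __name__ == "__main__":
    print(validate_rollout(ROLLOUT))
```

The model only checks ordering. It does not know whether the legacy stream has actually been drained, which still has to be verified before the last step.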
dcausse opened https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-primary/-/merge_requests/9
cirrussearch: promote SUP schemas to stable
dcausse merged https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-primary/-/merge_requests/9
cirrussearch: promote SUP schemas to stable
Change #1114955 had a related patch set uploaded (by DCausse; author: DCausse):
[operations/mediawiki-config@master] cirrus: add v1 stream for the search update pipeline
Change #1114956 had a related patch set uploaded (by DCausse; author: DCausse):
[operations/mediawiki-config@master] cirrus: drop rc0 streams
dcausse opened https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/168
Add --page-weighted-tags-change-legacy-stream
dcausse merged https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/168
Add --page-weighted-tags-change-legacy-stream
dcausse opened https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/169
Produce errors to the fetch_error.v1 stream
dcausse merged https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/169
Produce errors to the fetch_error.v1 stream
dcausse opened https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/170
Allow reading from "legacy" update streams
Change #1114955 merged by jenkins-bot:
[operations/mediawiki-config@master] cirrus: add v1 stream for the search update pipeline
Mentioned in SAL (#wikimedia-operations) [2025-03-04T08:13:29Z] <dcausse@deploy2002> Started scap sync-world: Backport for [[gerrit:1114955|cirrus: add v1 stream for the search update pipeline (T375821)]]
Mentioned in SAL (#wikimedia-operations) [2025-03-04T08:29:16Z] <dcausse@deploy2002> dcausse: Backport for [[gerrit:1114955|cirrus: add v1 stream for the search update pipeline (T375821)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
Mentioned in SAL (#wikimedia-operations) [2025-03-04T08:54:47Z] <dcausse@deploy2002> Finished scap sync-world: Backport for [[gerrit:1114955|cirrus: add v1 stream for the search update pipeline (T375821)]] (duration: 41m 17s)
Mentioned in SAL (#wikimedia-operations) [2025-03-04T08:55:34Z] <dcausse> restarting eventgate-main to pickup to new streams (T375821)
dcausse merged https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/170
Allow reading from "legacy" update streams
Change #1124483 had a related patch set uploaded (by DCausse; author: DCausse):
[operations/deployment-charts@master] cirrus-streaming-updater: add explicit consumer/producer streams
Change #1124484 had a related patch set uploaded (by DCausse; author: DCausse):
[operations/deployment-charts@master] cirrus-streaming-updater: consume from new v1 & legacy rc0 streams
Change #1124485 had a related patch set uploaded (by DCausse; author: DCausse):
[operations/deployment-charts@master] cirrus-streaming-updater: produce to v1 update streams
Change #1124486 had a related patch set uploaded (by DCausse; author: DCausse):
[operations/deployment-charts@master] cirrus-streaming-updater: stop consuming from legacy streams
Change #1124483 merged by jenkins-bot:
[operations/deployment-charts@master] cirrus-streaming-updater: add explicit consumer/producer streams
Change #1124742 had a related patch set uploaded (by DCausse; author: DCausse):
[mediawiki/extensions/CirrusSearch@master] Produce weighted_tags to mediawiki.cirrussearch.page_weighted_tags_change.v1
Change #1124484 merged by jenkins-bot:
[operations/deployment-charts@master] cirrus-streaming-updater: consume from new v1 & legacy rc0 streams
Change #1129889 had a related patch set uploaded (by DCausse; author: DCausse):
[operations/alerts@master] cirrus: update alerts based on rc0 topics
Change #1124485 merged by jenkins-bot:
[operations/deployment-charts@master] cirrus-streaming-updater: produce to v1 update streams
Change #1130529 had a related patch set uploaded (by DCausse; author: DCausse):
[machinelearning/liftwing/inference-services@main] search weighted_tags: allow producing to the "v1" stream
Change #1130530 had a related patch set uploaded (by DCausse; author: DCausse):
[machinelearning/liftwing/inference-services@main] search weighted_tags: drop BC for rc0 weighted_tag stream
Change #1129889 merged by jenkins-bot:
[operations/alerts@master] cirrus: update alerts based on rc0 topics
Change #1130529 merged by jenkins-bot:
[machinelearning/liftwing/inference-services@main] search weighted_tags: allow producing to the "v1" stream
Change #1124742 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Produce weighted_tags to mediawiki.cirrussearch.page_weighted_tags_change.v1
All rc0 streams are now empty, so we should be able to stop consuming from these legacy streams.
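One way to make "empty" concrete, assuming it means that retention has removed every record, is to compare each partition's beginning and end offsets: a partition holds no records when the two are equal. The sketch below works on plain dictionaries so it stays self-contained. The function name and the topic names in the usage example are hypothetical, and how the offsets are fetched from Kafka is left out.

```python
def non_empty_partitions(beginning_offsets, end_offsets):
    """Return the (topic, partition) pairs that still hold records.

    Both arguments map (topic, partition) to an offset. A partition still
    holds records when its end offset is past its beginning offset.
    """
    return sorted(
        tp
        for tp, end in end_offsets.items()
        if end > beginning_offsets.get(tp, 0)
    )


if __name__ == "__main__":
    # Hypothetical topic name and offsets, for illustration only.
    topic = "example.update.rc0"
    beginning = {(topic, 0): 120, (topic, 1): 80}
    end = {(topic, 0): 120, (topic, 1): 95}
    print(non_empty_partitions(beginning, end))
```

An empty result for every rc0 topic would support removing the legacy streams from the consumer configuration.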
Change #1124486 merged by jenkins-bot:
[operations/deployment-charts@master] cirrus-streaming-updater: stop consuming from legacy streams
Change #1130530 merged by jenkins-bot:
[machinelearning/liftwing/inference-services@main] search weighted_tags: drop BC for rc0 weighted_tag stream
Change #1132804 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):
[operations/deployment-charts@master] ml-services: update article-country image
Change #1132804 merged by jenkins-bot:
[operations/deployment-charts@master] ml-services: update article-country image
Change #1180506 had a related patch set uploaded (by DCausse; author: DCausse):
[operations/deployment-charts@master] ml-services: stop using weighted_tags.rc0 stream
Change #1180506 merged by jenkins-bot:
[operations/deployment-charts@master] ml-services: stop using weighted_tags.rc0 stream
Change #1114956 merged by jenkins-bot:
[operations/mediawiki-config@master] cirrus: drop rc0 streams