It would be useful to have a stream that supports the addition and deletion of CirrusSearch weighted_tags.
Users who want to tag or untag pages in the search index could then simply emit events to this stream.
There might be 2 different use-cases to support:
- realtime processes bound to the lifecycle of the page
- batch processes possibly sending a large number of modifications
We might consider exposing 2 different streams, giving us the opportunity to route or throttle the events accordingly:
- events bound to the lifecycle of the page might enter the merge window of the SUP producer so that they get a chance to be joined with other events related to the same edit
- events produced in batch might skip that window and possibly be throttled (if deemed necessary) to limit the impact on latencies of the realtime events.
For now, we start with a single stream.
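To make the routing idea above concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `WeightedTagsChange` class, its field names, the `bound_to_page_lifecycle` flag and the route labels are assumptions, not the schema or the SUP producer API, which are still to be defined by this task.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WeightedTagsChange:
    """Hypothetical event shape; field names are illustrative, not the final schema."""

    wiki_id: str
    page_id: int
    # Tags to set, grouped by tag prefix: prefix -> {tag name: optional weight}.
    set_tags: Dict[str, Dict[str, Optional[int]]] = field(default_factory=dict)
    # Tag prefixes whose tags should all be removed from the page.
    clear_prefixes: List[str] = field(default_factory=list)
    # Assumed marker separating realtime (page lifecycle) from batch producers.
    bound_to_page_lifecycle: bool = True


def route(event: WeightedTagsChange) -> str:
    """Pick a processing path for an event (labels are made up for this sketch)."""
    if event.bound_to_page_lifecycle:
        # Realtime events enter the merge window so they can be joined
        # with other events related to the same edit.
        return "merge_window"
    # Batch events skip the merge window and may be throttled.
    return "bypass_throttled"


realtime = WeightedTagsChange("enwiki", 123, set_tags={"example.prefix": {"tag_a": 500}})
batch = WeightedTagsChange("enwiki", 456, clear_prefixes=["example.prefix"],
                           bound_to_page_lifecycle=False)
```

With a single stream, such a marker would have to travel in the event itself; with 2 streams, the source stream alone would determine the route.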
AC:
- define a schema for this stream
- define a stream config
- create Kafka topics (1 partition, 7 days retention):
  - eqiad.mediawiki.cirrussearch.page_weighted_tags_change.rc0
  - codfw.mediawiki.cirrussearch.page_weighted_tags_change.rc0
- adapt the SUP producer to read these streams
- consider using watermark alignment and check whether it helps in the case where the batch stream produces a lot of events at once
- update the WeightedTags documentation on Wikitech: https://wikitech.wikimedia.org/wiki/Search/WeightedTags
- adapt existing users of weighted_tags to use this stream: