The stream mediawiki.cirrussearch.page_rerender.v1 is currently enabled only for testwiki.
We would like to test how the search update pipeline behaves in a backfill scenario (T350826), so it would be useful to have this stream populated by more wikis.
This task tracks what needs to be done to populate the stream from most of our wikis (the public ones).
- Double-check if kafka-main can be used
- rate ~300 evt/s
- expected topic size for 7 days of retention: ~110GB, 330GB once replicated; with 5 partitions that is ~22GB per partition, i.e. an additional ~66GB per node on a 5-node cluster
- if size is a concern we could reduce retention to 4 days, or explore whether log compaction is usable/useful in this context (cc @pfischer)
- Should we enable this gradually, in 2, 3, 4 or more steps?
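For reference, the sizing figures above can be re-derived with a short sketch. It assumes a replication factor of 3 (inferred from 330GB / 110GB, not stated explicitly in this task) and that data is spread evenly across partitions and brokers. The function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def topic_sizing(total_gb, replication_factor, partitions, brokers):
    """Derive replicated, per-partition and per-broker sizes from a topic size.

    Assumes an even spread of data across partitions and brokers.
    """
    replicated_gb = total_gb * replication_factor
    return {
        "replicated_gb": replicated_gb,
        "per_partition_gb": total_gb / partitions,
        "per_broker_gb": replicated_gb / brokers,
    }


def implied_event_bytes(total_gb, rate_per_s, retention_days):
    """Average event size implied by a topic size, event rate and retention."""
    events = rate_per_s * retention_days * SECONDS_PER_DAY
    return total_gb * 1e9 / events


# Figures from this task: ~110GB over 7 days at ~300 evt/s, 5 partitions,
# 5 brokers. Replication factor 3 is an assumption (330 / 110).
seven_days = topic_sizing(110, replication_factor=3, partitions=5, brokers=5)

# Same estimate with retention reduced to 4 days, scaling the size linearly.
four_days = topic_sizing(110 * 4 / 7, replication_factor=3, partitions=5, brokers=5)

print(seven_days)
print(four_days)
print(round(implied_event_bytes(110, 300, 7)), "bytes per event (implied)")
```

Under these assumptions, reducing retention to 4 days scales every figure by 4/7, which would bring the additional per-node footprint down to roughly 38GB.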
Prerequisites:
- RESOLVED Get the green light from serviceops (cc @elukey)
- Increase the number of partitions to 5 on existing topics on main-eqiad, main-codfw and kafka-jumbo
- codfw.mediawiki.cirrussearch.page_rerender.v1
- eqiad.mediawiki.cirrussearch.page_rerender.v1
AC:
- the mediawiki.cirrussearch.page_rerender.v1 stream is populated for all the public wikis