As a maintainer of the CirrusSearch data pipeline, I want to understand why the mjolnir-bulk-update daemon performs many more updates when running from eqiad than when running from codfw.
Graphs show that eqiad is processing many more documents than codfw.
Looking at the graphs, codfw does not seem to be consuming the prioritized topic at all. A possible cause is that the metrics the daemon inspects are not updated or checked often enough to reflect the actual lag, so the other topics are never properly paused. Eqiad does consume the prioritized topic, but it seems to have detected the need quite late: by then around 172k priority messages were waiting. This suggests the metric is not reactive or accurate enough to get the prioritized topics consumed quickly.
Note that restarting the daemon does not seem to help.
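One way to make the pausing decision more reactive is to compute the priority lag directly inside the consumer on every poll iteration instead of relying on an externally refreshed metric. The following is a minimal sketch, assuming the daemon can obtain the end offset of each priority partition and its own current position (for example via kafka-python's `KafkaConsumer.end_offsets()` and `KafkaConsumer.position()`). The helper names, the `(topic, partition)` tuples, the `threshold` parameter and the topic names are hypothetical and are not part of mjolnir.

```python
from typing import List, Mapping, Sequence, Tuple

# Hypothetical partition key (topic, partition), mirroring kafka-python's TopicPartition.
Partition = Tuple[str, int]


def priority_lag(end_offsets: Mapping[Partition, int],
                 positions: Mapping[Partition, int]) -> int:
    """Total number of unconsumed messages across the priority partitions.

    end_offsets: latest offset per priority partition (e.g. from end_offsets()).
    positions: next offset this consumer will read per partition (e.g. from position()).
    """
    # Clamp at 0 in case end offsets were fetched slightly before the positions.
    return sum(max(end - positions[tp], 0) for tp, end in end_offsets.items())


def partitions_to_pause(priority_topic: str,
                        assigned: Sequence[Partition],
                        lag: int,
                        threshold: int = 0) -> List[Partition]:
    """Non-priority partitions to pause while the priority backlog exceeds threshold."""
    if lag <= threshold:
        return []  # no priority backlog: consume everything
    return [tp for tp in assigned if tp[0] != priority_topic]


# Example with made-up offsets: 60 priority messages are pending, so "updates" is paused.
assigned = [("priority", 0), ("priority", 1), ("updates", 0)]
end = {("priority", 0): 100, ("priority", 1): 50}
pos = {("priority", 0): 40, ("priority", 1): 50}
lag = priority_lag(end, pos)
print(lag, partitions_to_pause("priority", assigned, lag))  # 60 [('updates', 0)]
```

Because the lag is derived from the consumer's own offsets right before each decision, a priority backlog would be noticed on the next poll rather than when the metric is next refreshed. In a real consumer the returned partitions would be passed to `pause()` and later to `resume()`.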
AC:
- prioritized topics are consumed more quickly than non-prioritized ones