mjolnir-bulk-update should consume prioritized topic quicker than normal ones
Closed, Resolved · Public · 5 Estimated Story Points

Description

As a maintainer of the CirrusSearch data pipeline I want to understand why the mjolnir-bulk-update daemon is performing a lot more updates when running from eqiad.

Graphs show that eqiad is processing a lot more documents than codfw:

Capture d’écran du 2021-07-14 10-55-41.png (235×1 px, 49 KB)

Looking at the graphs, it seems that codfw is not consuming the prioritized topic, possibly because the metrics it inspects are not updated or checked frequently enough to reflect the actual lag and properly pause the other topics. Eqiad seems to have noticed the need to consume the prioritized topic quite late, as it had around 172k priority messages to process. This suggests that the metric is not reactive or accurate enough to help the prioritized topics be consumed quicker.
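To make the suspected failure mode concrete, here is a minimal sketch of the pausing decision described above. It is not the MjoLniR implementation: the `TopicState` structure, the `topics_to_pause` helper, the `max_priority_lag` threshold and the topic names are all hypothetical. It assumes lag is computed as the broker's highwater offset minus the committed offset, so a stale highwater value makes the lag look smaller than it really is.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TopicState:
    """Offsets observed for one topic (hypothetical structure, for illustration)."""
    name: str
    prioritized: bool
    committed_offset: int  # last offset the consumer has processed
    highwater_offset: int  # latest offset known to be available on the broker

    @property
    def lag(self) -> int:
        # Messages still waiting to be consumed; never negative.
        return max(0, self.highwater_offset - self.committed_offset)


def topics_to_pause(states: List[TopicState], max_priority_lag: int = 0) -> List[str]:
    """Return the non-prioritized topics to pause.

    Normal topics are paused whenever the prioritized topics together have
    more than max_priority_lag messages pending (assumed policy).
    """
    priority_lag = sum(s.lag for s in states if s.prioritized)
    if priority_lag > max_priority_lag:
        return [s.name for s in states if not s.prioritized]
    return []


# A fresh highwater value shows 172,000 pending priority messages.
fresh = [
    TopicState("priority", True, committed_offset=1_000, highwater_offset=173_000),
    TopicState("normal", False, committed_offset=50, highwater_offset=900),
]

# A stale highwater value (equal to the committed offset) hides that backlog.
stale = [
    TopicState("priority", True, committed_offset=1_000, highwater_offset=1_000),
    TopicState("normal", False, committed_offset=50, highwater_offset=900),
]
```

With the fresh offsets, `topics_to_pause(fresh)` returns `['normal']`, so the backlog on the prioritized topic is drained first. With the stale offsets it returns `[]` and the normal topic keeps being consumed, which would match the behavior suspected on codfw.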

Note that restarting the daemon does not seem to help.

AC:

  • prioritized topics are consumed quicker than the non-prioritized ones

Event Timeline

Restricted Application added a subscriber: Aklapper.
dcausse renamed this task from Differences in mjolnir-bulk-update behaviors between codfw & eqiad to mjolnir-bulk-update should consume prioritized topic quicker than normal ones. Jul 26 2021, 3:32 PM
dcausse removed the point value for this task.
Gehel set the point value for this task to 5. Jul 26 2021, 3:36 PM

Change 709546 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[search/MjoLniR@master] bulk_daemon: Sync fetch prioritized topic highwater

https://gerrit.wikimedia.org/r/709546

Change 709546 merged by jenkins-bot:

[search/MjoLniR@master] bulk_daemon: Sync fetch prioritized topic highwater

https://gerrit.wikimedia.org/r/709546

Change 709780 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[search/MjoLniR/deploy@master] Deploy fixes to bulk daemon prioritization

https://gerrit.wikimedia.org/r/709780

Change 709780 merged by Ebernhardson:

[search/MjoLniR/deploy@master] Deploy fixes to bulk daemon prioritization

https://gerrit.wikimedia.org/r/709780

Mentioned in SAL (#wikimedia-operations) [2021-08-03T18:05:43Z] <ebernhardson@deploy1002> Started deploy [search/mjolnir/deploy@f0f70d1]: T286642 fixes to bulk daemon prioritization

Mentioned in SAL (#wikimedia-operations) [2021-08-03T18:06:31Z] <ebernhardson@deploy1002> Finished deploy [search/mjolnir/deploy@f0f70d1]: T286642 fixes to bulk daemon prioritization (duration: 00m 48s)

Patch to prioritization deployed. We will need to monitor the next weekly data load to verify correct operation.