Description

In T122676#3411664, we see that HA redundancy using twemproxy won't support the MULTI command. Find out why we're using transactions, then fix that or otherwise work around this incompatibility. If this turns out to be related to Celery task management, I can't think of any reason we wouldn't want at-least-once task consumption.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | None | T181538 ORES overload incident, 2017-11-28
Resolved | | Ladsgroup | T181632 Celery manager implodes horribly if Redis goes down
Resolved | | Ladsgroup | T181559 Investigate redis-cluster or other techniques for making Redis not a single point of failure.
Declined | | None | T122676 Implement sentinel for ORES production Redis
Resolved | | Ladsgroup | T196889 Investigate what is creating Redis transactions and whether it can be fixed
Event Timeline
This recent discussion makes it look like Celery is responsible for the transaction, and that it's a side-effect of using pipelines: https://github.com/celery/celery/issues/3500
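The behaviour described in that issue can be illustrated with a small sketch (not Celery's actual code): a transactional client-side pipeline brackets its queued commands with MULTI/EXEC on the wire, which is exactly what twemproxy cannot proxy, while a non-transactional pipeline just batches them.

```python
# Sketch (not Celery's actual code): the wire-level command sequence a
# client-side Redis pipeline produces from its queued commands.
def wire_commands(queued, transaction=True):
    """Return the commands a pipeline would send, in order."""
    if transaction:
        # A transactional pipeline brackets the batch with MULTI/EXEC,
        # which twemproxy does not support.
        return [("MULTI",)] + list(queued) + [("EXEC",)]
    # A plain pipeline batches the commands without a transaction.
    return list(queued)

# The ack-style operation seen in this task's MONITOR trace:
ack = [
    ("ZREM", "unacked_index", "cba164cf-1a37-4f1e-8d5d-2019788166be"),
    ("HDEL", "unacked", "cba164cf-1a37-4f1e-8d5d-2019788166be"),
]
print(wire_commands(ack)[0])                  # ('MULTI',)
print(wire_commands(ack, transaction=False))  # just ZREM + HDEL
```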
Running redis-cli monitor on deployment-ores01 gives these kinds of transactions:
```
1541697362.325121 [0 10.68.16.235:35308] "BRPOP" "celery" "celery\x06\x163" "celery\x06\x166" "celery\x06\x169" "1"
1541697362.326760 [0 10.68.16.235:35310] "MULTI"
1541697362.326771 [0 10.68.16.235:35310] "ZREM" "unacked_index" "cba164cf-1a37-4f1e-8d5d-2019788166be"
1541697362.326782 [0 10.68.16.235:35310] "HDEL" "unacked" "cba164cf-1a37-4f1e-8d5d-2019788166be"
1541697362.326791 [0 10.68.16.235:35310] "EXEC"
1541697362.327471 [0 10.68.16.235:35670] "GET" "celery-task-meta-20299d80-5b2b-4d49-a38f-42b2620fef1e"
1541697362.341398 [0 10.68.16.235:35332] "MULTI"
1541697362.341424 [0 10.68.16.235:35332] "SETEX" "celery-task-meta-20299d80-5b2b-4d49-a38f-42b2620fef1e" "86400" "\x80\x02}q\x00(X\x06\x00\x00\x00resultq\x01}q\x02X\t\x00\x00\x00goodfaithq\x03}q\x04X\x05\x00\x00\x00scoreq\x05}q\x06(X\n\x00\x00\x00predictionq\a\x88X\x0b\x00\x00\x00probabilityq\b}q\t(\x89G=\xe2\xde\xac\x00\x00\x00\x00\x88G?\xef\xff\xff\xff\xed!TuussX\t\x00\x00\x00tracebackq\nNX\b\x00\x00\x00childrenq\x0b]q\x0cX\x06\x00\x00\x00statusq\rX\a\x00\x00\x00SUCCESSq\x0eX\a\x00\x00\x00task_idq\x0fX$\x00\x00\x0020299d80-5b2b-4d49-a38f-42b2620fef1eq\x10u."
1541697362.341471 [0 10.68.16.235:35332] "PUBLISH" "celery-task-meta-20299d80-5b2b-4d49-a38f-42b2620fef1e" "\x80\x02}q\x00(X\x06\x00\x00\x00resultq\x01}q\x02X\t\x00\x00\x00goodfaithq\x03}q\x04X\x05\x00\x00\x00scoreq\x05}q\x06(X\n\x00\x00\x00predictionq\a\x88X\x0b\x00\x00\x00probabilityq\b}q\t(\x89G=\xe2\xde\xac\x00\x00\x00\x00\x88G?\xef\xff\xff\xff\xed!TuussX\t\x00\x00\x00tracebackq\nNX\b\x00\x00\x00childrenq\x0b]q\x0cX\x06\x00\x00\x00statusq\rX\a\x00\x00\x00SUCCESSq\x0eX\a\x00\x00\x00task_idq\x0fX$\x00\x00\x0020299d80-5b2b-4d49-a38f-42b2620fef1eq\x10u."
1541697362.341509 [0 10.68.16.235:35332] "EXEC"
```
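To survey which operations end up inside transactions, the monitor output can be grouped mechanically. A minimal sketch (the regex assumes the standard redis-cli monitor line format, `<timestamp> [<db> <addr>] "CMD" "arg" ...`):

```python
import re

# Group commands from `redis-cli monitor` output into the MULTI ... EXEC
# blocks they belong to, to survey what Celery wraps in transactions.
CMD_RE = re.compile(r'^\S+ \[\d+ \S+\] "([A-Z]+)"')

def find_transactions(lines):
    """Return one list of command names per MULTI/EXEC block."""
    transactions, current = [], None
    for line in lines:
        match = CMD_RE.match(line)
        if not match:
            continue
        cmd = match.group(1)
        if cmd == "MULTI":
            current = []            # start collecting a transaction
        elif cmd == "EXEC":
            if current is not None:
                transactions.append(current)
            current = None
        elif current is not None:
            current.append(cmd)     # command queued inside MULTI
    return transactions
```

Fed the trace above, this yields two transactions: `['ZREM', 'HDEL']` (message ack) and `['SETEX', 'PUBLISH']` (result storage).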
These are vital parts of Celery and I doubt they would be easy to fix. The GitHub issue basically implies the same thing.