
"db1047/eventlogging_sync processes" icinga alert is flaky since at least early January
Closed, Resolved (Public)

Description

The "db1047/eventlogging_sync processes" Icinga alert has kept going critical (and paging me) since at least early January. This should be investigated, and either the underlying problem should be resolved or the Icinga check should be adapted.

Event Timeline

hoo raised the priority of this task from to Needs Triage.
hoo updated the task description.
hoo added projects: Icinga, SRE, DBA.
hoo added subscribers: hoo, jcrespo.

This check should be marked as non-critical and should not send pages (only alerts to chat, the web interface, and email). For some reason, either this was changed or some configuration is paging for all DBA checks instead of only the critical ones.

hoo added a subscriber: Dzahn.

Change 285640 had a related patch set uploaded (by Filippo Giunchedi):
mariadb: allow up to two eventlogging_sync processes

https://gerrit.wikimedia.org/r/285640

Change 285640 merged by Filippo Giunchedi:
mariadb: allow up to two eventlogging_sync processes

https://gerrit.wikimedia.org/r/285640
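
For readers unfamiliar with this kind of check, the idea behind the change can be sketched as a process-count check that tolerates up to two eventlogging_sync processes. This is only an illustrative sketch, not the contents of the Gerrit change: the function name `check_process_count`, the assumed lower bound of one process, and the message wording are all hypothetical. The only parts taken from outside the sketch are the upper bound of two, which comes from the patch title, and the standard Nagios/Icinga plugin exit codes (0 for OK, 2 for CRITICAL).

```python
# Hypothetical sketch of a process-count check; not the actual Icinga/puppet config.
OK = 0        # standard Nagios/Icinga plugin exit code for OK
CRITICAL = 2  # standard Nagios/Icinga plugin exit code for CRITICAL


def check_process_count(count, minimum=1, maximum=2):
    """Classify a process count against an allowed inclusive range.

    minimum=1 is an assumption; maximum=2 mirrors "allow up to two
    eventlogging_sync processes" from the patch title.
    """
    if minimum <= count <= maximum:
        return OK, f"PROCS OK: {count} eventlogging_sync process(es)"
    return CRITICAL, (
        f"PROCS CRITICAL: {count} eventlogging_sync process(es), "
        f"expected {minimum} to {maximum}"
    )
```

With these assumed bounds, a second sync process that briefly overlaps with the first no longer trips the alert, while zero or three processes still would.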

I think this will not completely fix the issue, as it seems the script may still fail due to some real issue (data corruption, charset). Once this is deployed, we could merge it into https://phabricator.wikimedia.org/T133588#2237441 to fix the pending db-specific issue.

jcrespo claimed this task.

This has not happened for a long time now. See also T124307.