
Several backup alerts fired
Closed, Resolved · Public

Description

I have acknowledged all the alerts for now.

Screenshot 2023-11-20 at 7.39.30 (file: Captura de pantalla 2023-11-20 a las 7.39.30.png, 554 KB)

Event Timeline

I wonder if these alerts could be related to T351588 (Puppet failing on dbprov2004), which has been failing for the whole weekend?

Change 976158 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] dbbackups: Update config for mysql backup monitoring

https://gerrit.wikimedia.org/r/976158

Change 976158 merged by Jcrespo:

[operations/puppet@production] dbbackups: Update config for mysql backup monitoring

https://gerrit.wikimedia.org/r/976158

jcrespo changed the task status from Open to In Progress. (Tue, Nov 21, 11:16 AM)
jcrespo triaged this task as High priority.

I believe I have fixed the issue: the Puppet configuration was wrong after the package fix in T351491.

I am rerunning all snapshots to check that the issue is now fixed. Logical backups will be reanalyzed so that their metadata is not lost. In theory, backups kept running the whole time; we only temporarily lost their status and metadata.

We now just need to wait. By the end of the day, all alerts should be back to green.

Metadata from yesterday's backups are starting to pour in:

Screenshot_20231121_125655.png (57 KB)

All work here is now done. The only thing left before resolving this issue is for the last snapshots to finish.

Screenshot_20231121_193022.png (229 KB)