Page MenuHomePhabricator

🟦️ Add alerting for failed logical backups
Closed, ResolvedPublic

Description

As developers we want to get an alert for the logical backup cronjob failing.

possible solutions for alert:

  • log metric of cronjob logs
  • container/restart_count

ACs

  • alert is triggered if backup cronjob didn't succeed
  • alert is send to the existing notification group

relevant links:

Event Timeline

toan renamed this task from Add alerting for missing logical backups to Add alerting for failed logical backups.Apr 20 2022, 1:44 PM
toan subscribed.

I suggest we wait about 2 days before deploying this to production ( Just to monitor it on staging. ) because there was an alert fired about 30 minutes after this was deployed to staging. I did some small investigation and I am not sure if the query started counting from the last backup before the alert was deployed.

WMDE-leszek renamed this task from Add alerting for failed logical backups to 🟦️ Add alerting for failed logical backups.May 17 2022, 7:18 PM
Tarrow claimed this task.