Make primary DB masters page on HOST DOWN alert
Open, Medium, Public

Description

At the moment, a host going down does not page; it only sends an IRC alert.

While this might be OK for the rest of the infra, if a primary database master goes down, all the wikis on it will automatically go read-only (in addition to replication breaking on the slaves).
In some cases, replication broken alerts can take up to 15 minutes to actually send an SMS. We should page when a master goes down, as it needs immediate action.
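
To illustrate the proposed behaviour, here is a minimal Python sketch of the routing rule: every HOST DOWN alert still goes to IRC, and only hosts marked as primary DB masters also page. This is not the actual monitoring configuration; the `Host` class, the role label `db-primary-master`, the channel names, and the host names in the usage note are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Host:
    name: str
    role: str  # hypothetical role label, e.g. "db-primary-master" or "appserver"


# Assumption: only this (made-up) role should page on HOST DOWN.
PAGING_ROLES = {"db-primary-master"}


def notification_channels(host: Host, alert: str) -> List[str]:
    """Return the channels an alert for this host should be sent to."""
    channels = ["irc"]  # every alert keeps going to IRC, as today
    if alert == "HOST DOWN" and host.role in PAGING_ROLES:
        channels.append("page")  # primary masters additionally page
    return channels
```

With this rule, `notification_channels(Host("db-example", "db-primary-master"), "HOST DOWN")` includes `"page"`, while the same alert for a host with any other role stays IRC-only.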

Event Timeline

Restricted Application added a subscriber: Aklapper. Sep 24 2019, 5:54 AM
Marostegui triaged this task as Medium priority. Sep 24 2019, 5:55 AM
Marostegui added a project: Wikimedia-Incident.
Marostegui moved this task from Triage to Backlog on the DBA board.
jijiki added a subscriber: jijiki. Oct 11 2019, 2:22 PM