On Saturday 12th, db2173 (sanitarium master) crashed:
19:42:35 <+icinga-wm> PROBLEM - Host db2173 is DOWN: PING CRITICAL - Packet loss = 100%
19:45:31 <+icinga-wm> PROBLEM - MariaDB Replica IO: s1 on db2094 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2003, Errmsg: error reconnecting to master repl@db2173.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: Cant connect to MySQL server on db2173.codfw.wmnet (110 Connection timed out) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
Apparently this didn't page anyone.
@fgiunchedi, can you help us understand why we didn't get a page? Notifications are definitely enabled.