
Fix m1 replication icinga checks
Closed, Declined (Public)


During the codfw rollout we found the issue described in T133057.

While checking the m1 replication alerts, I found that the m1 shard has no replication checks (for broken replica or lag) on its slaves, db1016 and db2010.

Event Timeline

Technically, it has availability/failover checks from dbproxy1001. We can implement the replication checks now that we have pt-heartbeat and non-paging replication alerts. There are lower expectations of perfect availability on misc than on core.
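
For illustration only, here is a rough sketch of how a pt-heartbeat based lag check could map lag to Icinga/Nagios plugin exit codes. It is not the actual check used in production: the function names and the thresholds (60s warning, 300s critical) are assumptions, and the heartbeat timestamp is assumed to have already been read from the heartbeat table on the replica.

```python
from datetime import datetime, timedelta, timezone

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_lag(heartbeat_ts, now, warn=60.0, crit=300.0):
    """Return (exit_code, message) for a replica.

    heartbeat_ts: timestamp last written by pt-heartbeat on the master,
                  as read on the replica (None if no row was found).
    warn, crit:   hypothetical thresholds in seconds.
    """
    if heartbeat_ts is None:
        return UNKNOWN, "UNKNOWN: no heartbeat row found"

    # Lag is the age of the newest replicated heartbeat; clamp clock skew to 0.
    lag = max(0.0, (now - heartbeat_ts).total_seconds())

    if lag >= crit:
        return CRITICAL, f"CRITICAL: replication lag {lag:.0f}s"
    if lag >= warn:
        return WARNING, f"WARNING: replication lag {lag:.0f}s"
    return OK, f"OK: replication lag {lag:.0f}s"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    code, message = check_lag(now - timedelta(seconds=120), now)
    print(message)
    raise SystemExit(code)
```

A stopped or broken replica shows up here as ever-growing lag, since the heartbeat row stops advancing; whether such a check pages or not would be decided in the alert configuration, not in the plugin.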

I am going to decline this: something has to be fixed, but most likely T172492 will happen first, or those hosts will be decommissioned first.