During the codfw rollout we found the issue described in T133057.
Checking the m1 replication alerts, I found that the m1 shard has no replication checks on its slaves db1016 and db2010 for broken replication or lag.
Technically, it has availability/failover checks from dbproxy1001. We can implement the replication checks now that we have pt-heartbeat and non-paging replication alerts. There are lower expectations of perfect availability on misc than on core.
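
As a rough illustration of what a non-paging lag check built on pt-heartbeat could look like, here is a minimal sketch. It assumes the check has already read the most recent heartbeat timestamp from the replica. The thresholds and the `classify_lag` name are hypothetical and are not taken from the production configuration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical thresholds for illustration only; real values would be
# chosen per shard (misc shards can tolerate more lag than core).
WARNING_LAG = timedelta(seconds=60)
CRITICAL_LAG = timedelta(seconds=300)


def classify_lag(last_heartbeat: Optional[datetime], now: datetime) -> str:
    """Map the age of the last replicated heartbeat to an alert state.

    last_heartbeat is the newest heartbeat timestamp seen on the replica,
    or None if no heartbeat row could be read at all.
    """
    if last_heartbeat is None:
        # No heartbeat row: replication state cannot be determined.
        return "UNKNOWN"
    lag = now - last_heartbeat
    if lag >= CRITICAL_LAG:
        return "CRITICAL"
    if lag >= WARNING_LAG:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    now = datetime(2017, 8, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(classify_lag(now - timedelta(seconds=5), now))
    print(classify_lag(now - timedelta(seconds=120), now))
    print(classify_lag(None, now))
```

Because the alert is meant to be non-paging, the returned state would only feed a dashboard or an IRC/email notification rather than a pager.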
I am going to decline this: something has to be fixed, but most likely T172492 will happen first, or those hosts will be decommissioned first.