Move paging from individual databases to database service "groups"
Open, Medium, Public


Followup to T177782.

T177782 was closed because it was too abstract and mostly fixed, except for service-level paging. Right now, if a database server goes down, it pages, as long as it is in the "core" group. This was needed because a) almost all databases were a SPOF and b) there was no easy way to understand the groups/pooling state of servers.

With dbctl, we can now dynamically query the different MySQL groups and services.

Paging should move from "individual db host down" to "db service down or degraded". This is not trivial, as it implies distributed checking, so it will likely need extra infrastructure. However, the idea is to page because "recentchanges databases for enwiki" are down, not because "db1126 is down".
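To make the intent concrete, here is a rough sketch of the aggregation step only, not a proposed implementation. It assumes we already have, from some source, a list of instances with their service, pooled state and health. The `Instance` record, its field names, the `service_alerts` function, the "enwiki:recentchanges" label format and the 50% threshold are all made up for illustration; none of them come from dbctl or the existing checks, and all hosts other than db1126 are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # Hypothetical record: field names are assumptions, not dbctl's schema.
    host: str
    service: str   # e.g. "enwiki:recentchanges" (made-up label format)
    pooled: bool   # depooled hosts should not count toward service health
    healthy: bool  # result of whatever per-host check already exists


def service_alerts(instances, min_healthy_ratio=0.5):
    """Return {service: "down" | "degraded"} for services needing attention.

    A service is "down" when none of its pooled hosts is healthy, and
    "degraded" when the healthy fraction drops below min_healthy_ratio
    (an arbitrary threshold chosen for this sketch).
    """
    by_service = {}
    for inst in instances:
        if inst.pooled:
            by_service.setdefault(inst.service, []).append(inst)

    alerts = {}
    for service, members in by_service.items():
        healthy = sum(1 for m in members if m.healthy)
        if healthy == 0:
            alerts[service] = "down"
        elif healthy / len(members) < min_healthy_ratio:
            alerts[service] = "degraded"
    return alerts


if __name__ == "__main__":
    example = [
        Instance("db1126", "enwiki:recentchanges", pooled=True, healthy=False),
        Instance("db-example-2", "enwiki:recentchanges", pooled=True, healthy=False),
        Instance("db-example-3", "enwiki:main", pooled=True, healthy=True),
        Instance("db-example-4", "enwiki:main", pooled=True, healthy=False),
        Instance("db-example-5", "enwiki:main", pooled=False, healthy=False),
    ]
    print(service_alerts(example))
```

With this input, only "enwiki:recentchanges" alerts: a single host down in "enwiki:main" leaves half of its pooled hosts healthy, so nobody is paged, and the depooled host is ignored entirely. The hard part (where this runs and how it gets a trustworthy view of health) is still open.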

It may require mediawiki-level monitoring?