We recently had an issue where dbctl ended up confused because of a wrong/outdated section (s10). The root cause turned out to be a non-mw host that was accidentally pooled under a section that shouldn't exist, which was accepted rather than rejected with an error.
It would be great to review dbctl data and make sure there are no older hosts, sections or references to other data that are outdated and could confuse monitoring, pooling/depooling tools, and humans. In particular:
- Checking if there are leftover sections, such as s10, that shouldn't be in mw config/dbctl
- Checking if there are references to databases (hostsByName) that are not intended for mediawiki, such as db2135. It is likely there are still decommissioned hosts, or hosts that used to be in mw config but now have misc or other roles.
- Checking there are no depooled hosts that are idle and forgotten
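
As a rough illustration of the three checks above, here is a minimal Python sketch. It does not use the real dbctl data format or API: the `Instance` record, the `audit` function, the expected section list, and the `db-example-*` host names are all hypothetical and would need to be replaced with data exported from dbctl and an authoritative list of MediaWiki database hosts. It also cannot tell whether a depooled host is idle, so it only flags depooled hosts for a human to review.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical record for one host entry; not the real dbctl schema."""
    name: str
    section: str
    pooled: bool


def audit(instances, expected_sections, mediawiki_hosts):
    """Return (host, reason) pairs for entries that look outdated."""
    findings = []
    for inst in instances:
        # Check 1: the section should be one we still expect to exist.
        if inst.section not in expected_sections:
            findings.append((inst.name, f"unexpected section {inst.section}"))
        # Check 2: the host should be a known MediaWiki database host.
        if inst.name not in mediawiki_hosts:
            findings.append((inst.name, "not a known MediaWiki database host"))
        # Check 3: a known host that is depooled may have been forgotten.
        elif not inst.pooled:
            findings.append((inst.name, "depooled; confirm this is intentional"))
    return findings


# Illustrative data only: section and host lists here are assumptions.
instances = [
    Instance("db-example-1", "s1", True),
    Instance("db-example-2", "s2", False),
    Instance("db2135", "s10", True),
]
findings = audit(
    instances,
    expected_sections={"s1", "s2"},
    mediawiki_hosts={"db-example-1", "db-example-2"},
)
for host, reason in findings:
    print(f"{host}: {reason}")
```

Keeping the expected sections and hosts as explicit inputs means the same comparison could later be reused by automatic monitoring, with the lists coming from whatever source is considered authoritative.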
In the future, it would also be nice to have automatic monitoring, such as T256845 (out of scope of this ticket).