Several conf* hosts ran out of disk space due to too many actions taken from snapshot1013. This is believed to be caused by a bug in the MediaWiki dumping scripts, which query and reload the state of the database configuration too often (once per row?): https://gerrit.wikimedia.org/r/c/mediawiki/core/+/798678/13/includes/export/WikiExporter.php
This caused lvs hosts to complain about not being able to contact etcd:
[18:43] <icinga-wm> PROBLEM - PyBal connections to etcd on lvs1018 is CRITICAL: CRITICAL: 2 connections established with conf1007.eqiad.wmnet:4001 (min=34) https://wikitech.wikimedia.org/wiki/PyBal
[18:43] <icinga-wm> PROBLEM - PyBal connections to etcd on lvs1020 is CRITICAL: CRITICAL: 37 connections established with conf1007.eqiad.wmnet:4001 (min=119) https://wikitech.wikimedia.org/wiki/PyBal
Pending things:
- Patch logic so etcd config reloads do not happen so aggressively
- Restart dump process
- Fix pending dbctl commits
- Restart db maintenance
- Something else?
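
For the first pending item, one possible shape of the fix is to cache the loaded configuration and refresh it at most once per time interval, instead of on every row. The following is only a minimal sketch of that idea in Python, not the actual MediaWiki patch (the dump code is PHP); `ThrottledConfigLoader`, `fetch` and `min_interval` are hypothetical names, and the 60 second default is an arbitrary assumption.

```python
import time
from typing import Any, Callable, Optional


class ThrottledConfigLoader:
    """Cache a config value and reload it at most once per `min_interval` seconds.

    Hypothetical helper for illustration; `fetch` stands in for whatever
    call actually reads the database configuration from etcd.
    """

    def __init__(
        self,
        fetch: Callable[[], Any],
        min_interval: float = 60.0,  # assumed value, tune as needed
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._cached: Any = None
        self._loaded_at: Optional[float] = None

    def get(self) -> Any:
        """Return the cached config, reloading only if it is old enough."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._min_interval:
            self._cached = self._fetch()
            self._loaded_at = now
        return self._cached
```

With this pattern, a loop that calls `get()` once per exported row triggers one fetch per interval rather than one fetch per row. The clock is injectable so the behaviour can be checked without waiting in real time.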