This task tracks the porting of "base" (i.e. common to all hosts) checks from Icinga to Alertmanager.
There are two basic strategies:
- The check's logic is simple: we can run the check periodically and drop a Prometheus node-exporter metric file onto the file system
- The check's logic is not that simple: in this case we can consider things like https://github.com/canonical/nrpe_exporter
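As a rough illustration of the first strategy, here is a minimal Python sketch of a periodic check that writes its result as a metric file for node-exporter's textfile collector (the directory passed to `--collector.textfile.directory`). The metric name, help text, and file name used in the usage note are hypothetical and not taken from any of the patches listed here; the file is written to a temporary name and then renamed so the collector never reads a half-written file.

```python
import os
import tempfile


def render_metric(name: str, help_text: str, value: float) -> str:
    """Render a single gauge in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )


def write_textfile(directory: str, filename: str, content: str) -> str:
    """Write content to directory/filename via a temporary file and rename.

    The temporary file does not end in .prom, so the textfile collector
    ignores it until os.replace() moves it into place atomically.
    """
    final_path = os.path.join(directory, filename)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        # mkstemp creates the file with mode 0600; make it world-readable
        # so a node-exporter running as another user can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

A timer or cron job could then call something like `write_textfile(textfile_dir, "example_check.prom", render_metric("node_example_check_failed", "Hypothetical check result", 1))`, where `textfile_dir` is whatever directory the local node-exporter is configured to read, and an alerting rule would fire on the resulting metric.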
These are the current checks in profile::monitoring:
disk_space:
- prometheus check - https://gerrit.wikimedia.org/r/c/operations/alerts/+/902457
- removed icinga check
dpkg
- prometheus check
- removed icinga check
puppet_checkpuppetrun
- prometheus check - https://gerrit.wikimedia.org/r/c/operations/alerts/+/902764
- removed icinga check
check_eth - T333007
- prometheus check
- removed icinga check
check_systemd_state
- prometheus check - https://gerrit.wikimedia.org/r/c/operations/alerts/+/902701
- removed icinga check
check_cpufreq - T163220#8725482
- prometheus check
- removed icinga check
edac - T302639
- prometheus check
- removed icinga check
ipmi::monitor
- prometheus check - https://gerrit.wikimedia.org/r/c/operations/alerts/+/902754
- removed icinga check
check_dhclient
- prometheus check: not needed
- https://gerrit.wikimedia.org/r/c/operations/puppet/+/902763 - removed icinga check