Currently, the functions in wmcs_libs/alerts.py ssh to alert1001.wikimedia.org to downtime alerts. This only works if you run the cookbook from your laptop and have global root privileges.
We have several cookbooks that we want to run from cloudcumins (e.g. wmcs.ceph.roll_reboot_osds or wmcs.openstack.roll_reboot_cloudgws) and that need to downtime some alerts on wmcs-managed physical hosts. At the moment, from cloudcumins you can ssh into those hosts (with the cloud_cumin_master key), but you cannot silence the alerts related to them.
Some thoughts:
- we can probably ignore Icinga alerts, as we want to move away from them anyway
- we need a way to silence Prometheus alerts only for wmcs-managed hosts
- is there a way to give limited access to the Prometheus API/CLI, or do we need a separate Prometheus instance?
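
To make the second point more concrete, here is a rough sketch of what a host-scoped silence could look like, assuming the silences end up going through the Alertmanager v2 API (`POST /api/v2/silences`). Everything specific here is an assumption for illustration only: the `build_silence_payload` helper, the `WMCS_HOST_PATTERN` check (it assumes wmcs-managed hosts can be recognised by a `cloud` prefix), and the use of the `instance` label to match a host. It only builds the request body and does not talk to any server.

```python
from datetime import datetime, timedelta, timezone
import re

# Hypothetical: assume wmcs-managed hosts can be recognised by a "cloud" prefix
# followed by letters and digits (e.g. cloudcephosd1001). The real rule may differ.
WMCS_HOST_PATTERN = re.compile(r"^cloud[a-z]+[0-9]+$")


def build_silence_payload(hostname, duration, comment, created_by, now=None):
    """Build an Alertmanager v2 silence body limited to one wmcs-managed host.

    Hypothetical helper, not part of wmcs_libs/alerts.py.
    """
    # Refuse anything that does not look like a wmcs-managed host, so the
    # caller cannot silence alerts for other teams' hosts.
    if not WMCS_HOST_PATTERN.match(hostname):
        raise ValueError(f"{hostname} does not look like a wmcs-managed host")

    start = now or datetime.now(timezone.utc)
    end = start + duration
    return {
        "matchers": [
            {
                # Assumption: alerts carry an "instance" label of the form
                # "<hostname>" or "<hostname>:<port>".
                "name": "instance",
                "value": f"{hostname}(:[0-9]+)?",
                "isRegex": True,
                "isEqual": True,
            }
        ],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }
```

A cookbook could call something like `build_silence_payload("cloudcephosd1001", timedelta(hours=1), "rolling reboot", "wmcs-cookbooks")` and POST the result as JSON. The hostname check above only helps if it is enforced on the server side (or by a proxy in front of the API), which is really the open question in the last bullet.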