Trying to gather the concerns I hear about disabled Icinga notifications:
1. They clutter the active alert page
2. For production hosts, they get forgotten, causing legitimate alerts to go unnoticed (see T221282 and T149643)
3. For hosts with notifications disabled permanently, they use Icinga's already limited resources

1. Has been solved with https://gerrit.wikimedia.org/r/c/operations/puppet/+/594441
2. Which seems to be the most common issue, can be solved with:
  - Periodically auditing the hosts and services that have disabled notifications, maybe during the alert review?
  - A policy to only disable notifications via Puppet and not via the Icinga UI
  - A policy to not disable notifications on new server installs, but instead follow Icinga#Avoid_Icinga_spam_on_new_server_installs
3. Which seems to be the most contentious issue, can be solved with:
  - Dedicated monitoring (other than prod Icinga)
  - Bigger server?
But that last one would require more investigation into the exact use case, scale, and impact. Maybe solving #2 would be enough.
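
For the periodic audit idea, here is a rough sketch of what a report could look like. It assumes the Icinga 1.x / Nagios-style `status.dat` layout (`hoststatus { ... }` and `servicestatus { ... }` blocks with `host_name`, `service_description` and `notifications_enabled` fields). The function names and the sample hostnames are made up for illustration, and I have not checked how our install exposes that file.

```python
import re
from typing import Dict, Iterator, List, Tuple

# Only host and service status blocks matter for this audit.
BLOCK_RE = re.compile(r"^(hoststatus|servicestatus)\s*\{\s*$")


def parse_status(text: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (block_type, fields) for each hoststatus/servicestatus block."""
    block_type = None
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if block_type is None:
            match = BLOCK_RE.match(line)
            if match:
                block_type = match.group(1)
                fields = {}
        elif line == "}":
            yield block_type, fields
            block_type = None
        elif "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value


def disabled_notifications(text: str) -> List[str]:
    """Return sorted 'host' or 'host!service' names with notifications off."""
    found = []
    for block_type, fields in parse_status(text):
        if fields.get("notifications_enabled") == "0":
            name = fields.get("host_name", "?")
            if block_type == "servicestatus":
                name += "!" + fields.get("service_description", "?")
            found.append(name)
    return sorted(found)


# Hypothetical sample data, not taken from a real server.
SAMPLE = """
info {
\tversion=1.x
\t}
hoststatus {
\thost_name=db1001
\tnotifications_enabled=0
\t}
hoststatus {
\thost_name=mw1001
\tnotifications_enabled=1
\t}
servicestatus {
\thost_name=mw1001
\tservice_description=puppet last run
\tnotifications_enabled=0
\t}
"""
```

Feeding it the contents of the status file (wherever that lives on the Icinga host) and pasting the resulting list into the alert review notes would probably be enough to start with.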