- Fix the check messages so they are no longer mangled (this is fixed in the netbox split patch).
- Add information to the check_url (it's in Wikitech, but we could also link the report results).
- Create a dcops contact group, because most of these alerts are actionable by dcops.
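
To illustrate the first two items, here is a minimal sketch of how a check could build an unmangled, single-line status message and optionally append a link to the report results. It is not the actual check script: `summarize_reports`, its parameters, and the input shape (report name mapped to a failed flag) are all hypothetical. The only thing taken as given is the standard Nagios/Icinga plugin exit-code convention.

```python
# Hypothetical sketch: none of these names come from the actual check script.
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def summarize_reports(failed_by_report, details_url=None):
    """Return (exit_code, message) for a set of report results.

    failed_by_report maps a report name to True if that report failed
    (an assumed input shape, chosen only for illustration).
    """
    if not failed_by_report:
        return UNKNOWN, "UNKNOWN: no report results found"

    failed = sorted(name for name, is_failed in failed_by_report.items() if is_failed)
    if failed:
        code = CRITICAL
        message = "CRITICAL: failed reports: " + ", ".join(failed)
    else:
        code = OK
        message = "OK: all {} reports passed".format(len(failed_by_report))

    if details_url:
        # Assumed: a URL pointing at the report results.
        message += " (details: {})".format(details_url)

    # Collapse whitespace and newlines so the status stays on one line.
    return code, " ".join(message.split())
```

For example, `summarize_reports({"puppetdb.PuppetDB": True, "example.Other": False})` returns `(2, "CRITICAL: failed reports: puppetdb.PuppetDB")`, where `example.Other` is a made-up report name.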
Related Gerrit Patches:
|operations/software/netbox-deploy : master||ganeti-sync: Add retries to api calls|
|Resolved||crusnov||T221113 Netbox Reports: Create an icinga check for alerting on a set of Netbox reports|
|Resolved||crusnov||T224946 Netbox Alert Cleanups|
I am not sure this is related, but we get many alerts like these:
- PROBLEM - Check the Netbox report puppetdb for fail status. on netbox1001 is CRITICAL: puppetdb.PuppetDB CRITICAL
- PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed
If these are actionable by dcops, or if we are getting more false positives than we should, we need to fix that.