ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks)
Closed, ResolvedPublic

Description

This is a ticket to achieve the goal:

< chasemp> not send sms pages for labtest* things to non-cloud folks

but I slightly changed it to "never send SMS if the host name is like labtest*", because that would be much easier to achieve.

@chasemp @jcrespo

To start with, is the task title a fair summary of the goal here?

If yes, we would first need to know which roles are used on labtest machines (now and in the future), then go through them to check whether any monitoring classes enable paging by default, and then add a check by hostname that disables it.
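
A minimal sketch of the hostname check described above, written in Python only to illustrate the logic (the actual change would live in Puppet). The function name `should_page`, the pattern constant, and the example hostnames are assumptions made for illustration, not names from the puppet repository.

```python
import re

# Assumption: a labtest machine is any host whose name starts with "labtest".
LABTEST_PATTERN = re.compile(r"^labtest")


def should_page(fqdn: str, critical: bool) -> bool:
    """Return True only if the check is critical and the host is not labtest*."""
    if LABTEST_PATTERN.match(fqdn):
        # Never page for labtest hosts, whatever the check's default is.
        return False
    return critical


if __name__ == "__main__":
    # Hypothetical hostnames, used only to exercise the function.
    print(should_page("labtestdb1001.example.org", True))
    print(should_page("db1001.example.org", True))
```

The drawback of this approach is that the hostname test has to be repeated in every monitoring class that can page.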

Will upload an example here.

Dzahn created this task. Oct 11 2017, 9:53 PM
Restricted Application added a subscriber: Aklapper. Oct 11 2017, 9:53 PM
Dzahn assigned this task to chasemp. Oct 11 2017, 9:54 PM
chasemp triaged this task as Normal priority. Oct 11 2017, 9:55 PM

thanks @Dzahn

Change 383713 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mariadb/icinga: if fqdn like labtest, don't page

https://gerrit.wikimedia.org/r/383713

Change 384183 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] base/icinga: if on labs, don't page for mysql procs

https://gerrit.wikimedia.org/r/384183

The second patch should be a nicer solution: it avoids putting a regex in the mariadb module, makes no changes there at all, and just uses Hiera.

There are other existing checks, though, that would page and do not yet have an "is_critical" parameter like that.

faidon moved this task from Backlog to In progress on the monitoring board. Oct 16 2017, 3:24 PM

Change 383713 abandoned by Dzahn:
mariadb/icinga: if fqdn like labtest, don't page

Reason:
https://gerrit.wikimedia.org/r/#/c/384183/ is doing the same thing but with the better technical solution (just Hiera, not touching mariadb module)

https://gerrit.wikimedia.org/r/383713

Change 384183 merged by Rush:
[operations/puppet@production] base/icinga: if mysql is in labtest never send pages

https://gerrit.wikimedia.org/r/384183

Change 384892 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] openstack2: no Icinga paging (SMS) if on labtest

https://gerrit.wikimedia.org/r/384892

Change 384893 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] toollabs/icinga: no paging if on labtest

https://gerrit.wikimedia.org/r/384893

Change 384895 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mysql/icinga/labtest: no pages if on labtest, pt.2

https://gerrit.wikimedia.org/r/384895

Change 384895 merged by Dzahn:
[operations/puppet@production] mysql/icinga/labtest: no pages if on labtest, pt.2

https://gerrit.wikimedia.org/r/384895

Change 384892 merged by Rush:
[operations/puppet@production] openstack2: no Icinga paging (SMS) if on labtest

https://gerrit.wikimedia.org/r/384892

Change 384893 merged by Dzahn:
[operations/puppet@production] toollabs/icinga: no paging if on labtest

https://gerrit.wikimedia.org/r/384893

Dzahn added a subscriber: faidon. Nov 30 2017, 12:37 AM

@chasemp @faidon @jcrespo

OK, so what we now have in Hiera is the following, all of which applies to any host whose name starts with labtest*:

mariadb::monitor_process::is_critical: false
mariadb::monitor_disk::is_critical: false
openstack::designate::monitor::critical: false
openstack::nova::conductor::monitor::critical: false
openstack::nova::network::monitor::critical: false
icinga::monitor::toollabs::critical: false

The checks above are all normally "critical" in the sense that they send SMS, but if the hostname starts with labtest* the value is overridden to false, so there will be no pages.
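
To make the override behaviour concrete, here is a rough Python model of the lookup: a per-host-pattern layer is consulted first and falls back to the normal default. This is only an illustration of the idea, not how Hiera itself is implemented; the `lookup` function, the `DEFAULTS` and `HOST_OVERRIDES` structures, and the hostnames are assumptions. The keys and their overridden values are the ones listed above.

```python
from fnmatch import fnmatch

KEYS = [
    "mariadb::monitor_process::is_critical",
    "mariadb::monitor_disk::is_critical",
    "openstack::designate::monitor::critical",
    "openstack::nova::conductor::monitor::critical",
    "openstack::nova::network::monitor::critical",
    "icinga::monitor::toollabs::critical",
]

# Normal behaviour: these checks are critical, i.e. they send SMS.
DEFAULTS = {key: True for key in KEYS}

# Assumed layout: one override layer keyed by a hostname glob.
HOST_OVERRIDES = [
    ("labtest*", {key: False for key in KEYS}),
]


def lookup(hostname: str, key: str) -> bool:
    """Return the most specific value for key: host override first, then default."""
    for pattern, values in HOST_OVERRIDES:
        if fnmatch(hostname, pattern) and key in values:
            return values[key]
    return DEFAULTS[key]


if __name__ == "__main__":
    # Hypothetical hostnames.
    print(lookup("labtestdb1001", "mariadb::monitor_process::is_critical"))
    print(lookup("db1001", "mariadb::monitor_process::is_critical"))
```

The monitoring classes themselves stay untouched in this model; only the data layer knows about labtest hosts.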

This covers the checks we originally made it for (MySQL PROC, MySQL DISK) as well as the others listed above.

It does not automatically cover any Icinga check that is added in the future or in other roles with "critical => true"; each such check needs its own line added to the Hiera data above.
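
One way to catch that gap would be a small audit that compares the paging parameters in use on labtest roles against the override list. The sketch below is hypothetical: the function `uncovered_paging_params` and the extra key `example::newcheck::critical` are invented for illustration, and collecting the real parameter defaults from the puppet repository is left out.

```python
from typing import Dict, List


def uncovered_paging_params(
    defaults: Dict[str, bool], overrides: Dict[str, bool]
) -> List[str]:
    """Return parameters that would still page on labtest hosts.

    A parameter is uncovered if it defaults to True (pages) and the
    labtest override data does not set it to False.
    """
    return sorted(
        key
        for key, default in defaults.items()
        if default and overrides.get(key, default)
    )


if __name__ == "__main__":
    # Overrides taken from the Hiera data in this task (two shown for brevity).
    labtest_overrides = {
        "mariadb::monitor_process::is_critical": False,
        "mariadb::monitor_disk::is_critical": False,
    }
    # Assumed defaults; "example::newcheck::critical" is a made-up future check.
    defaults = {
        "mariadb::monitor_process::is_critical": True,
        "mariadb::monitor_disk::is_critical": True,
        "example::newcheck::critical": True,
    }
    print(uncovered_paging_params(defaults, labtest_overrides))
```

Anything this reports would need a matching "false" line in the labtest Hiera data before the check is deployed.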

Is this good enough to resolve the ticket with the original "not send sms pages for labtest* things to non-cloud folks" request?

jcrespo closed this task as Resolved. Nov 30 2017, 9:29 AM
jcrespo reassigned this task from chasemp to Dzahn.
Dzahn moved this task from Externally blocked to Done on the monitoring board. May 14 2018, 2:56 PM