
nova compute hosts disk space alert does not page
Closed, Resolved, Public

Description

I am untangling some config for T171494, and while cleaning up references I noticed that we set:

hieradata/eqiad/openstack/nova/compute.yaml

openstack::nova::compute::base::monitoring::host::nrpe_check_disk_critical: true

However, I don't think this works: no class parameter by that name exists, so nothing ever looks up that key. We can probably get the intended effect through the implicit (automatic) parameter lookup by setting:

base::monitoring::host::nrpe_check_disk_critical: true

I confirmed this by tracing back through the Puppet code and checking the generated check on einsteinium.
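A rough way to see why the original key is ignored: Puppet's automatic class parameter lookup only consults Hiera keys of the form <class name>::<parameter name>, so a key with an extra openstack::nova::compute:: prefix is never looked up by base::monitoring::host. The Python sketch below models that rule. The trimmed parameter table and its False default (taken from the "before" value in this task's compiler diff) are simplifications, not the real class definition.

```python
# Simplified model of Puppet's automatic class parameter lookup (not Hiera itself).
# Assumption: for each class parameter without an explicit value, Puppet looks up
# exactly "<class name>::<parameter name>" and otherwise falls back to the default.

# Key as written in hieradata/eqiad/openstack/nova/compute.yaml.
HIERA_DATA = {
    "openstack::nova::compute::base::monitoring::host::nrpe_check_disk_critical": True,
}

# Hypothetical, trimmed parameter table: the real class has more parameters.
# The False default mirrors the "before" value in this task's compiler diff.
CLASS_PARAMS = {
    "base::monitoring::host": {"nrpe_check_disk_critical": False},
}


def resolve_params(class_name, defaults, data):
    """Return the value each parameter of class_name would receive."""
    return {
        param: data.get(f"{class_name}::{param}", default)
        for param, default in defaults.items()
    }


def unused_keys(data, classes):
    """Return the Hiera keys that no known class parameter will look up."""
    wanted = {f"{cls}::{param}" for cls, params in classes.items() for param in params}
    return sorted(key for key in data if key not in wanted)


if __name__ == "__main__":
    params = CLASS_PARAMS["base::monitoring::host"]
    print(resolve_params("base::monitoring::host", params, HIERA_DATA))
    # {'nrpe_check_disk_critical': False}: the prefixed key is never consulted
    fixed = {"base::monitoring::host::nrpe_check_disk_critical": True}
    print(resolve_params("base::monitoring::host", params, fixed))
    # {'nrpe_check_disk_critical': True}
    print(unused_keys(HIERA_DATA, CLASS_PARAMS))
```

One caveat the model makes visible: Hiera is only consulted when the parameter is not passed explicitly. If a wrapping class declares base::monitoring::host with an explicit value, that value wins, and the compiler diff on this task suggests the value actually arrives via Class[Profile::Base].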

The option we are attempting to set is defined in:

modules/base/manifests/monitoring/host.pp

It is then passed down to:

modules/nrpe/manifests/monitor_service.pp

When the option is true, this effectively adds 'sms' to the alert's contact groups, which is what makes the alert page.
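The chain described above can be modeled as three nested calls. This is a hypothetical Python sketch: the function names mirror the Puppet classes and defines, but the rule that a critical check gets 'sms' appended to its contact groups is an assumption about what the manifests do, not a copy of them. monitoring_service is included because the compiler diff on this task shows the critical flag reaching Monitoring::Service[disk_space] as well.

```python
# Hypothetical model of how the disk-space "critical" flag propagates.
# Function names mirror the Puppet classes/defines; the bodies are assumptions.


def monitoring_service(description, contact_group="admins", critical=False):
    """Build a check whose contact groups gain 'sms' when critical (assumed rule)."""
    groups = [g.strip() for g in contact_group.split(",") if g.strip()]
    if critical and "sms" not in groups:
        groups.append("sms")
    return {"service_description": description, "contact_groups": ",".join(groups)}


def nrpe_monitor_service(description, contact_group="admins", critical=False):
    """Forward the critical flag unchanged to the monitoring service."""
    return monitoring_service(description, contact_group=contact_group, critical=critical)


def base_monitoring_host(nrpe_check_disk_critical=False):
    """The disk space check takes its critical flag from the class parameter."""
    return nrpe_monitor_service("Disk space", critical=nrpe_check_disk_critical)


if __name__ == "__main__":
    print(base_monitoring_host()["contact_groups"])  # admins
    print(base_monitoring_host(nrpe_check_disk_critical=True)["contact_groups"])  # admins,sms
```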

Checking einsteinium to verify the end state for labvirt1001 (as an example):

# --PUPPET_NAME-- labvirt1001 disk_space
define service {
	active_checks_enabled          1
	check_command                  nrpe_check!check_disk_space!10
	check_freshness                0
	check_interval                 1
	check_period                   24x7
	contact_groups                 admins
	host_name                      labvirt1001
	is_volatile                    0
	max_check_attempts             3
	notification_interval          0
	notification_options           c,r,f
	notification_period            24x7
	passive_checks_enabled         1
	retry_interval                 1
	service_description            Disk space
	servicegroups                  labvirt_eqiad
}
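Checking the end state by eye works for one host; a small parser makes the same check repeatable across hosts. This Python sketch assumes the generated config uses a Nagios-style define service { ... } layout with one directive and value per line, and it treats 'sms' in contact_groups as the signal that a check pages. SAMPLE is an excerpt of the labvirt1001 definition.

```python
# Sketch: parse one Nagios-style "define service { ... }" block and decide
# whether it pages. Assumption: a check pages when "sms" is in contact_groups.

SAMPLE = """\
# --PUPPET_NAME-- labvirt1001 disk_space
define service {
\tcheck_command                  nrpe_check!check_disk_space!10
\tcontact_groups                 admins
\thost_name                      labvirt1001
\tservice_description            Disk space
}
"""


def parse_service(text):
    """Return a dict of directive -> value for a single service definition."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, the opening "define" line and the closing brace.
        if not line or line.startswith("#") or line.startswith("define") or line == "}":
            continue
        parts = line.split(None, 1)  # directive, then the rest of the line as value
        fields[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return fields


def pages(fields):
    """True if 'sms' is one of the comma-separated contact groups."""
    groups = [g.strip() for g in fields.get("contact_groups", "").split(",")]
    return "sms" in groups


if __name__ == "__main__":
    check = parse_service(SAMPLE)
    print(check["contact_groups"], pages(check))  # admins False
```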

contact_groups lists only admins, with no sms, so this check does not page.

I'm in the middle of a refactor, so I'm setting this aside to fix after 376026 merges; I have too many balls in the air.

Event Timeline

Change 384589 had a related patch set uploaded (by Rush; owner: cpettet):
[operations/puppet@production] openstack: enable paging on full disk for main labvirt role

https://gerrit.wikimedia.org/r/384589

Change 384589 merged by Rush:
[operations/puppet@production] openstack: enable paging on full disk for main labvirt role

https://gerrit.wikimedia.org/r/384589

Resources modified

Class[Profile::Base]
Parameters differences:
--- Class[Profile::Base].orig
+++ Class[Profile::Base]

@@
-    check_disk_critical => False
+    check_disk_critical => True

Monitoring::Service[disk_space]
Parameters differences:
--- Monitoring::Service[disk_space].orig
+++ Monitoring::Service[disk_space]

@@
-    critical => False
+    critical => True

Class[Base::Monitoring::Host]
Parameters differences:
--- Class[Base::Monitoring::Host].orig
+++ Class[Base::Monitoring::Host]

@@
-    nrpe_check_disk_critical => False
+    nrpe_check_disk_critical => True

Nrpe::Monitor_service[disk_space]
Parameters differences:
--- Nrpe::Monitor_service[disk_space].orig
+++ Nrpe::Monitor_service[disk_space]

@@
-    critical => False
+    critical => True

https://gerrit.wikimedia.org/r/#/c/384589/
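The resource diff above can also be summarized programmatically, which helps when a compile touches many resources. This is a minimal Python sketch, assuming the output keeps the shape seen here (resource header, "Parameters differences:", ---/+++ markers, an @@ line, then paired -/+ "name => value" lines); it is a plain text parser, not tied to any particular tool's API.

```python
# Sketch: summarize "Parameters differences" output like the listing above.
# Assumption: each resource header is followed by ---/+++ markers, an @@ line,
# and paired "-    name => old" / "+    name => new" lines.

SAMPLE = """\
Resources modified
Class[Base::Monitoring::Host]
Parameters differences:
--- Class[Base::Monitoring::Host].orig
+++ Class[Base::Monitoring::Host]

@@
-    nrpe_check_disk_critical => False
+    nrpe_check_disk_critical => True
Nrpe::Monitor_service[disk_space]
Parameters differences:
--- Nrpe::Monitor_service[disk_space].orig
+++ Nrpe::Monitor_service[disk_space]

@@
-    critical => False
+    critical => True
"""

SKIP_PREFIXES = ("---", "+++", "@@", "Parameters differences", "Resources modified")


def parse_changes(text):
    """Return {resource: {parameter: (old, new)}} from the diff listing."""
    changes = {}
    resource = None
    old = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(SKIP_PREFIXES):
            continue
        if line[0] in "-+" and "=>" in line:
            name, _, value = line[1:].partition("=>")
            name, value = name.strip(), value.strip()
            if line[0] == "-":
                old[name] = value  # remember the old value until its "+" partner
            else:
                changes.setdefault(resource, {})[name] = (old.pop(name, None), value)
        else:
            # Any other non-empty line starts a new resource block.
            resource = line
            old = {}
    return changes


if __name__ == "__main__":
    for resource, params in parse_changes(SAMPLE).items():
        for name, (before, after) in params.items():
            print(f"{resource}: {name} {before} -> {after}")
```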