To control notification settings on the wdqs_eqiad hosts (wdqs1001 and wdqs1002), e.g. to disable notifications when one of the hosts is taken down for maintenance, I need access to the monitoring settings in Icinga.
Related Objects
- Mentioned In
  - T125975: Add contact for Addshore in icinga
  - T105229: give John Lewis permissions to send commands in icinga for fermium/mailman
  - rOPUP55ed1d08499a: wdqs: set icinga contact groups, add wdqs-admins
  - rOPUPc2559546cdaf: icinga: set contact groups for common/wdqs in hiera
  - rOPUP160f61eb7663: icinga: add new contact group wdqs-admins
- Mentioned Here
  - P2181 icinga checks with contact group wdqs-admins
  - T105229: give John Lewis permissions to send commands in icinga for fermium/mailman
Event Timeline
@Smalyshev As a first step, can we confirm that a basic login on icinga.wikimedia.org works for you? Read-only access should already work and it should be your LDAP/Wikitech/Labs user as long as you are in the WMF group. Does that work? If yes, is the username exactly as here on phabricator? Permissions to send commands would be a separate thing and not handled via LDAP but require a puppet change.
@Dzahn, yes, I can log in to icinga and see stuff, but not control notifications. The username is "smalyshev".
@Smalyshev Ok, great. So the next step to be able to run commands (schedule downtime, disable notifications, ACKnowledge issues, etc) and also to get notifications (email, paging) is that in the Icinga context you have to be a contact (user).
We keep these in a private repo because they contain phone numbers. In your case I have just added the email address and skipped the phone part for now. That can be changed later if desired.
Because it's just email, I left the notification period at 24x7, but we can also use custom timezones here.
Even without any notification options we would need the "contact" to exist in order to give it permissions, so I added this.
define contact{
    contact_name                    smalyshev
    alias                           Stas Malychev
    host_notification_period        24x7
    service_notification_period     24x7
    host_notification_options       d,r,f
    service_notification_options    c,r,f
    email                           smalyshev@wikimedia.org
    address1                        smalyshev@wikimedia.org
    host_notification_commands      host-notify-by-email
    service_notification_commands   notify-by-email
}
The options mean that you get notified when hosts are d (down), r (recovered) or f (flapping), and when services are c (critical), r (recovered) or f (flapping).
Now this icinga contact can be used in puppet classes that apply monitoring for wdqs servers so that it becomes attached to the right hosts and services. Hopefully that already solves it because being a contact in Icinga gives you the permissions for these services. (as opposed to global permissions for all services and hosts that would be specified in cgi.cfg)
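To illustrate how a contact ends up attached to the right services, here is a rough sketch of what the resulting Icinga objects could look like. The group name wdqs-admins and the host names come from this task; the alias text, the service_description and the generic-service template are placeholders I made up for the example, and the real config is generated by puppet, so it will differ in detail.

```cfg
# Sketch only: the real objects are generated by puppet.
define contactgroup{
    contactgroup_name   wdqs-admins
    alias               WDQS admins        ; hypothetical alias text
    members             smalyshev
}

define service{
    use                 generic-service    ; assumed template supplying the required defaults
    host_name           wdqs1001,wdqs1002
    service_description Example WDQS check ; hypothetical service name
    contact_groups      wdqs-admins        ; members are contacts for this service only
}
```

Anyone in wdqs-admins is then a contact for these services and nothing else, which is what limits the scope of the permissions.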
Change 237499 had a related patch set uploaded (by Dzahn):
icinga: add new contact group wdqs-admins
Change 237504 had a related patch set uploaded (by Dzahn):
icinga: set admin groups for common/wdqs in hiera
Change 237508 had a related patch set uploaded (by Dzahn):
wdqs: set icinga contact groups, add wdqs-admins
This needs https://gerrit.wikimedia.org/r/#/c/235065/ it appears. Johnlewis said @RobH was going to review.
I merged https://gerrit.wikimedia.org/r/#/c/235065/ . There is no difference on neon, not in a bad way but also not in a good way.
@JohnLewis see comment on gerrit. the override did not work before?!
We believe this is blocked by Ops, who are currently attending their offsite. This task isn't urgent and can wait until the offsite has concluded.
Offsite is next week. My understanding is this is blocked on figuring out why my patch didn't change things. Will look later with ops help.
Change 244722 had a related patch set uploaded (by Dzahn):
icinga: ensure hiera lookups for all contact_group defs
Change 244722 merged by Dzahn:
icinga: ensure hiera lookups for all contact_group defs
@Smalyshev yes, now there is. Thanks to John Lewis, the override for contact groups via hieradata, which was broken, works now. That gets us an important step closer. Now we can set the right contacts, and those should determine the permissions to send commands.
Change 244813 had a related patch set uploaded (by Dzahn):
fix hiera key for wdqs (contactgroups)
Now in the Icinga config we can see that our new contact group has been added to the services on the wdqs hosts.
And finally I added can_submit_commands 1 to the contact of smalyshev in the private repo.
He could confirm he can send commands now for wdqs-services but not for other services. Just like we wanted.
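For reference, the change to the contact amounts to a single extra directive. This is a sketch of the relevant part only, not a copy of the private repo:

```cfg
define contact{
    contact_name            smalyshev
    # ... other directives unchanged ...
    can_submit_commands     1    ; allow this contact to send external commands via the web UI
}
```

Because the contact is only attached to the wdqs services, the commands are limited to those services too.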
And set in a role in hiera. :)
Checked and now I can control notifications for wdqs and also am getting alerts by email. Thanks!
Added "w" to service_notification_options and "u" to host_notification_options in the contact definition, so that mail is also sent for warnings (and host unreachable), per talk on IRC.
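Assuming the new letters were simply added to the existing lists, the two option lines in the contact would now read roughly as follows (the order of the letters is my guess and does not matter to Icinga):

```cfg
define contact{
    contact_name                    smalyshev
    # ... other directives unchanged ...
    host_notification_options       d,u,r,f    ; down, unreachable, recovery, flapping
    service_notification_options    w,c,r,f    ; warning, critical, recovery, flapping
}
```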