Get smalyshev permissions to icinga enough to control monitoring for wdqs_eqiad group
Closed, Resolved (Public)


To control notification settings on the wdqs_eqiad hosts (wdqs1001 and wdqs1002), for example to disable notifications when one of the hosts is taken down for maintenance, I need access to the monitoring settings in Icinga.

Event Timeline

Smalyshev raised the priority of this task from to Needs Triage.
Smalyshev updated the task description.
Smalyshev added a subscriber: Smalyshev.

@Smalyshev As a first step, can we confirm that a basic login works for you? Read-only access should already work, and the login should be your LDAP/Wikitech/Labs user as long as you are in the WMF group. Does that work? If yes, is the username exactly the same as here on Phabricator? Permission to send commands is a separate matter: it is not handled via LDAP and requires a Puppet change.

@Dzahn, yes, I can log in to icinga and see stuff, but not control notifications. The username is "smalyshev".

@Smalyshev Ok, great. So the next step to be able to run commands (schedule downtime, disable notifications, ACKnowledge issues, etc) and also to get notifications (email, paging) is that in the Icinga context you have to be a contact (user).
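For illustration only, the kinds of commands mentioned above (disabling notifications, scheduling downtime) can be sketched in the external command syntax documented for Nagios and Icinga 1.x. This thread only discusses the web interface, so whether this installation accepts commands this way is an assumption, and the helper names below are hypothetical. The sketch only formats command lines and does not write them anywhere.

```python
import time


def external_command(name, *args, now=None):
    """Format one line in the Nagios/Icinga 1.x external command syntax:
    [unix_timestamp] COMMAND_NAME;arg1;arg2;...
    """
    ts = int(time.time() if now is None else now)
    return "[{}] {}".format(ts, ";".join([name, *map(str, args)]))


def disable_host_notifications(host, now=None):
    # Hypothetical helper: silences notifications for the host itself.
    return external_command("DISABLE_HOST_NOTIFICATIONS", host, now=now)


def schedule_host_downtime(host, start, duration_s, author, comment, now=None):
    # Hypothetical helper. Argument order after the host name:
    # start;end;fixed;trigger_id;duration;author;comment
    # fixed=1 and trigger_id=0 describe a fixed window with no trigger.
    end = start + duration_s
    return external_command(
        "SCHEDULE_HOST_DOWNTIME",
        host, start, end, 1, 0, duration_s, author, comment,
        now=now,
    )


if __name__ == "__main__":
    start = int(time.time())
    print(disable_host_notifications("wdqs1001"))
    print(schedule_host_downtime("wdqs1001", start, 3600, "smalyshev", "maintenance"))
```

Whichever route is used, the command is only accepted if the submitting contact is authorized for the host or service in question, which is what the rest of this task sets up.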

We keep these in a private repo because they contain phone numbers. In your case I have just added email and skipped the phone part for now. That can be changed later if desired.

Because it's just email I left the notification period at 24x7, but we can also use custom timezones here.

Even without any notification options we would need the "contact" to exist in order to give it permissions, so I added this.

define contact{
        contact_name                    smalyshev
        alias                           Stas Malychev
        host_notification_period        24x7
        service_notification_period     24x7
        host_notification_options       d,r,f
        service_notification_options    c,r,f
        host_notification_commands      host-notify-by-email
        service_notification_commands   notify-by-email
}

The options mean that you get notified when hosts are d (down), r (recovered) or f (flapping), and when services are c (critical), r (recovered) or f (flapping).
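As a quick reference, the option letters can be expanded with a small lookup table. This is a minimal sketch based on the standard Nagios/Icinga 1.x contact options; the function name `describe` is made up for this example.

```python
# Option letters accepted by host_notification_options in a contact definition.
HOST_OPTIONS = {
    "d": "down",
    "u": "unreachable",
    "r": "recovery",
    "f": "flapping",
    "s": "scheduled downtime",
    "n": "none",
}

# Option letters accepted by service_notification_options.
SERVICE_OPTIONS = {
    "w": "warning",
    "u": "unknown",
    "c": "critical",
    "r": "recovery",
    "f": "flapping",
    "s": "scheduled downtime",
    "n": "none",
}


def describe(options, table):
    """Expand a comma-separated option string into readable state names."""
    return [table[letter.strip()] for letter in options.split(",")]


if __name__ == "__main__":
    print("hosts:   ", describe("d,r,f", HOST_OPTIONS))
    print("services:", describe("c,r,f", SERVICE_OPTIONS))
```

Note that "u" means different things in the two tables: unreachable for hosts, unknown for services.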

Now this Icinga contact can be used in the Puppet classes that apply monitoring for the wdqs servers, so that it becomes attached to the right hosts and services. Hopefully that already solves it, because being a contact in Icinga gives you the permissions for these services (as opposed to global permissions for all services and hosts, which would be specified in cgi.cfg).

@Dzahn see T105229#1600235 for being able to send the commands as a contact :)

Dzahn triaged this task as Medium priority. Sep 3 2015, 2:01 AM

Change 237499 had a related patch set uploaded (by Dzahn):
icinga: add new contact group wdqs-admins

Change 237499 merged by Dzahn:
icinga: add new contact group wdqs-admins

Change 237504 had a related patch set uploaded (by Dzahn):
icinga: set admin groups for common/wdqs in hiera

Change 237504 merged by Dzahn:
icinga: set contact groups for common/wdqs in hiera

Change 237508 had a related patch set uploaded (by Dzahn):
wdqs: set icinga contact groups, add wdqs-admins

Change 237508 merged by Dzahn:
wdqs: set icinga contact groups, add wdqs-admins

This needs it appears. Johnlewis said @RobH was going to review.

I merged . There is no difference on neon, not in a bad way but also not in a good way.

@JohnLewis See the comment on Gerrit. The override did not work before?!

We believe this is blocked by Ops, who are currently attending their offsite. This task isn't urgent and can wait until the offsite has concluded.

Offsite is next week. My understanding is this is blocked on figuring out why my patch didn't change things. Will look later with ops help.

Ah. Excellent! Thank you.

Change 244722 had a related patch set uploaded (by Dzahn):
icinga: ensure hiera lookups for all contact_group defs

Change 244722 merged by Dzahn:
icinga: ensure hiera lookups for all contact_group defs

@Dzahn any progress on this?

@Smalyshev Yes, now there is. Thanks to John Lewis, the override for contact groups via hieradata, which was broken, works now. That gets us an important step closer. Now we can set the right contacts, and those should determine the permissions to send commands.
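The override idea can be pictured as a layered lookup in which a more specific level wins over the common default. The following is only a sketch of that concept, not how Hiera or the Puppet code in these patches is implemented: the key name `contactgroups` is borrowed from the patch title, and the two dictionaries and the `lookup` function are hypothetical.

```python
def lookup(key, hierarchy, default=None):
    """Return the value from the first (most specific) level that defines key."""
    for level in hierarchy:
        if key in level:
            return level[key]
    return default


# Hypothetical data: a wdqs-specific level and a common fallback level.
wdqs_level = {"contactgroups": "admins,wdqs-admins"}
common_level = {"contactgroups": "admins"}

if __name__ == "__main__":
    # A wdqs host sees the override; any other host falls back to common.
    print(lookup("contactgroups", [wdqs_level, common_level]))
    print(lookup("contactgroups", [{}, common_level]))
```

If every contact group definition goes through such a lookup, a per-service override takes effect; a definition that bypasses the lookup keeps the default regardless of the override.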

Change 244813 had a related patch set uploaded (by Dzahn):
fix hiera key for wdqs (contactgroups)

Change 244813 merged by Dzahn:
fix hiera key for wdqs (contactgroups)

Now in the Icinga config we can see that our new contact group has been added to the services on the wdqs hosts.

root@neon:/etc/icinga# grep -B3 -A1 wdqs-admins /etc/icinga/puppet_services.cfg | grep -v freshness | grep -v check_period
check_command nrpe_check!check_check_dhclient!10
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command nrpe_check!check_check_eth!10
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command nrpe_check!check_check_salt_minion!10
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command nrpe_check!check_disk_space!10
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command nrpe_check!check_dpkg!10
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command check_ntp_time!0.5!1
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command nrpe_check!check_puppet_checkpuppetrun!10
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command nrpe_check!check_raid!10
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command check_ssh
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command nrpe_check!check_WDQS_Blazegraph_process!10
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command check_http!!/!Welcome
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command check_http!!/bigdata/namespace/wdq/sparql?query=prefix%20schema:%20%3C*%20WHERE%20%7B%3C!"xsd:dateTime"
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command nrpe_check!check_WDQS_Internal_HTTP_endpoint!10
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command nrpe_check!check_WDQS_Local_Blazegraph_endpoint!10
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command nrpe_check!check_WDQS_Updater_process!10
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command nrpe_check!check_check_dhclient!10
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command nrpe_check!check_check_eth!10
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command nrpe_check!check_check_salt_minion!10
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command nrpe_check!check_disk_space!10
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command nrpe_check!check_dpkg!10
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command check_ntp_time!0.5!1
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command nrpe_check!check_puppet_checkpuppetrun!10
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command nrpe_check!check_raid!10
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command check_ssh
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command nrpe_check!check_WDQS_Blazegraph_process!10
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command check_http!!/!Welcome
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command check_http!!/bigdata/namespace/wdq/sparql?query=prefix%20schema:%20%3C*%20WHERE%20%7B%3C!"xsd:dateTime"
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command nrpe_check!check_WDQS_Internal_HTTP_endpoint!10
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command nrpe_check!check_WDQS_Local_Blazegraph_endpoint!10
contact_groups admins,wdqs-admins
host_name wdqs1002
check_command nrpe_check!check_WDQS_Updater_process!10
contact_groups admins,wdqs-admins
host_name wdqs1002
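A listing this long is easier to verify programmatically than by eye. The sketch below assumes the filtered output has exactly the shape shown above (a `check_command`, `contact_groups`, `host_name` triple per service) and groups the checks by host for a given contact group. The sample text is a shortened excerpt, and the function names are invented for this example.

```python
from collections import defaultdict

# Shortened excerpt in the same shape as the filtered grep output.
SAMPLE = """\
check_command nrpe_check!check_disk_space!10
contact_groups admins,wdqs-admins
host_name wdqs1001
check_command check_ssh
contact_groups admins,wdqs-admins
host_name wdqs1002
"""


def parse_records(text):
    """Collect key/value lines into one dict per service.

    A record is considered complete when its host_name line is seen,
    which matches the order of the lines in the listing.
    """
    records, current = [], {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank lines and grep's "--" separators
        key, value = parts
        current[key] = value.strip()
        if key == "host_name":
            records.append(current)
            current = {}
    return records


def checks_per_host(records, group):
    """Map each host to the check commands whose contact_groups include group."""
    result = defaultdict(list)
    for rec in records:
        if group in rec.get("contact_groups", "").split(","):
            result[rec["host_name"]].append(rec.get("check_command"))
    return dict(result)


if __name__ == "__main__":
    for host, checks in checks_per_host(parse_records(SAMPLE), "wdqs-admins").items():
        print(host, len(checks), "checks")
```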

And finally I added can_submit_commands 1 to the contact of smalyshev in the private repo.

He confirmed that he can now send commands for the wdqs services but not for other services, just as we wanted.
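The resulting behaviour can be summarised with a simplified model: a contact may send commands for a service only if `can_submit_commands` is set and the contact belongs to one of the service's contact groups. This is a sketch of the rule as described in this thread, not Icinga's actual authorization code. It ignores the global permissions in cgi.cfg, and the group memberships and the non-wdqs service below are made-up sample data.

```python
def may_send_command(contact, service, group_members):
    """Simplified rule: flag must be set and contact must be in a service group."""
    if not contact.get("can_submit_commands"):
        return False
    name = contact["contact_name"]
    return any(name in group_members.get(g, ()) for g in service["contact_groups"])


# Hypothetical sample data for illustration.
group_members = {
    "admins": {"some-ops-person"},
    "wdqs-admins": {"smalyshev"},
}
smalyshev = {"contact_name": "smalyshev", "can_submit_commands": 1}
wdqs_service = {"host_name": "wdqs1001", "contact_groups": ["admins", "wdqs-admins"]}
other_service = {"host_name": "some-other-host", "contact_groups": ["admins"]}

if __name__ == "__main__":
    print(may_send_command(smalyshev, wdqs_service, group_members))
    print(may_send_command(smalyshev, other_service, group_members))
```

Both conditions matter: before `can_submit_commands 1` was added, group membership alone would not have been enough in this model.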

And set in a role in hiera. :)

Checked and now I can control notifications for wdqs and also am getting alerts by email. Thanks!

Dzahn removed a project: Patch-For-Review.

Added "w" to service_notification_options and "u" to host_notification_options in the contact definition, so that mail is also sent for service warnings and for unreachable hosts (per discussion on IRC).