Incorrect icinga settings for mobrovac
Closed, Resolved | Public | 0 Estimated Story Points

Description

I have noticed that my Icinga settings seem to have been (unexpectedly) changed. Namely, I no longer seem to be getting email alerts about RESTBase (at least), which I used to receive. Furthermore, I used to be able to downtime/ack checks in the Icinga UI, but I no longer have the rights to do so.

Event Timeline

Volans triaged this task as Medium priority.

When was the last time this worked for you? modules/icinga/files/cgi.cfg has your shell user name, but it would need to be set to your cn:, i.e. "Mobrovac".

@mobrovac

  • Regarding the notifications: AFAICT the RESTBase alerts notify the team-services group (services@). I don't see that alias defined in our exim configuration, so it is probably managed by OIT. Could you check with them whether it is still valid and includes the right people?
  • Regarding the UI actions: the user mobrovac is properly authorized in the Icinga configuration. Could you check that you're logged in with the correct case of the username?

> When was the last time this worked for you? modules/icinga/files/cgi.cfg has your shell user name, but it would need to be set to your cn:, i.e. "Mobrovac".

Aaah! Could be that that's the problem. Looking at my email, the last time I got a notification about the services having problems was on May 18, but that was only for LVS. Node-related notifications seem to have ceased prior to that (the last one was on May 15, but I should have gotten emails on May 21, when we had a brief RB outage).

As for downtime/ack permissions, it's been a while since I needed to do that, so I honestly don't know when it last worked. It definitely was before I needed to use my capitalised username to log in to Icinga.

> @mobrovac
>
>   • Regarding the notifications: AFAICT the RESTBase alerts notify the team-services group (services@). I don't see that alias defined in our exim configuration, so it is probably managed by OIT. Could you check with them whether it is still valid and includes the right people?

That still doesn't make sense. I do receive emails on that alias, so the alias part is OK. What is weird is that not all of the email notifications I used to get still arrive. E.g. I get notifications about Cassandra being down on a node, but not about RESTBase. In the same vein, it seems I'm not receiving emails for SCB services either...

>   • Regarding the UI actions: the user mobrovac is properly authorized in the Icinga configuration. Could you check that you're logged in with the correct case of the username?

Yes, I'm logged in as Mobrovac (mobrovac is invalid now, so there's only one way for me to log in).

> That still doesn't make sense. I do receive emails on that alias, so the alias part is OK. What is weird is that not all of the email notifications I used to get still arrive. E.g. I get notifications about Cassandra being down on a node, but not about RESTBase. In the same vein, it seems I'm not receiving emails for SCB services either...

OK, I'll check those alerts in more detail and get back to you.

> Yes, I'm logged in as Mobrovac (mobrovac is invalid now, so there's only one way for me to log in).

Ack, I'll send a patch to fix this.

Change 512680 had a related patch set uploaded (by Volans; owner: Volans):
[operations/puppet@production] icinga: fix Mobrovac case for authorization

https://gerrit.wikimedia.org/r/512680

Change 512680 merged by Volans:
[operations/puppet@production] icinga: fix Mobrovac case for authorization

https://gerrit.wikimedia.org/r/512680
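
To illustrate why the case of the username matters for the change above, here is a minimal Python sketch. It assumes that the Icinga CGIs compare the authenticated login name against the comma-separated user lists in cgi.cfg with an exact, case-sensitive match, which is what this thread suggests. The sample file contents and the user "someuser" are hypothetical and are not the real contents of modules/icinga/files/cgi.cfg; `authorized_for_all_service_commands` is used only as an example directive.

```python
def parse_authorized_users(cfg_text, directive):
    """Return the users listed for `directive` in cgi.cfg-style text."""
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        if key.strip() == directive:
            return [user.strip() for user in value.split(",") if user.strip()]
    return []


def is_authorized(login_name, users):
    # Assumption: exact, case-sensitive comparison ("mobrovac" != "Mobrovac").
    return login_name in users


DIRECTIVE = "authorized_for_all_service_commands"

# Hypothetical sample content, not the real file.
BEFORE = "authorized_for_all_service_commands=someuser,mobrovac\n"
AFTER = "authorized_for_all_service_commands=someuser,Mobrovac\n"

before_ok = is_authorized("Mobrovac", parse_authorized_users(BEFORE, DIRECTIVE))
after_ok = is_authorized("Mobrovac", parse_authorized_users(AFTER, DIRECTIVE))
print(before_ok, after_ok)  # False True
```

Under that assumption, a user who can only log in as "Mobrovac" is not matched by a lowercase "mobrovac" entry, which would explain the missing downtime/ack rights.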

@mobrovac can you retry actions on the Icinga UI?

Icinga UI acking now works, thank you @Volans !

So after a bit of debugging with @mobrovac, it seems that the alert that is not notifying the team-services contact is the RESTBase endpoints health check, which appears to be generated by the service::node Puppet define.
That define supports adding custom contact groups, but I don't see any defined for team-services in hieradata/.
According to @mobrovac this was working before, so I'm wondering whether anything changed recently elsewhere, given that this part of the code hasn't, AFAICT.
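
The effect of a missing contact group can be sketched in Python as follows. This is only an illustrative model, not the actual service::node code: the default group "admins", the dictionary standing in for hieradata/, and the function name are all hypothetical.

```python
# Hypothetical default; the real define's defaults are not shown in this task.
DEFAULT_CONTACT_GROUPS = ["admins"]


def contact_groups_for(service, extra_by_service):
    """Build the contact_groups value for one service check.

    `extra_by_service` stands in for per-service settings looked up in
    hieradata/ (hypothetical structure).
    """
    merged = []
    for group in DEFAULT_CONTACT_GROUPS + extra_by_service.get(service, []):
        if group not in merged:  # keep order, drop duplicates
            merged.append(group)
    return ",".join(merged)


# No extra group configured: team-services is never notified.
without_extra = contact_groups_for("restbase", {})

# With the extra group configured, team-services is included.
with_extra = contact_groups_for("restbase", {"restbase": ["team-services"]})

print(without_extra)  # admins
print(with_extra)     # admins,team-services
```

In this model, an alert only reaches team-services if the group is explicitly listed for the service, which matches the observation that no such setting was found.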

Change 512742 had a related patch set uploaded (by Volans; owner: Volans):
[operations/puppet@production] restbase: add team-services to Icinga notifications

https://gerrit.wikimedia.org/r/512742

Change 512742 merged by Volans:
[operations/puppet@production] restbase: add team-services to all Icinga alerts

https://gerrit.wikimedia.org/r/512742

Volans removed a project: Patch-For-Review.

And with the above patch merged it should all be resolved. Reopen if needed.