
Fundraising-Tech engineers unable to ACK icinga alerts on fr-tech host groups
Closed, Resolved (Public)

Event Timeline

@jgleeson I see in the screenshot you are logged in as "Jgleeson". Please try (in a new browser session, since there is no logout button) to log in as "jgleeson" without the capitalization, then try again to send an ACK or any other command on those hosts. Let me know if that changes anything. If not, we'll take a deeper look at what might be missing, but the capitalization issue is unfortunately a common one. It happens because the auth we put in front of Icinga accepts both versions, but for Icinga internally they are two different users.
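To illustrate the capitalization pitfall described above, here is a minimal Python sketch. It assumes a front-end auth layer that compares usernames case-insensitively and a monitoring side that matches contact names exactly. Both functions and the contact set are hypothetical stand-ins, not Icinga's or the auth layer's real code.

```python
# Hypothetical illustration only: neither function exists in Icinga or in the
# authentication layer in front of it.
ICINGA_CONTACTS = {"jgleeson"}  # assumption: the contact is defined in lowercase


def frontend_auth_accepts(login: str, directory_users: set[str]) -> bool:
    """Case-insensitive match, as the auth in front of Icinga is described to behave."""
    return login.lower() in {user.lower() for user in directory_users}


def icinga_knows_contact(login: str) -> bool:
    """Exact, case-sensitive match against the defined contact names."""
    return login in ICINGA_CONTACTS


for login in ("Jgleeson", "jgleeson"):
    print(login, frontend_auth_accepts(login, {"jgleeson"}), icinga_knows_contact(login))
```

Under these assumptions both spellings pass the front-end check, but only the lowercase one maps to a known contact, which is why a login can succeed while commands are still refused.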

Thanks @Dzahn

I've just tried ACKing another alert after logging in with all lower case chars but the outcome is still the same sadly.

Screenshot from 2022-01-06 12-55-04.png (900×1 px, 154 KB)

Dzahn changed the task status from Open to In Progress. (Jan 6 2022, 6:55 PM)
Dzahn claimed this task.

Mentioned in SAL (#wikimedia-operations) [2022-01-06T18:59:32Z] <mutante> puppetmaster1001 - creating missing Icinga contact for jgleeson in private puppet repo T298649

Change 751980 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] icinga: let Jack Gleeson run commands for any host or service

https://gerrit.wikimedia.org/r/751980

Hey @jgleeson soo.. you did not have an Icinga contact and that had to be created in the private puppet repository. I just did that.

Then I assumed I'd just have to add you to the existing "fr-tech-ops" Icinga contact group and that's it. But it turned out that, for some reason, those groups exist but don't appear to be used with any of the Icinga hosts and services for fundraising.

When I did a "grep contact_groups puppet_hosts.cfg | sort | uniq | grep fr" (and the same for puppet_services.cfg), those contact groups didn't appear to be associated with any checks, even though the groups themselves exist and have a setup that looks like some team mailing list is supposed to get notifications about them.
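The grep pipeline above can be approximated in Python. This is a rough sketch that assumes the generated files use the usual Nagios/Icinga 1 object syntax, with a "contact_groups" directive followed by a comma-separated list. The function name and the sample text are made up for illustration and are not taken from the real configuration.

```python
import re


def contact_groups_in(cfg_text: str) -> set[str]:
    """Collect the distinct group names used in contact_groups directives."""
    groups: set[str] = set()
    for line in cfg_text.splitlines():
        match = re.match(r"\s*contact_groups\s+(\S.*)$", line)
        if match:
            value = match.group(1).split(";", 1)[0]  # drop a trailing ';' comment
            groups.update(g.strip() for g in value.split(",") if g.strip())
    return groups


# Made-up sample, not a copy of puppet_hosts.cfg or puppet_services.cfg.
SAMPLE = """
define service {
    host_name            examplehost
    service_description  example check
    contact_groups       admins,team-example
}
"""

all_groups = contact_groups_in(SAMPLE)
# Same spirit as the final "grep fr": keep only names containing "fr".
print(sorted(g for g in all_groups if "fr" in g))
```

An empty result for a file is what the comment describes: the groups are defined but no host or service in that file refers to them.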

To fix this for you without having to fix all the fundraising-related checks by adding the contact groups to them, I uploaded an alternative change that adds you in another place, where we can globally override these things and just give you privileges on ANY host or service in Icinga, not limited to fundraising.

While not the proper way, I believe this is what has been done for others in the past as well. I've added some reviewers to that change in Gerrit now.

Cheers

and cc: @Jgreen

When I did a "grep contact_groups puppet_hosts.cfg | sort | uniq | grep fr" (and the same for puppet_services.cfg), those contact groups didn't appear to be associated with any checks, even though the groups themselves exist and have a setup that looks like some team mailing list is supposed to get notifications about them.

Fundraising-related services are done with passive checks, so the relevant configuration is in puppet/icinga/templates/nsca_frack.cfg.erb.

Change 751980 merged by Dzahn:

[operations/puppet@production] icinga: let Jack Gleeson run commands for any host or service

https://gerrit.wikimedia.org/r/751980

Fundraising-related services are done with passive checks, so the relevant configuration is in puppet/icinga/templates/nsca_frack.cfg.erb.

Arr, thank you for the reminder, Jeff. This of course also explains why I could not see them in the generated files on the Icinga server: they end up in objects/nsca_frack.cfg, while all active checks are in puppet_services.cfg.
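Since the passive checks live in a different generated file, one way to avoid missing them is to search every object file rather than one or two. A hedged sketch follows: the directory layout, the helper name, and the assumption that all relevant files end in ".cfg" are illustrative, not confirmed details of the Icinga server.

```python
import re
from pathlib import Path

# Matches one contact_groups directive per line (Nagios/Icinga 1 object syntax assumed).
_DIRECTIVE = re.compile(r"^[ \t]*contact_groups[ \t]+(.*)$", re.MULTILINE)


def files_referencing_group(objects_dir: Path, group: str) -> list[str]:
    """Return the names of *.cfg files in objects_dir that use the given contact group."""
    hits = []
    for cfg in sorted(objects_dir.glob("*.cfg")):
        text = cfg.read_text(encoding="utf-8", errors="replace")
        for match in _DIRECTIVE.finditer(text):
            value = match.group(1).split(";", 1)[0]  # ignore trailing comments
            if group in {name.strip() for name in value.split(",")}:
                hits.append(cfg.name)
                break  # one hit per file is enough
    return hits
```

Calling `files_referencing_group(Path("objects"), "fr-tech-ops")` would then list every file that mentions the group, passive-check files included, assuming "objects" is the directory holding the generated configuration.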

I'll add Jack to fr-tech-ops. He probably has more than needed now.

Thanks for all the digging on this @Dzahn, hugely appreciated!

Sorry to be a pain but a few others on fr-tech are also gonna need this privilege in the future. Should I create a new ticket with their wiki tech usernames when I know the full list, likely on Monday? I can link back to this task for context.

Change 752002 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] nagios_common: add jgleeson to fr-tech-ops Icinga contact group

https://gerrit.wikimedia.org/r/752002

Change 752002 merged by Dzahn:

[operations/puppet@production] nagios_common: add jgleeson to fr-tech-ops Icinga contact group

https://gerrit.wikimedia.org/r/752002

@jgleeson Let's see if it works now, as "jgleeson". It should work both via the global rights and because you are also in the fr-tech-ops group now.

Regarding the other users, no problem to add them. Yes, please make another ticket with their wikitech usernames and link to this one, that would be perfect. Thanks.

Would still have to confirm whether it's actually enough to use only the fr-tech-ops group. We can maybe check that when adding the next user.
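The two paths discussed above (a global override versus membership in a contact group attached to the host or service) can be modelled roughly as follows. This is a simplified sketch of how classic Icinga/Nagios CGI authorization is generally understood to behave, not the actual implementation. All names and data are hypothetical, and the thread does not say which setting the global patch touched (in classic CGI configuration it would typically be the authorized_for_all_host_commands and authorized_for_all_service_commands directives).

```python
def may_run_command(
    user: str,
    object_contact_groups: set[str],
    group_members: dict[str, set[str]],
    global_command_users: set[str],
) -> bool:
    """Simplified model: allowed if globally authorized, or a member of any
    contact group attached to the host or service in question."""
    if user in global_command_users:
        return True
    return any(user in group_members.get(g, set()) for g in object_contact_groups)


# Hypothetical data mirroring the situation in this task.
members = {"fr-tech-ops": {"jgleeson"}}
print(may_run_command("jgleeson", {"fr-tech-ops"}, members, set()))  # group path only
print(may_run_command("jgleeson", set(), members, {"jgleeson"}))     # global path only
```

If the model holds, group membership alone is enough only when the checks in question actually list that group, which is the open question raised in the comment.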

@jgleeson We can either resolve this if it works for you, or keep using it for the other people that need to be added. Up to you.

Sorry Dan, I forgot to check in on this today and have finished work as I'm working from the UK.

I'll test out the permissions over the weekend and let you know?

Thanks!

Yes yes, there was no expectation that this happens right now or you work on the weekend. This was for Monday or whenever. Thanks

Dzahn triaged this task as Medium priority. (Jan 7 2022, 8:46 PM)

Thanks @Dzahn. Let's stick with this ticket if it's easiest? I'll get the wikitech names of the other fr-tech folks today and post them here.

I'm also watching out for a service alert to test out the new powers! hopefully, something will break (wince)...today so I can confirm all is good :)

sure, sounds all good to me :)

Great news @Dzahn. I just got around to testing out the new permissions and it worked! I was able to ACK the following alert https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=payments1008&service=check_mysql :) thanks man

Also here is the list of wikitech usernames for the other folks on fr-tech who need the same privileges:

  • Cstone
  • ejegg
  • andyrussg
  • "Damilare Adedoyin"
  • XenoRyet
  • Wfan

Dzahn added a project: SRE-Access-Requests.

@jgleeson Thanks for confirming, great! We have a rotating clinic duty each week handling access requests, so someone else might continue on this shortly.

Change 753065 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/puppet@production] Add Elliot Eggleston (ejegg) to fr-tech-ops Icinga contact group.

https://gerrit.wikimedia.org/r/753065

cmooney added subscribers: Ejegg, cmooney.

@jgleeson hey, just following up on this as I am on Clinic Duty this week.

I think as per Daniel's earlier comment it might be best not to immediately replicate what he did for your user for everyone. As per his comment:

Would still have to confirm whether it's actually enough to use only the fr-tech-ops group. We can maybe check that when adding the next user.

So what I've done is add Elliot's user to the fr-tech-ops contact group only (and not to the other list that provides global permission). Could you ask @Ejegg to try to ACK the next one of these that comes in and see if that works? If it does, I will add the remaining users there only; otherwise we'll need to give everyone global permissions.

Thanks.

Change 753065 merged by Cathal Mooney:

[operations/puppet@production] Add Elliot Eggleston (ejegg) to fr-tech-ops Icinga contact group.

https://gerrit.wikimedia.org/r/753065

Thanks @cmooney and yeah it makes sense not to give us permissions we don't need. I'll mention it to @Ejegg when he comes online today and we'll let you know!

Also if just being in the fr-tech-ops group works, feel free to remove me from the global list :)

Thanks again.

@cmooney Perfect, I wanted to add exactly that but you already got it :) thanks

@jgleeson @Ejegg is there anything else to do here, or can we consider this done for now?

hey @Volans, thanks for the reminder. Let's hold out until @Ejegg has had a chance to confirm the permissions have taken effect. There are a few other folks on fundraising-tech listed in https://phabricator.wikimedia.org/T298649#7610886 who'll need the same permissions applying once we're sure the updates made by @cmooney work as expected.

Thanks much!

Thanks @Volans! I was out sick last week, but today I was able to ack a test alert using my lowercase ejegg login.

Great. @Volans are you able to demote my account to the level of @Ejegg's and also add the other folks listed above to the group, please? I think that'll wrap this up. Thanks :)

Volans added a subscriber: jhathaway.

Thanks for checking @Ejegg. @jgleeson sure, we can do that. Assigning to @jhathaway.

Change 757043 had a related patch set uploaded (by JHathaway; author: JHathaway):

[operations/puppet@production] icinga: add additional users to fr-tech-ops

https://gerrit.wikimedia.org/r/757043

Change 757043 merged by JHathaway:

[operations/puppet@production] icinga: add additional users to fr-tech-ops

https://gerrit.wikimedia.org/r/757043

@jgleeson this change has been merged, so all users should be able to ack alerts. I am going to close this issue, but please reopen if there are any problems!

Thanks so much @jhathaway and everyone else who helped along the way! :)