Paging setup for WMCS
Closed, Resolved; Public


WMCS has had scads of tickets and discussions around paging over the past couple of years, looking to get us to a place where we have:

  • pages that go to our clinic-duty tech
  • an escalation that reaches the whole WMCS team if that tech isn't able to respond
  • a possible escalation for certain machines (dumps and wiki replicas in particular) to include other SREs/DBAs with a direct interest in those

With the VictorOps migration, perhaps this is now possible. We would like to try using it for that.

Event Timeline

Bstorm triaged this task as Medium priority. Apr 20 2020, 4:37 PM
Bstorm created this task.

I've invited WMCS folks to VO now; you should all have invites in your inbox! Please see for additional instructions/setup. Note that you are in the WMCS team instead, and there will be additional configuration (e.g. escalation, rotation, and the icinga contact to add). I'm not sure what your preferences are there, but it should be fairly straightforward. Let us know!

Thanks @fgiunchedi!

I started playing with the setup on the VO side and made some tweaks to the process that the core SRE folks are using. This is just me playing around, and not canonical yet, so NOBODY PANIC! :)

I made 2 rotations: "work hours" and "awake hours". Within each I added a "bd808" shift and set the days of the week and times of day (partial hours) during which I would normally be willing and able to handle a notification. Then I tweaked the "WMCS default" escalation policy to do:

  • Immediately: notify on-duty users in the "work hours" rotation
  • Unacked after 30 minutes: notify on-duty users in the "awake hours" rotation

This is a really, really crude escalation policy. It is also at least a straw dog to start thinking about how to make it better. The first obvious addition would be a "clinic duty" rotation where we actually put in our weekly rotation, and then putting that in the escalation policy before poking folks in the "work hours" rotation. I'm sure the team will come up with some other ideas as well.
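For anyone who wants to reason about the ordering, here is a minimal sketch of the two-step policy above as plain Python data. It is only a model for discussion, not VictorOps configuration or its API: the `Step` class, `rotations_to_notify`, and `with_clinic_duty` are hypothetical names, and the delay before falling through from "clinic duty" is an assumed parameter because no value has been decided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    after_minutes: int  # minutes an alert stays unacked before this step fires
    rotation: str       # rotation whose on-duty users get notified


# Hypothetical model of the "WMCS default" policy described above.
WMCS_DEFAULT = [
    Step(0, "work hours"),    # immediately
    Step(30, "awake hours"),  # unacked after 30 minutes
]


def rotations_to_notify(policy, minutes_unacked):
    """Return, in order, every rotation whose step has fired so far."""
    return [s.rotation for s in policy if s.after_minutes <= minutes_unacked]


def with_clinic_duty(policy, delay_minutes):
    """Put a "clinic duty" step first and push existing steps back.

    delay_minutes is an assumption for illustration; the thread does not
    say how long to wait before escalating past the clinic-duty tech.
    """
    shifted = [Step(s.after_minutes + delay_minutes, s.rotation) for s in policy]
    return [Step(0, "clinic duty")] + shifted
```

With this model, `rotations_to_notify(WMCS_DEFAULT, 30)` includes both rotations, and `with_clinic_duty(WMCS_DEFAULT, 15)` would page clinic duty first and shift the other two steps to 15 and 45 minutes.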

Update: I chatted with @aborrero today and created a routing key 'wmcs' linked to the default WMCS escalation policy. Icinga emails can now be sent to the VO address (in the private repo).

Change 597047 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] nagios: add victorops-wmcs contact to the wmcs team

Change 597047 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] nagios: add victorops-wmcs contact to the wmcs team

I think we've mostly settled into a pattern with this that is ok for now.