Evaluate, suggest and choose an alert escalation solution
Open, Needs Triage · Public

Description

As part of Q2 OKRs, Observability will be implementing phase 3 of the alerting infrastructure roadmap, namely alert escalation.

This task is to track the following:

  • Document the requirements/criteria for choosing an alert escalation solution
  • Develop an escalation tree proposal, including paging service, owners, schedules and escalation layers/timeouts
  • Develop a recommendation for a notification/escalation solution, based on the requirements

Details

Related Gerrit Patches:
[operations/puppet@production] nagios: add PD to sms contact group
[operations/puppet@production] nagios: add VO and OG to sms contact group

Event Timeline

Restricted Application added a subscriber: Aklapper. · Oct 21 2019, 3:02 PM
fgiunchedi moved this task from Backlog to Doing on the User-fgiunchedi board. · Nov 13 2019, 11:22 AM
fgiunchedi moved this task from Inbox to In progress on the observability board. · Nov 25 2019, 1:49 PM
fgiunchedi updated the task description. · Dec 16 2019, 3:08 PM

Change 563483 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] nagios: add VO and OG to sms contact group

https://gerrit.wikimedia.org/r/563483

Change 563483 merged by Cwhite:
[operations/puppet@production] nagios: add VO and OG to sms contact group

https://gerrit.wikimedia.org/r/563483

Change 563963 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] nagios: add PD to sms contact group

https://gerrit.wikimedia.org/r/563963

Change 563963 merged by Filippo Giunchedi:
[operations/puppet@production] nagios: add PD to sms contact group

https://gerrit.wikimedia.org/r/563963