
automation: issue reminders for about-to-expire downtimes
Open, Medium, Public

Description

We get a lot of alert noise (sometimes even pages) from manual downtimes that have expired.

We should have some automation that notices a downtime that's about to expire, and lets both the creator of the downtime and #wikimedia-operations know about it.

Picking how far in advance to remind is difficult; to be effective it probably needs to take into account the working hours of the creator / of the team, and possibly also the original duration of the downtime -- some downtimes last an hour; others last days (or even weeks?)
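One possible heuristic, as a rough sketch rather than a proposal for the final behaviour: scale the lead time with the downtime's original duration, clamp it between a floor and a ceiling, and pull the reminder back to the end of the previous working day if it would otherwise land outside working hours. Everything here is an assumption made for illustration: the function name `reminder_time`, the 10% fraction, the 15 minute and 4 hour bounds, and the 09:00-17:00 working day. Timestamps are assumed to already be in the creator's local time, and weekends are ignored.

```python
from datetime import datetime, time, timedelta

# All of these values are assumptions chosen for illustration.
MIN_LEAD = timedelta(minutes=15)   # never remind later than this before expiry
MAX_LEAD = timedelta(hours=4)      # never remind earlier than this (before the working-hours shift)
LEAD_FRACTION = 0.10               # remind with roughly 10% of the downtime left


def reminder_time(start, end, work_start=time(9), work_end=time(17)):
    """Return when to send the reminder, in the creator's local time."""
    duration = end - start
    lead = min(max(duration * LEAD_FRACTION, MIN_LEAD), MAX_LEAD)
    remind_at = end - lead

    # If the reminder would land outside working hours, move it back to
    # the end of the most recent working day so it is seen before expiry.
    if remind_at.time() < work_start:
        previous_day = remind_at.date() - timedelta(days=1)
        remind_at = datetime.combine(previous_day, work_end)
    elif remind_at.time() >= work_end:
        remind_at = datetime.combine(remind_at.date(), work_end)

    # Never schedule a reminder before the downtime has started.
    return max(remind_at, start)
```

With these assumed values, a one-hour downtime gets the 15 minute floor, while a multi-day downtime expiring overnight is announced at the end of the preceding working day.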

It would also be really nice if the reminder included a list of the alerts that are affected, and their current status -- ideally reminders would only be sent when there are criticals that are about to be unmasked.
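The "only remind when criticals would be unmasked" idea could look something like the sketch below. The `Alert` and `Downtime` classes and the `build_reminder` function are hypothetical stand-ins invented for this example; they do not correspond to any existing monitoring API, and fetching the real alert states is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alert:
    # Hypothetical model of one alert masked by a downtime.
    host: str
    service: str
    state: str  # assumed values: "OK", "WARNING", "CRITICAL", "UNKNOWN"


@dataclass
class Downtime:
    # Hypothetical model of a manual downtime and the alerts it covers.
    creator: str
    alerts: List[Alert] = field(default_factory=list)


def build_reminder(downtime: Downtime) -> Optional[str]:
    """Return the reminder text, or None if no critical would be unmasked."""
    criticals = [a for a in downtime.alerts if a.state == "CRITICAL"]
    if not criticals:
        return None  # nothing noisy is about to surface, so stay quiet

    lines = [
        f"{downtime.creator}: downtime about to expire, "
        f"{len(criticals)} critical alert(s) will be unmasked:"
    ]
    lines += [f"  {a.host} / {a.service}: {a.state}" for a in criticals]
    return "\n".join(lines)
```

Returning `None` rather than an empty message keeps the decision of whether to notify in one place; the caller would send the text to the creator and to #wikimedia-operations only when it gets a string back.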

Event Timeline

CDanis created this task. Aug 16 2019, 9:12 PM
Restricted Application added a subscriber: Aklapper. Aug 16 2019, 9:12 PM
CDanis triaged this task as Medium priority. Aug 16 2019, 9:12 PM
fgiunchedi moved this task from Inbox to Backlog on the observability board. Mon, Jul 20, 1:29 PM