
Incident response tools operational readiness review
Open, Medium, Public

Description

I propose we implement an operational readiness review for the tools we don't use daily but expect to be available during an incident.

These reviews should ultimately be automated and generate alerts, but they can start as a manual (monthly?) check.
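As a rough illustration of what an automated version might look like, here is a minimal sketch that flags tools whose last successful check is older than a review interval. Everything in it is an assumption made for illustration: the `overdue_tools` helper, the 31-day interval (standing in for "monthly?"), and the placeholder tool names and dates. None of it reflects an existing inventory or alerting setup.

```python
from datetime import date, timedelta

# Hypothetical review interval; the task only suggests "monthly?".
REVIEW_INTERVAL = timedelta(days=31)


def overdue_tools(last_verified, today, interval=REVIEW_INTERVAL):
    """Return sorted names of tools whose last check is older than `interval`.

    `last_verified` maps a tool name to the date it was last verified,
    or to None if it has never been verified.
    """
    return sorted(
        name
        for name, checked in last_verified.items()
        if checked is None or today - checked > interval
    )


if __name__ == "__main__":
    # Placeholder tool names and dates, not a real inventory.
    last_verified = {
        "tool-a": date(2024, 1, 5),
        "tool-b": date(2024, 2, 20),
        "tool-c": None,  # never verified
    }
    for name in overdue_tools(last_verified, today=date(2024, 3, 1)):
        print(f"ALERT: {name} is overdue for a readiness check")
```

In a manual first phase the same mapping could simply be a table updated after each monthly check; the print statement is where a real alert would eventually be raised.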

Event Timeline

LSobanski triaged this task as Medium priority.
LSobanski added a project: SRE.
akosiaris subscribed.

Removing Sustainability (Incident Followup), as nothing here (a mention in the task, an incident doc, a status doc, or something similar) makes clear how this relates to an incident. I am leaving SRE-OnFire, as this can clearly result in better SRE practices.