
List the alerts we currently receive and decide which alerts we want to implement
Closed, Resolved · Public · 3 Estimated Story Points

Description

User story

As a WME engineer and as a PM, I want to know which alerts we currently receive in PagerDuty (PD) so that we can make informed decisions about which alerts to implement. This work is the first step towards monitoring alerts and, later, associating them with a service and assigning an escalation policy.

To do

  • Go to PagerDuty and understand its configuration and the alerting system currently in place
  • Make a list of the alerts we currently receive in PD
  • QA

Update

The current list of alarms is documented in the wiki of the Incident Management project:
https://gitlab.enterprise.wikimedia.com/wikimedia-enterprise/incident-management/-/wikis/Alarms

Details

Other Assignee
REsquito-WMF

Event Timeline

JArguello-WMF set the point value for this task to 8.
JArguello-WMF updated the task description.
JArguello-WMF removed the point value for this task.

We need a place to start documenting the alerts we currently have.
Available options:
a) Gitlab wiki pages
b) Wikitech
c) Mediawiki
d) other

Decision makers: @E.Enabulele @REsquito-WMF @LDlulisa-WMF @ROdonnell-WMF @HShaikh @prabhat
Consulted: @Protsack.stephan @Alex.lep.sp

JArguello-WMF set the point value for this task to 3. (Aug 9 2023, 1:13 PM)

QA was performed by reviewing IaC. Minor comments were addressed. The documentation is correct.