Develop set of metrics to assess incident reports/post mortems
Open, Normal, Public

Description

Task description coming soon.


While working on T199133, a few ideas on how to improve incident reports came to mind:

  • Incident reports since 2011 are listed at https://wikitech.wikimedia.org/wiki/Incident_documentation
  • The page has a form that creates an incident report using https://wikitech.wikimedia.org/wiki/Incident_documentation/Report_Template as a template
  • The only required data is the service name, which is part of the incident report page name: YYYYMMDD-$NameOfService
  • Getting data from incident reports is hard because they are unstructured text.
  • One way to fill the gaps is to contact the person who created the incident report (or also everybody who edited it) and ask for clarification. Example: Which Gerrit repositories is this incident report connected to?
  • Adding section(s) to the incident report might help: relevant Phabricator tasks, Gerrit commits, and Gerrit repositories (for when the outage is connected to a repository but there are no relevant commits)
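
Since the page name is the only structured data available today, a first metric could be built from it alone. Below is a minimal sketch in Python, assuming page names follow the YYYYMMDD-$NameOfService pattern exactly. The names `parse_page_name` and `IncidentPage` are hypothetical, and "ExampleService" is a made-up service name, not taken from a real report.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# Assumed pattern: eight digits, a hyphen, then the service name.
PAGE_NAME_RE = re.compile(r"^(?P<day>\d{8})-(?P<service>.+)$")


class IncidentPage(NamedTuple):
    """Hypothetical container for the two fields the page name carries."""
    day: date
    service: str


def parse_page_name(name: str) -> Optional[IncidentPage]:
    """Split a page name into date and service, or return None if it does not fit."""
    match = PAGE_NAME_RE.match(name)
    if match is None:
        return None
    try:
        # Rejects impossible dates such as month 13.
        day = datetime.strptime(match.group("day"), "%Y%m%d").date()
    except ValueError:
        return None
    return IncidentPage(day=day, service=match.group("service"))


if __name__ == "__main__":
    print(parse_page_name("20180710-ExampleService"))
```

Pages whose names do not fit the pattern come back as None, so they can be counted separately as reports that need manual review.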