Production Excellence #20: April 2020
Monthly update on our pursuit of operational excellence.

How are we doing in that pursuit of operational excellence during these unprecedented times?

📊  Numbers for March and April
  • 3 documented incidents. [1]
  • 60 new Wikimedia-prod-error reports. [2]
  • 58 Wikimedia-prod-error reports closed. [3]
  • 178 currently open Wikimedia-prod-error reports in total. [4]

For more about recent incidents and pending actionables, see Wikitech and Phabricator.

📉  Outstanding reports

Take a look at the workboard and look for tasks that could use your help.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Breakdown of recent months:

  • April 2019: Two reports closed, 2 of 14 left.
  • May: (All clear!)
  • June: 4 of 11 left (unchanged). ⚠️
  • July: 8 of 18 left (unchanged).
  • August: 2 of 14 left (unchanged).
  • September: 7 of 12 left (unchanged).
  • October: Two reports closed, 4 of 12 left.
  • November: One report closed, 4 of 5 left.
  • December: Two reports closed, 4 of 9 left.
  • January 2020: One report closed, 5 of 7 left.
  • February: One report closed, 6 of 7 left.
  • March: 2 new reports survived the month of March.
  • April: 13 new reports survived the month of April.

At the end of February the total of open reports over recent months was 58. Of those, 12 got closed, but with 15 new reports from March/April still open, the total is now up at 61 open reports.
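As a quick sanity check, the tally above can be reproduced from the monthly breakdown. This is only a minimal sketch in Python: the variable names are made up for illustration and do not come from any Wikimedia tooling, and the numbers are copied by hand from the list above.

```python
# Reports still open per month, copied from the breakdown above.
# All names here are illustrative; nothing is fetched from Phabricator.
still_open_older = {
    "2019-04": 2, "2019-05": 0, "2019-06": 4, "2019-07": 8,
    "2019-08": 2, "2019-09": 7, "2019-10": 4, "2019-11": 4,
    "2019-12": 4, "2020-01": 5, "2020-02": 6,
}
still_open_new = {"2020-03": 2, "2020-04": 13}

total_end_of_february = 58  # open reports at the end of February
closed_since = 12           # of those, closed during March and April

remaining_older = total_end_of_february - closed_since
new_open = sum(still_open_new.values())
total_now = remaining_older + new_open

# The per-month figures should add up to the same remainder.
assert remaining_older == sum(still_open_older.values())

print(total_now)  # prints 61
```

The assertion confirms that the per-month "left" counts (46 in total) agree with 58 minus the 12 closed reports, before the 15 new reports from March and April are added.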

The workboard overall (which includes pre-2019 tasks) has 178 tasks open. This is down for the first time since October: the total stood at 196 in December, 198 in January, and 199 in February, and is now at 178 in April. The drop is largely due to the Release Engineering and Core Platform teams closing out forgotten reports that had since been resolved or otherwise obsoleted.

💡 Tip: Verifying existing tasks is a good way to (re)familiarise yourself with Kibana. For example: Does the error still occur in the last 30 days? Does it only happen on a certain wiki? What do the URLs or stack traces have in common?

🎉  Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production.

Until next time,

– Timo Tijhof

[1] Incidents. – https://wikitech.wikimedia.org/wiki/Incident_documentation
[2] Tasks created. – https://phabricator.wikimedia.org/maniphest/query/HjopcKClxTfw/#R
[3] Tasks closed. – https://phabricator.wikimedia.org/maniphest/query/ts62HKYPBxod/#R
[4] Open tasks. – https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R

Written by Krinkle on May 14 2020, 4:10 PM.
Principal Engineer (WMF Performance Team)
