Production Excellence #13: July 2019
Monthly update on our pursuit of operational excellence.

How are we doing in our pursuit of operational excellence? Read this first-anniversary edition to find out!

📊 Month in numbers
  • 5 documented incidents. [1]
  • 53 new Wikimedia-prod-error reports. [2]
  • 44 closed Wikimedia-prod-error reports. [3]
  • 218 currently open Wikimedia-prod-error reports in total. [4]

At five, the number of incidents recorded over the past month equals the median number of incidents per month (2016–2019). – Explore this data.

To read more about these incidents, their investigations, and pending actionables, check Incident documentation § 2019.

📖 One year of Excellent adventures!

Exactly one year ago, this periodical started providing regular insights into production stability. The idea was to shorten the feedback cycle between the deployment of code that causes fatal errors and the discovery of those errors. This lets more people find reports earlier, which (hopefully) prevents them from sneaking into a growing pile of “normal” errors.

576 reports tagged Wikimedia-prod-error were created between 15 July 2018 and 31 July 2019.
425 reports were closed over the same period.
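As a quick worked example, the two yearly totals can be combined into a net change in the backlog. This is a minimal sketch that treats the counts as independent totals for the period (some closed reports may have been opened before 15 July 2018); the variable names are illustrative only.

```python
# Totals for reports tagged Wikimedia-prod-error,
# 15 July 2018 to 31 July 2019 (figures from this issue).
created = 576
closed = 425

# Net change in the backlog: reports opened minus reports closed.
net_growth = created - closed

# Closures per new report over the same period.
closure_ratio = closed / created

print(f"Net growth: {net_growth} reports")
print(f"Closed per created: {closure_ratio:.0%}")
```

Under these assumptions, the backlog grew by 151 reports over the year, with roughly three closures for every four new reports.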

Read the first issue in story format, or the initial e-mail.

📉 Outstanding reports

Take a look at the workboard and find tasks that might need your help. The workboard lists error reports, grouped by the month in which they were first observed.


Or help someone who has already started on a patch:
Open prod-error tasks with a Patch-For-Review

Breakdown of recent months (past two weeks not included):

  • November: 1 report left (unchanged). ⚠️
  • December: 3 reports left (unchanged). ⚠️
  • January: 1 report left (unchanged). ⚠️
  • February: 2 reports left (unchanged). ⚠️
  • March: 4 reports left (unchanged). ⚠️
  • April: 10 of 14 reports left (unchanged). ⚠️
  • May: 2 reports got fixed! (4 of 10 reports left). ❇️
  • June: 2 reports got fixed! (9 of 11 reports left). ❇️
  • July: 18 new reports from last month remain unsolved.

🎉 Thanks!

Thank you to @aaron, @Anomie, @ArielGlenn, @Catrope, @cscott, @Daimona, @dbarratt, @dcausse, @EBernhardson, @Jdforrester-WMF, @jeena, @MarcoAurelio, @SBisson, @Tchanders, @Tgr, @tstarling, @Urbanecm; and everyone else who helped by finding, investigating, or resolving error reports in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

Quote: 🎙 “Unlike money, hope is for all: for the rich as well as for the poor.”


[1] Incidents. – wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident…

[2] Tasks created. – phabricator.wikimedia.org/maniphest/query…

[3] Tasks closed. – phabricator.wikimedia.org/maniphest/query…

[4] Open tasks. – phabricator.wikimedia.org/maniphest/query…

Written by Krinkle on Aug 30 2019, 8:08 PM.
Principal Engineer (Wikimedia Performance)
