Production Excellence #30: March 2021
Monthly update on our striving for operational excellence.

How did we do in our striving for operational excellence last month? Read on to find out!


2 documented incidents. That's average for this time of year, when we usually have 1-4 incidents.

Learn about recent incidents at Incident status on Wikitech, or Preventive measures in Phabricator.


In March we made significant progress on the outstanding errors of previous months. Several of the 2020 months are finally starting to empty out. But with over 30 new tasks from March itself remaining, we did not break even, and ended up slightly higher than last month. This could be reversing two positive trends, but I hope not.

Firstly, there was a steep increase in the number of new production errors that were not resolved within the same month. This runs counter to the positive trend we started in November. The past four months typically saw 10-20 errors outlive their month of discovery, whereas this past month saw 34 of its 48 new errors remain unresolved.

Secondly, we saw the overall number of unresolved errors increase again. This January began a downward trend for the first time in thirteen months, which continued nicely through February. But this past month the decline stalled, and the total crept upward by one task. I hope this is just a breather and we can continue our way downward.

Unresolved error reports, stacked by month.

Total open production error tasks, by month.

Month-over-month plots based on spreadsheet data.

Outstanding errors

Take a look at the workboard and look for tasks that could use your help:

View Workboard

Summary over recent months, per spreadsheet:

Jul 2019 (0 of 18 left) ✅ Last two tasks resolved! (-2)
Aug 2019 (1 of 14 left) ⚠️ Unchanged (over one year old).
Oct 2019 (3 of 12 left) ⬇️ One task resolved. (-1)
Nov 2019 (0 of 5 left) ✅ Last task resolved! (-1)
Dec 2019 (0 of 9 left) ✅ Last task resolved! (-1)
Jan 2020 (2 of 7 left) ⬇️ One task resolved. (-1)
Feb 2020 (0 of 7 left) ✅ Last task resolved! (-1)
Mar 2020 (2 of 2 left) ⚠️ Unchanged (over one year old).
Apr 2020 (5 of 14 left) ⬇️ Four tasks resolved. (-4)
May 2020 (5 of 14 left) ⬇️ One task resolved. (-1)
Jun 2020 (6 of 14 left) ⬇️ One task resolved. (-1)
Jul 2020 (5 of 24 left) ⬇️ Four tasks resolved. (-4)
Aug 2020 (15 of 53 left) ⬇️ Five tasks resolved. (-5)
Sep 2020 (7 of 33 left) ⬇️ One task resolved. (-1)
Oct 2020 (22 of 69 left) ⬇️ Four tasks resolved. (-4)
Nov 2020 (9 of 38 left) ⬇️ Two tasks resolved. (-2)
Dec 2020 (11 of 33 left) ⬇️ One task resolved. (-1)
Jan 2021 (4 of 50 left) ⬇️ One task resolved. (-1)
Feb 2021 (9 of 20 left) ⬇️ Two tasks resolved. (-2)
Mar 2021 (34 of 48 left) 34 new tasks survived and remain unresolved. (+48; -14)
138 issues open, as of Excellence #29 (6 Mar 2021).
-33 issues closed, of the previous 138 open issues.
+34 new issues that survived March 2021.
139 issues open, as of today (2 Apr 2021).
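
For anyone who wants to double-check the tally, here is a minimal sketch of the arithmetic in Python. The per-month counts are copied from the summary above; the variable names are purely illustrative and are not taken from the actual spreadsheet.

```python
# Tasks closed during March 2021, keyed by the month in which they were
# first reported (copied from the summary above). Months marked
# "Unchanged" contribute 0.
closed_by_month = {
    "Jul 2019": 2, "Aug 2019": 0, "Oct 2019": 1, "Nov 2019": 1,
    "Dec 2019": 1, "Jan 2020": 1, "Feb 2020": 1, "Mar 2020": 0,
    "Apr 2020": 4, "May 2020": 1, "Jun 2020": 1, "Jul 2020": 4,
    "Aug 2020": 5, "Sep 2020": 1, "Oct 2020": 4, "Nov 2020": 2,
    "Dec 2020": 1, "Jan 2021": 1, "Feb 2021": 2,
}

previously_open = 138   # open as of Excellence #29 (6 Mar 2021)
new_in_march = 48       # new error reports filed in March 2021
resolved_in_march = 14  # of those, resolved within March itself

closed_older = sum(closed_by_month.values())      # older tasks closed
survived = new_in_march - resolved_in_march       # March tasks still open
open_now = previously_open - closed_older + survived

print(closed_older, survived, open_now)  # → 33 34 139
```

The three printed numbers match the "-33", "+34", and "139" lines above.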


Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production.

Until next time,

– Timo Tijhof


Incident status, Wikitech.
Wikimedia incident stats by Krinkle, CodePen.
Production Excellence: Month-over-month spreadsheet and plot.
Report charts for Wikimedia-production-error project, Phabricator.

Written by Krinkle on Sat, Apr 3, 12:20 AM.
Principal Engineer (Performance)