Production Excellence #33: June 2021
Monthly update on our striving for operational excellence.

How’d we do in our striving for operational excellence last month? Read on to find out!


3 documented incidents. That's fewer than in June of each of the previous five years, when the month saw 5-9 incidents. I've added a new panel ⭐️ to the Incident statistics tool. This one plots monthly statistics on top of previous years, to make them easier to compare:

Figure: proderr-incidents 2021-06.png

Learn more from the Incident documents on Wikitech, and remember to review and schedule Incident Follow-up tasks in Phabricator. These are preventive measures and other action items filed after an incident.


In June, work on production errors appears to have stagnated a bit. Or more precisely, the work resulted in relatively few tasks being resolved. 15 of the 26 new tasks are still open as of this writing.

Of the tasks from previous months, only 11 were resolved, leaving most columns unchanged. See the table further down for a more detailed breakdown and links to Phabricator queries for the tasks in question.

With 15 new tasks still open and 11 tasks resolved from our backlog, the chart rises from 150 to 154 open tasks.

Take a look at the workboard and look for tasks that could use your help.


Unresolved error reports, stacked by month.

Total open production error tasks, by month.

Month-over-month plots based on spreadsheet data.

Outstanding errors

Summary over recent months:

Jan 2020 (1 of 7 left) ⚠️ Unchanged (over one year old).
Mar 2020 (2 of 2 left) ⚠️ Unchanged (over one year old).
Apr 2020 (4 of 14 left) ⚠️ Unchanged (over one year old).
May 2020 (5 of 14 left) ⚠️ Unchanged (over one year old).
Jun 2020 (5 of 14 left) ⚠️ Unchanged (over one year old).
Jul 2020 (4 of 24 left) ⚠️ Unchanged (over one year old).
Aug 2020 (11 of 53 left) ⬇️ One task resolved. (-1)
Sep 2020 (7 of 33 left) ⚠️ Unchanged.
Oct 2020 (19 of 69 left) ⚠️ Unchanged.
Nov 2020 (8 of 38 left) ⚠️ Unchanged.
Dec 2020 (7 of 33 left) ⚠️ Unchanged.
Jan 2021 (3 of 50 left) ⚠️ Unchanged.
Feb 2021 (6 of 20 left) ⬇️ One task resolved. (-1)
Mar 2021 (13 of 48 left) ⬇️ One task resolved. (-1)
Apr 2021 (19 of 42 left) ⬇️ Four tasks resolved. (-4)
May 2021 (25 of 54 left) ⬇️ Four tasks resolved. (-4)
Jun 2021 (15 of 26 left) 📌 26 new issues, of which 11 were closed. (+26, -11)

150 issues open, as of Excellence #32 (May 2021).
-11 issues closed, of the previous 150 open issues.
+15 new issues that survived June 2021.
154 issues open as of yesterday.
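
As a quick sanity check, the tally above can be reproduced from the per-month summary. This is only a sketch in Python: the figures are transcribed by hand from this post, and the variable names are illustrative, not anything the actual spreadsheet or tooling uses.

```python
# Figures transcribed from the per-month summary in this post:
# (month, tasks still open, backlog tasks resolved since the last edition)
summary = [
    ("Jan 2020", 1, 0), ("Mar 2020", 2, 0), ("Apr 2020", 4, 0),
    ("May 2020", 5, 0), ("Jun 2020", 5, 0), ("Jul 2020", 4, 0),
    ("Aug 2020", 11, 1), ("Sep 2020", 7, 0), ("Oct 2020", 19, 0),
    ("Nov 2020", 8, 0), ("Dec 2020", 7, 0), ("Jan 2021", 3, 0),
    ("Feb 2021", 6, 1), ("Mar 2021", 13, 1), ("Apr 2021", 19, 4),
    ("May 2021", 25, 4),
    ("Jun 2021", 15, 0),  # new this month, so nothing resolved from backlog
]

previous_open = 150    # open tasks as of Excellence #32 (May 2021)
new_tasks = 26         # tasks filed in June 2021
new_tasks_closed = 11  # June tasks that were already closed again

# Backlog tasks resolved across all earlier months.
backlog_resolved = sum(resolved for _, _, resolved in summary)

# New tasks that survived the month.
new_surviving = new_tasks - new_tasks_closed

# Running total: previous open count, minus backlog fixes, plus survivors.
open_now = previous_open - backlog_resolved + new_surviving

# Cross-check: the per-month "still open" counts should add up to the same total.
open_by_month = sum(still_open for _, still_open, _ in summary)

print(backlog_resolved, new_surviving, open_now, open_by_month)
```

Both routes arrive at the same number: the running total (150 - 11 + 15) and the sum of the per-month open counts each come to 154.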


Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production.

Until next time,

– Timo Tijhof

🕳 O'Neill: We've done this!
Dr Jackson: We do this every day.
O'Neill: I'm not talking about briefings in general, Daniel, I'm talking about this briefing; I'm talking about this day.
Teal'c: Col. O'Neill is correct. Events do appear to be repeating themselves.

Written by Krinkle on Wed, Jul 14, 3:34 AM.
Principal Engineer (Performance)