Production Excellence #31: April 2021
Monthly update on our pursuit of operational excellence.

How did we do in our pursuit of operational excellence last month? Read on to find out!


6 documented incidents. That's above the historical average of 3–4 per month.

Learn about recent incidents on the Incident status page on Wikitech, or see the Preventive measures tracked in Phabricator.


In April, we saw a continuation of the healthy trend that started this January — a trend where the back of the line is moving forward at least as quickly as the front of the line. We did take a little breather in March where we almost broke even, but otherwise the trend is going well.

Last month we bade farewell to the production errors we found in July 2019. This month we cleared out the column for October 2019.

One point of concern is that we did encounter a high number of new production errors — errors that we failed to catch during development, code review, continuous integration, beta testing, or pre-deployment checks. Where we used to discover about a dozen of those a month, we found 42 during this month. As of writing, 17 of the 42 April-discovered errors have been resolved.

The "Old" column (generally tracking pre-2019 tasks) grew for the first time in six months. This increase can largely be attributed to improved telemetry of client-side errors uncovering issues in under-resourced products, such as the old Kaltura video player.

Unresolved error reports, stacked by month.

Total open production error tasks, by month.

Month-over-month plots based on spreadsheet data.

Outstanding errors

View Workboard

Summary over recent months, per spreadsheet:

Aug 2019 (1 of 14 left): ⚠️ Unchanged (over one year old).
Oct 2019 (0 of 12 left): ✅ Last three tasks resolved! (-3)
Jan 2020 (1 of 7 left): ⚠️ Unchanged (over one year old).
Mar 2020 (2 of 2 left): ⚠️ Unchanged (over one year old).
Apr 2020 (5 of 14 left): ⚠️ Unchanged (over one year old).
May 2020 (5 of 14 left): ⏸ No change.
Jun 2020 (5 of 14 left): ⬇️ One task resolved. (-1)
Jul 2020 (4 of 24 issues): ⬇️ One task resolved. (-1)
Aug 2020 (13 of 53 issues): ⬇️ Two tasks resolved. (-2)
Sep 2020 (7 of 33 issues): ⏸ No change.
Oct 2020 (20 of 69 issues): ⬇️ Two tasks resolved. (-2)
Nov 2020 (9 of 38 issues): ⏸ No change.
Dec 2020 (7 of 33 issues): ⬇️ Four tasks resolved. (-4)
Jan 2021 (3 of 50 issues): ⬇️ One task resolved. (-1)
Feb 2021 (8 of 20 issues): ⬇️ One task resolved. (-1)
Mar 2021 (18 of 48 issues): ⬇️ Sixteen tasks resolved. (-16)
Apr 2021 (25 of 42 issues): 42 new issues found, of which 25 remained open. (+42; -17)
139 issues open, as of Excellence #30 (March 2021).
-31 issues closed, of the previously open issues.
+25 new issues that survived April 2021.
133 issues open, as of today (12 May 2021).
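
For readers who want to check the tally, here is a minimal sketch that reconciles the totals with the per-month figures listed above. The numbers are copied from this summary. The variable names are illustrative only, and this is not how the underlying spreadsheet computes them.

```python
# (month, tasks still open, tasks resolved since the previous edition),
# copied from the per-month summary above. April 2021 is handled separately
# because its issues are new this month.
previously_tracked = [
    ("Aug 2019", 1, 0), ("Oct 2019", 0, 3), ("Jan 2020", 1, 0),
    ("Mar 2020", 2, 0), ("Apr 2020", 5, 0), ("May 2020", 5, 0),
    ("Jun 2020", 5, 1), ("Jul 2020", 4, 1), ("Aug 2020", 13, 2),
    ("Sep 2020", 7, 0), ("Oct 2020", 20, 2), ("Nov 2020", 9, 0),
    ("Dec 2020", 7, 4), ("Jan 2021", 3, 1), ("Feb 2021", 8, 1),
    ("Mar 2021", 18, 16),
]

open_last_edition = 139          # open as of Excellence #30
new_found, new_resolved = 42, 17  # April 2021 discoveries

# Old issues closed this month, summed across the monthly columns.
closed_old = sum(resolved for _, _, resolved in previously_tracked)

# New issues that were found in April and are still open.
survived_new = new_found - new_resolved

# Two independent routes to the current total.
open_now = open_last_edition - closed_old + survived_new
open_by_month = sum(still_open for _, still_open, _ in previously_tracked) + survived_new

print(closed_old, survived_new, open_now, open_by_month)  # 31 25 133 133
```

Both routes arrive at 133: the previous total adjusted by this month's changes, and the sum of what remains open in each monthly column.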

Take a look at the workboard and look for tasks that could use your help:

View Workboard


Thank you to everyone who helped by reporting, investigating, or resolving problems in production!

Until next time,

– Timo Tijhof

🎥 McMurphy: That nurse, man... she, uh, she ain't honest.
Doctor: Ah now, look. Miss Ratched is one of the finest nurses we've got in this institution.
McMurphy: Ha! Well […] She likes a rigged game, know what I mean?

Written by Krinkle on May 13 2021, 3:49 AM.
Principal Engineer (Performance)