Production Excellence #32: May 2021
Monthly update on our pursuit of operational excellence.

How’d we do in our pursuit of operational excellence last month? Read on to find out!


Zero incidents recorded in the past month. Yay! That's only five months after November 2020, the last month without documented incidents (Incident stats).

Remember to review Preventive measures in Phabricator, which are action items filed after an incident.


In May, we unfortunately saw a repeat of the worrying pattern from April, but with higher numbers. We found 54 new errors, the most in a single month since the Excellence monthly began three years ago in 2018. About half of these (29 of 54) remain unresolved as of writing, two weeks into the following month.

Unresolved error reports, stacked by month.

Total open production error tasks, by month.

Month-over-month plots based on spreadsheet data.

New errors in May

Below is a snapshot of just the 54 new issues found last month, listed by their code steward.

Be mindful that the reporting of errors is not a negative point per se. I think it should be celebrated when teams have good telemetry, detect their issues early, and address them within their development cycle. It is more worrisome when teams lack the telemetry or time to find such issues, or can't keep up with the pace at which issues are found.

Anti Harassment Tools: None.
Community Tech: None.
Editing Team: +2, -1. Cite (T283755); OOUI (T282176).
Growth Team: +17, -4. Add-Link (T281960); GrowthExperiments (T281525 T281703 T283546 T283638 T283924); Echo (T282446); Recent-changes (T282047 T282726); StructuredDiscussions (T281521 T281523 T281782 T281784 T282069 T282146 T282599 T282605).
Language Team: +1. Translate extension (T283828).
Parsing Team: +1. Parsoid (T281932).
Reading Web: None.
Structured Data: None.
Product Infra Team: +1. WikimediaEvents (T282580).
Performance Team: None.
Platform Engineering: +16, -11. MediaWiki-API (T282122); MediaWiki-General (T282173); MediaWiki-Page-derived-data (T281714 T281802 T282180 T283282); MediaWiki-Revision-backend (T282145 T282723 T282825 T283170); MediaWiki-User-management (T283167); MW Expedition (T281526 T281981 T282038 T282181 T283196).
Search Platform: +3, -2. CirrusSearch (T282036 T282207); GeoData (T282735).
WMDE TechWish: +2, -1. Revision-Slider (T282067); VisualEditor Template dialog (T283511).
WMDE Wikidata: +3, -1. Wikibase (T282534 T283198 T283862).
No owner: +7, -6. CentralAuth (T282834 T283635); Change-tagging (T283098 T283099); MapSources (T282833); MediaWiki-Page-information (T283751); Other (T283252).

Outstanding errors

Take a look at the workboard and look for tasks that could use your help.


Summary over recent months:

Aug 2019 (0 of 14 left): ✅ Last task resolved! (-1)
Jan 2020 (1 of 7 left): ⚠️ Unchanged (over one year old).
Mar 2020 (2 of 2 left): ⚠️ Unchanged (over one year old).
Apr 2020 (4 of 14 left): ⬇️ One task resolved. (-1)
May 2020 (5 of 14 left): ⚠️ Unchanged (over one year old).
Jun 2020 (5 of 14 left): ⚠️ Unchanged (over one year old).
Jul 2020 (4 of 24 issues): ⏸ Unchanged.
Aug 2020 (12 of 53 issues): ⬇️ One task resolved. (-1)
Sep 2020 (7 of 33 issues): ⏸ Unchanged.
Oct 2020 (19 of 69 issues): ⬇️ One task resolved. (-1)
Nov 2020 (8 of 38 issues): ⬇️ One task resolved. (-1)
Dec 2020 (7 of 33 issues): ⏸ Unchanged.
Jan 2021 (3 of 50 issues): ⏸ Unchanged.
Feb 2021 (7 of 20 issues): ⬇️ One task resolved. (-1)
Mar 2021 (14 of 48 issues): ⬇️ Four tasks resolved. (-4)
Apr 2021 (23 of 42 issues): ⬇️ Two tasks resolved. (-2)
May 2021 (29 of 54 issues): 54 new issues found, of which 29 remain open. (+54; -25)

133 issues open, as of Excellence #31 (12 May 2021).
-12 issues closed, of the previous 133 open issues.
+29 new issues that survived May 2021.
150 issues open, as of today (12 June 2021).
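
For readers who want to double-check the tally above, here is a minimal sketch of the arithmetic in Python. The numbers are copied from the summary; the variable names are illustrative only and do not come from the spreadsheet or any Phabricator tooling.

```python
# Tasks resolved during the past month, per month of origin (from the summary above).
closed_older_by_month = {
    "Aug 2019": 1, "Apr 2020": 1, "Aug 2020": 1, "Oct 2020": 1,
    "Nov 2020": 1, "Feb 2021": 1, "Mar 2021": 4, "Apr 2021": 2,
}

# Tasks still open per month, in the order listed above (Aug 2019 .. May 2021).
still_open_by_month = [0, 1, 2, 4, 5, 5, 4, 12, 7, 19, 8, 7, 3, 7, 14, 23, 29]

previously_open = 133   # open as of Excellence #31
new_found = 54          # new issues found in May 2021
new_resolved = 25       # of those, already resolved

closed_older = sum(closed_older_by_month.values())
surviving_new = new_found - new_resolved
open_now = previously_open - closed_older + surviving_new

print(closed_older, surviving_new, open_now)
print(sum(still_open_by_month))
```

Both routes arrive at the same total: the running balance (133 - 12 + 29) and the sum of the per-month open counts listed in the summary.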


Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production.

Until next time,

– Timo Tijhof

Incident status, Wikitech.
Wikimedia incident stats by Krinkle, CodePen.
Production error data (spreadsheet and plots).
Phabricator report charts for Wikimedia-production-error project.

Written by Krinkle on Jun 21 2021, 1:31 AM.
Principal Engineer (Wikimedia Performance)
