Production Excellence #29: February 2021
Monthly update on our pursuit of operational excellence.

How did we do in our pursuit of operational excellence last month? Read on to find out!

📈 Incidents

3 documented incidents last month, [1] which is average for the time of year. [2]

Learn about these incidents at Incident status on Wikitech, and their Preventive measures in Phabricator.

For those with NDA-restricted access, there may be additional private incident reports 🔒 available.

💡 Did you know: Our Incident reports have switched to using the ISO date format in their titles and listings, for improved readability and edit-ability (esp. when publishing on a later date). So long 202010221, and hello 2021-02-21!

📊 Trends

In February we saw a continuation of the new downward trend that began this January, which came after twelve months of continuous increase. Let's make sure this trend sticks with us as we work our way through the debt, whilst also learning to have a healthy week-to-week iteration where we monitor and follow up on any new developments, so that they don't introduce lasting regressions.

The recent tally (issues filed since we started reporting in March 2019) is down to 138 unresolved errors, from 152 last month. The old backlog (pre-2019 issues) also continued its 5-month streak of decline and is down to 148, from 160 last month. If this progress continues, we'll soon have fewer "Old" issues than "Recent" issues. Possibly by the start of 2022, we may be able to report on and focus only on our rotation through recent issues. Hopefully by then we are balancing our work such that issues reported in a given month are mostly addressed in that same month, or otherwise later that quarter, within 2-3 months. Visually, that would manifest as the colored chunks having a short life on the chart, each drawn at a sharp downward angle, instead of being dragged out and building up an ever-taller shortcake. I do like cake, but I prefer the kind I can eat. 🍰

Month-over-month plots based on spreadsheet data. [3] [4]

Unresolved error reports stacked by recent month
Total open production error tasks, by month

📖 Outstanding errors

Summary over recent months:

  • ⚠️ July 2019 (2 of 18 issues left): no change.
  • ⚠️ August 2019 (1 of 14 issues): no change.
  • ⚠️ October 2019 (4 of 12 issues): no change.
  • ⚠️ November 2019 (1 of 5 issues): no change.
  • ⚠️ December 2019 (1 of 9 issues): One task resolved (-1).
  • ⚠️ January 2020 (2 of 7 issues): no change.
  • ⚠️ February 2020 (1 of 7 issues): no change.
  • ⚠️ March 2020 (2 of 2 issues): no change.
  • April 2020 (9 of 14 issues left): no change.
  • May 2020 (6 of 14 issues left): no change.
  • June 2020 (7 of 14 issues left): no change.
  • July 2020 (9 of 24 new issues): no change.
  • August 2020 (20 of 53 new issues): Two tasks resolved (-2).
  • September 2020 (9 of 33 new issues): Five tasks resolved (-5).
  • October 2020 (26 of 69 new issues): Five tasks resolved (-5).
  • November 2020 (11 of 38 new issues): Three tasks resolved (-3).
  • December 2020 (12 of 33 new issues): Seven tasks resolved (-7).
  • January 2021 (5 of 50 new issues): Two tasks resolved (-2).
  • February 2021: 11 of 20 new issues survived the month and remained unresolved (+20; -9).

Recent tally

  152 issues open, as of Excellence #28 (16 Feb 2021).
  -25 issues closed since, of the previous 152 open issues.
  +11 new issues that survived Feb 2021.
  138 issues open, as of today, 5 Mar 2021.
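
For anyone who wants to double-check the arithmetic, the tally can be reproduced from the per-month summary above. This is only an illustrative sketch: the variable names are made up for this example and are not part of the spreadsheet or any Wikimedia tooling.

```python
# Tasks resolved since the previous edition, keyed by the month the task
# was filed in. Figures are copied from the "Summary over recent months" list.
resolved_since_last = {
    "December 2019": 1,
    "August 2020": 2,
    "September 2020": 5,
    "October 2020": 5,
    "November 2020": 3,
    "December 2020": 7,
    "January 2021": 2,
}

previous_open = 152        # open issues as of Excellence #28
new_in_february = 20       # issues filed during February 2021
resolved_in_february = 9   # of those, resolved within the same month

closed = sum(resolved_since_last.values())
survived = new_in_february - resolved_in_february
current_open = previous_open - closed + survived

print(f"closed: {closed}, survived: {survived}, open now: {current_open}")
```

The per-month resolutions add up to the 25 closed issues, and together with the 11 February survivors that yields the 138 open issues reported above.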

For the ongoing month of March 2021, we've got 12 new issues so far.

Take a look at the workboard and look for tasks that could use your help!

View Workboard


🎉 Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof


Footnotes:

[1] Incident status, Wikitech.
[2] Wikimedia incident stats by Krinkle, CodePen.
[3] Month-over-month, Production Excellence spreadsheet.
[4] Open tasks, Wikimedia-prod-error, Phabricator.

Written by Krinkle on Mar 6 2021, 1:03 AM.
Principal Engineer (Wikimedia Performance)