How’d we do in our striving for operational excellence last month? Read on to find out!
3 documented incidents last month. That's at the median for the past twelve months, and slightly below the median of 4 over the past five years (Incident stats).
- 2021-07-14 eventgate latency spike
  - Impact: For ~10 min, MediaWiki API clients experienced request failures.
- 2021-07-16 codfw-a2 network
  - Impact: For ~1 hour, Restbase clients received errors, affecting mobile apps and ContentTranslation.
- 2021-07-26 ruwikinews DynamicPageList
  - Impact: For 30 min, 15% of requests from contributors on all wikis failed. There were also brief moments during which no readers could load recently modified or uncached pages.
Learn about past incidents at Incident status on Wikitech. Remember to review and schedule Incident Follow-up tasks in Phabricator, which are the preventive measures and other action items filed after an incident.
Last month the workboard held 154 non-old unresolved error reports. Over the past thirty days, the collective efforts of our volunteers and engineering teams have closed 14 of those.
In July we also introduced or discovered 31 new error reports (that's an average of one production regression every day!). Of those new error reports, 15 were resolved and 16 remain unresolved. The workboard now tallies 156 tasks.
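For anyone who wants to check the tally, here is a minimal sketch of the arithmetic behind those figures. The variable names are made up for illustration; only the numbers come from this post.

```python
# Figures quoted in this post; the variable names are illustrative only.
previous_open = 154   # open error reports as of last month's edition
closed_older = 14     # of those 154, closed over the past thirty days
new_reports = 31      # introduced or discovered in July
new_resolved = 15     # new reports that were already resolved
days_in_july = 31

new_unresolved = new_reports - new_resolved
open_now = previous_open - closed_older + new_unresolved
reports_per_day = new_reports / days_in_july

print(new_unresolved)   # 16
print(open_now)         # 156
print(reports_per_day)  # 1.0
```

The closed count only covers reports that were already open last month, which is why the 15 resolved July reports are subtracted from the new reports rather than from the previous total.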
Take a look at the workboard and look for tasks that could use your help.
Over on the backlog, we're continuing to ploddingly present progress on production problems from phantoms of Christmases past.
For more month-over-month numbers refer to the spreadsheet data.
Below are various older issues that may have fallen by the wayside, taken from somewhat-random stab-in-the-dark queries.
Oldest unresolved errors that are still reproducible (Phab query):
- Reported in 2015: Unable to view history of protected Flow board (StructuredDiscussions, Growth team), T118502.
- Reported in 2016: Error when deleting a heading next to a table (VisualEditor, Editing team), T140871.
Stalled error reports (Phab query):
- Stalled Mar 2021: Constraints check for Q142 France times out (Wikidata, WMDE), T212282.
Oldest errors with a patch for review (Phab query):
- Reported in 2016: Maps broken during 2nd live preview (Maps, Product Infra), T151524.
- Reported in 2018: Corrupt connection for cross-wiki db query (Platform team), T193565.
| Jan 2021 (3 of 50 issues left) | ⚠️ Unchanged. Have a look-see! |
| Feb 2021 (6 of 20 issues left) | ⚠️ Unchanged. Take a gander! |
| Mar 2021 (13 of 48 issues left) | ⚠️ Unchanged. Check it out! |
| Apr 2021 (18 of 42 issues left) | -1 |
| May 2021 (22 of 54 issues left) | -3 |
| June 2021 (11 of 26 issues left) | -4 |
| July 2021 (16 of 31 issues left) | +31; -15 |

| 154 | issues open, as of Excellence #33 (June 2021). |
| -14 | issues closed, of the previous 154 open issues. |
| +16 | new issues that survived July 2021. |
| 156 | issues open, as of today. |
Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production.
Until next time,
– Timo Tijhof