Production Excellence: October 2018
Monthly update on our striving for operational excellence.

How did we do in our striving for operational excellence last month? Read on to find out!

  • Month in numbers.
  • Highlighted stories.
  • Current problems.

📊 Month in numbers

  • 7 documented incidents from 24 September to 31 October. [1]
  • 79 Wikimedia-prod-error tasks closed from 24 September to 31 October. [2]
  • 69 Wikimedia-prod-error tasks created from 24 September to 31 October. [3]
  • 175 currently open Wikimedia-prod-error tasks (as of 25 November 2018).

October had a relatively high number of incidents, compared both to prior months and to the same month last year (details).

Terminology:

  • An Exception (or fatal) prevents a user action from completing. For example, a page would display "Exception: Unable to render page" instead of the article content.
  • A Warning (or non-fatal, or error) lets the page view complete without signalling a problem, but the page may show corrupt, incorrect, or incomplete information. Examples: an article would display the code word “null” instead of the actual content, a user looking for Vegetables may be taken to an article about Vegetarians, or a user may receive a notification that says “You have (null) new messages.”

I’ve highlighted a few of last month’s resolved tasks below.

*️⃣ Send your thanks for talk contributions

Fixed by volunteer @Mh-3110 (Mahuton).

The Thanks functionality for MediaWiki (created in 2013) wasn’t working in some cases. This problem was first reported in April, with four more reports since then. Mahuton investigated together with @SBisson. They found that the issue was specific to talk pages with structured discussions.

It turned out to be caused by an outdated array access key in SpecialThanks.php. Once adjusted, the functionality was restored to its former glory. The error existed for about eight months, since internal refactoring in March for T186920 changed the internal array.
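To illustrate how a stale array key can fail quietly, here is a minimal sketch in Python. It is not the SpecialThanks.php code (MediaWiki is written in PHP), and the key names "page" and "title" as well as all function names are hypothetical, chosen only for the example.

```python
from typing import Optional


def describe_target_after_refactor() -> dict:
    # Hypothetical data shape: assume a refactoring renamed the
    # key "page" to "title" in the structure passed around internally.
    return {"type": "flow", "title": "Talk:Example"}


def thanks_target_stale(info: dict) -> Optional[str]:
    # Reader that was not updated: it still looks up the old key,
    # so it silently gets None instead of the target.
    return info.get("page")


def thanks_target_fixed(info: dict) -> Optional[str]:
    # Reader adjusted to the new key name.
    return info.get("title")


info = describe_target_after_refactor()
print(thanks_target_stale(info))
print(thanks_target_fixed(info))
```

The stale reader does not crash in this sketch; it simply finds nothing, which matches how such a bug can go unnoticed for months.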

This was Mahuton’s first Gerrit contribution. Thank you @Mh-3110, and welcome!

T191442 / https://gerrit.wikimedia.org/r/461189

*️⃣ One space led to Fatal exception

Fixed by volunteer @D3r1ck01 (Derick Alangi).

Administrators use the Special:DeletedContributions page to search for edits that are hidden from public view. When an admin typed a space at the end of their search, the MediaWiki application would throw a fatal exception. The user would see a generic error page, suggesting that the website may be unavailable.

Derick went in and updated the input handler to automatically correct these inputs for the user.
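As an illustration of this kind of fix, here is a minimal sketch in Python. It assumes the correction amounts to trimming surrounding whitespace before the search target is used; the actual MediaWiki change is in PHP and may differ, and the function names below are hypothetical.

```python
def normalize_search_target(raw: str) -> str:
    # Assumption for this sketch: "correcting" the input means
    # stripping leading and trailing whitespace.
    return raw.strip()


def search_deleted_contributions(raw_target: str) -> str:
    # Normalize first, so "Example " and "Example" are treated alike.
    target = normalize_search_target(raw_target)
    if not target:
        # An empty target is reported as a plain validation error
        # rather than being passed further down.
        raise ValueError("Please enter a user name or IP address.")
    return f"Deleted contributions for {target}"


print(search_deleted_contributions("Example "))
```

Normalizing at the point where input enters the handler means the rest of the code never sees the trailing space.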

T187619

*️⃣ Fatal exception from translation draft access

Accessing the private link for ContentTranslation when logged out isn’t meant to work, but the code didn’t account for this. When users attempted to open such a URL while not logged in, the ContentTranslation code performed an invalid operation. This caused a fatal error from the MediaWiki application. The user would see a system error page without further details.

This could happen when opening the link from your bookmarks before logging in, or after restarting the browser, or after clearing one’s cookies.
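The general pattern for this class of bug is to check for a missing session before touching user-specific data. The following Python sketch shows that guard; it is not the ContentTranslation code, and the `User` class, the `LoginRequired` exception, and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str


class LoginRequired(Exception):
    """Raised so the caller can show a login prompt instead of a system error."""


def open_translation_draft(user: Optional[User], draft_id: str) -> str:
    # Guard first: a logged-out visitor has no user object, so any
    # attribute access below would otherwise fail with an unhandled error.
    if user is None:
        raise LoginRequired("Please log in to view your translation drafts.")
    return f"Draft {draft_id} for {user.name}"


print(open_translation_draft(User("Alice"), "123"))
```

With the guard in place, the logged-out case becomes an expected condition that the page can handle, rather than a fatal error.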

Fixed by @santhosh (Santhosh Thottingal, WMF Language Engineering team).

T205433

🎉 Thanks!

Thank you to everyone who helped by reporting or investigating problems in Wikimedia production, and by devising, coding, or reviewing the corrective measures, including: @Addshore, @Aklapper, @Anomie, @ArielGlenn, @Catrope, @D3r1ck01, @Daimona, @Fomafix, @Ladsgroup, @Legoktm, @MSantos, @Mainframe98, @Melos, @Mh-3110, @SBisson, @Tgr, @Umherirrender, @Vort, @aaron, @aezell, @cscott, @dcausse, @jcrespo, @kostajh, @matmarex, @mmodell, @mobrovac, @santhosh, @thcipriani, and @thiemowmde.

📉 Current problems

Take a look at the workboard and look for tasks that might need your help. The workboard lists known issues, grouped by the week in which they were first observed.

https://phabricator.wikimedia.org/tag/wikimedia-production-error

💡 ProTip:

Cross-reference one workboard with another via Open Tasks → Advanced Filter, then enter the Tag(s) to apply as a filter.

Thanks!

Until next time,
– Timo Tijhof


Footnotes:

[1] Incidents. – wikitech.wikimedia.org/wiki/Special:AllPages...
[2] Tasks closed. – phabricator.wikimedia.org/maniphest/query...
[3] Tasks opened. – phabricator.wikimedia.org/maniphest/query...

Written by Krinkle on Wed, Nov 28, 5:47 PM.
Principal Engineer (Performance)

Thanks to @Mholloway and @Gehel who helped me a lot.