Production Excellence #22: June 2020
Monthly update on our pursuit of operational excellence.

How’d we do in our pursuit of operational excellence last month? Read on to find out!

📈 Month in review
  • 4 documented incidents in June. [1]
  • 37 new production errors were filed and 27 were closed. [2] [3]
  • 72 recent production errors still open (up from 68).
  • 203 total Wikimedia-prod-error tasks currently open (up from 192). [4]

For more about recent incidents, see Incident documentation on Wikitech, or Preventive measures in Phabricator.

📖 Outstanding errors

Breakdown of new errors reported in June that are still open today:

  1. (Needs owner) / Newsletter extension: Unexpected locking SELECT query. T253926
  2. (Needs owner) / FlaggedRevs extension: Unable to submit review of page due to bad fr_page_id record. T256296
  3. Editing team / MassMessage extension: Delivery fails due to system user conflict. T171003
  4. Parsing team / Parsoid: Pagebundle data unavailable due to a bad UTF-8 string. T236866
  5. Growth team / Recent changes: Update for ActiveUsers data failing due to deadlock. T255059
  6. Growth team / GrowthExperiments: Issue with question display on personal homepage. T255616
  7. Language team / Translate extension: Update jobs fail due to invalid function call. T255669
  8. Language team / ContentTranslation: Save action fails due to duplicate insert query. T256230
  9. Core Platform team / Content handling: Incompatible content type during content merge/stash. T255700
  10. Core Platform team / Monolog: API usage logs and error logs sometimes missing due to socket failure. T255578
  11. Search Platform team / WikibaseCirrus: Elevated error levels from EntitySearchElastic warnings. T255658
  12. Wikidata / API: Generator query fails due to invalid API result format. T254334
  13. Wikidata / API: EntityData query emits warning about bad RDF. T255054
  14. Wikidata / Repo: Entity relation update jobs fail due to deadlock. T255706

📊 Trends
Take a look at the workboard and look for tasks that could use your help.

Summary over recent months:

  • July 2019 (5 of 18 tasks left): Two tasks closed.
  • August (1 of 14 tasks left): Another task closed, only one remaining! 🚀
  • September (5 of 12 tasks left): Two tasks closed.
  • October (6 of 12 tasks left), no change.
  • November (3 of 5 tasks left): Another task closed.
  • December (5 of 9 tasks left), no change.
  • January 2020 (5 of 7 tasks left), no change.
  • February (4 of 7 tasks left), no change.
  • March (2 of 2 tasks left), no change.
  • April (11 of 14 tasks left): Three tasks closed.
  • May (11 tasks left): Three tasks closed.
  • June: 14 new tasks survived the month of June. ⚠️

At the end of May the number of open production errors over recent months was 68. Of those, 10 got closed, but with 14 new tasks from June still open, the total has grown further to 72.
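The tally above can be double-checked with a few lines of Python. This is only a rough sketch: the per-month figures are copied from the "tasks left" counts in the summary list, and the variable names are made up for illustration, not taken from any Phabricator tooling.

```python
# "Tasks left" per month, copied from the summary list above.
# The keys are illustrative labels only.
remaining_by_month = {
    "2019-07": 5,
    "2019-08": 1,
    "2019-09": 5,
    "2019-10": 6,
    "2019-11": 3,
    "2019-12": 5,
    "2020-01": 5,
    "2020-02": 4,
    "2020-03": 2,
    "2020-04": 11,
    "2020-05": 11,
    "2020-06": 14,
}

# Sum of everything still open across recent months.
total_open = sum(remaining_by_month.values())

# The same total, derived from last month's figure:
# open at end of May, minus tasks closed since, plus June's survivors.
open_end_of_may = 68
closed_since_may = 10
new_from_june = 14
reconciled = open_end_of_may - closed_since_may + new_from_june

print(total_open, reconciled)
```

Both numbers come out to 72, matching the count of recent production errors reported as still open.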

The workboard had 192 open tasks last month. That number has risen again, to 203 open tasks (this includes tasks from 2019 and earlier).

🎉 Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production.

Until next time,

– Timo Tijhof

ATC: “Do you want to report a UFO?” Pilot: “Negative. We don't want to report.”
ATC: “Do you wish to file a report of any kind to us?” Pilot: “I wouldn't know what kind of report to file.”
ATC: “Me neither…”

[1] Incidents. – https://wikitech.wikimedia.org/wiki/Incident_documentation#2020
[2] Tasks created. – https://phabricator.wikimedia.org/maniphest/query/VTpmvaJLYVL1/#R
[3] Tasks closed. – https://phabricator.wikimedia.org/maniphest/query/qn5yeURqyl3D/#R
[4] Open tasks. – https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R

Written by Krinkle on Jul 23 2020, 3:25 AM.
Principal Engineer (WMF Performance Team)