How’d we do in our striving for operational excellence in November and December? Read on to find out!
📊 Month in numbers
- 0 documented incidents in November, 5 incidents in December. [1]
- 17 new Wikimedia-prod-error reports. [2]
- 23 Wikimedia-prod-error reports closed. [3]
- 190 currently open Wikimedia-prod-error reports in total. [4]
November had zero reported incidents. Prior to this, the last month with no documented incidents was December 2017. To read about past incidents and unresolved actionables, check Incident documentation § 2019.
Explore Wikimedia incident graphs (interactive)
📖 Many dots, do not a query make!
@dcausse investigated a flood of exceptions from SpecialSearch, which reported “Cannot consume query at offset 0 (need to go to 7296)”. This exception served as a safeguard in the parser for search queries. The code path was not meant to be reached. The root cause was narrowed down to the following regex:
/\G(?<negated>[-!](?=[\w]))?(?<word>(?:\\\\.|[!-](?!")|[^"!\pZ\pC-])+)/u
This regex looks complex, but it can actually be simplified to:
/(?:ab|c)+/
This regex still triggers the problematic behavior in PHP. It fails with a PREG_JIT_STACKLIMIT_ERROR when given a long string. Below is a reduced test case:
$ret = preg_match( '/(?:ab|c)+/', str_repeat( 'c', 8192 ) );
if ( $ret === false ) {
	print( "failed with: " . preg_last_error() );
}
- Fails when given 1365 contiguous "c" characters on PHP 7.0.
- Fails with 2731 characters on PHP 7.2, PHP 7.1, and PHP 7.0.13.
- Fails with 8192 characters on PHP 7.3. (Might be due to php-src@bb2f1a6).
In the end, we fixed it by splitting the regex into two separate ones, removing the non-capturing group with a quantifier, and looping at the PHP level instead (Gerrit change 546209).
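To illustrate the idea of looping at the PHP level, here is a minimal sketch based on the simplified regex rather than the real one. It is not the code from Gerrit change 546209; the function name consumeWord is made up for this example. Each preg_match call matches a single short token anchored with \G, so the regex engine never has to repeat a group thousands of times within one match.

```php
<?php
/**
 * Hypothetical sketch: consume a run of "ab" or "c" tokens starting at
 * $offset, one token per preg_match() call, instead of using /(?:ab|c)+/.
 */
function consumeWord( string $subject, int $offset = 0 ): string {
	$end = $offset;
	while ( true ) {
		// \G anchors the match at the offset passed as the fifth argument.
		$ret = preg_match( '/\G(?:ab|c)/', $subject, $m, 0, $end );
		if ( $ret === false ) {
			// Still check for engine failures, even with a simple pattern.
			throw new RuntimeException( 'preg_match failed: ' . preg_last_error() );
		}
		if ( $ret === 0 ) {
			// No further token at this position; the run ends here.
			break;
		}
		$end += strlen( $m[0] );
	}
	return substr( $subject, $offset, $end - $offset );
}
```

The trade-off is more calls into the regex engine in exchange for a bounded amount of work per call.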
The lesson learned here is that the code did not properly check the return value of preg_match. This matters even more because the size allowed for the JIT stack changes between PHP versions.
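One way to make that check hard to forget is to route calls through a small wrapper. The sketch below is an assumption about how this could look, not code from CirrusSearch or MediaWiki; the name safeMatch is hypothetical. It relies only on preg_match returning false on failure and on preg_last_error reporting the reason.

```php
<?php
/**
 * Hypothetical helper: like preg_match(), but throws when the regex
 * engine itself fails, so "no match" and "error" cannot be confused.
 */
function safeMatch( string $pattern, string $subject, &$matches = null, int $offset = 0 ): bool {
	$ret = preg_match( $pattern, $subject, $matches, 0, $offset );
	if ( $ret === false ) {
		// preg_last_error() returns one of the PREG_*_ERROR constants,
		// for example PREG_JIT_STACKLIMIT_ERROR.
		throw new RuntimeException(
			'preg_match failed with error code ' . preg_last_error()
		);
	}
	// 1 means the pattern matched, 0 means it did not.
	return $ret === 1;
}
```

A caller can then treat the boolean result as a plain match/no-match answer and handle the exception separately, for example by logging it and falling back to a simpler parse.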
For future reference, @dcausse concluded: The regex could be optimized to support more chars (~3 times more) by using atomic groups, like so /(?>ab|c)+/. — T236419
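To compare the two variants on a given PHP build, a rough probe like the one below may help. It is only a sketch: the function name firstFailingLength and the doubling strategy are choices made for this example, and the lengths it reports depend on the PHP version, the PCRE library, and settings such as pcre.jit, so no expected output is given.

```php
<?php
/**
 * Hypothetical probe: doubles the subject length until preg_match()
 * fails, and returns that length with the error code (or null if no
 * failure was seen up to $limit).
 */
function firstFailingLength( string $pattern, int $limit = 65536 ) {
	for ( $len = 1; $len <= $limit; $len *= 2 ) {
		if ( preg_match( $pattern, str_repeat( 'c', $len ) ) === false ) {
			return [ 'length' => $len, 'error' => preg_last_error() ];
		}
	}
	return null;
}

// Non-capturing group versus atomic group.
foreach ( [ '/(?:ab|c)+/', '/(?>ab|c)+/' ] as $pattern ) {
	$result = firstFailingLength( $pattern );
	if ( $result === null ) {
		echo "$pattern: no failure up to the limit\n";
	} else {
		echo "$pattern: fails at {$result['length']} chars (error {$result['error']})\n";
	}
}
```

Because the probe only tests powers of two, it gives a coarse upper bound on the threshold, not the exact character counts quoted above.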
📉 Outstanding reports
Take a look at the workboard and look for tasks that might need your help. The workboard lists error reports, grouped by the month in which they were first observed.
→ https://phabricator.wikimedia.org/tag/wikimedia-production-error/
Or help someone that’s already started with their patch:
→ Open prod-error tasks with a Patch-For-Review
Breakdown of recent months (past two weeks not included):
- March: 3 of 10 reports left (unchanged). ⚠️
- April: Three reports closed, 6 of 14 left.
- May: (All clear!)
- June: Three reports closed. 6 of 11 left (unchanged). ⚠️
- July: One report closed, 12 of 18 left.
- August: Two reports closed, 4 of 14 left.
- September: One report closed, 9 of 12 left.
- October: Four reports closed, 8 of 12 left.
- November: 5 new reports survived the month of November.
- December: 9 new reports survived the month of December.
🎉 Thanks!
Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production.
Until next time,
– Timo Tijhof
Footnotes:
[1] Incidents. – wikitech.wikimedia.org/wiki/Incident_documentation#2019
[2] Tasks created. – phabricator.wikimedia.org/maniphest/query…
[3] Tasks closed. – phabricator.wikimedia.org/maniphest/query…
[4] Open tasks. – phabricator.wikimedia.org/maniphest/query…