
BUG: cannot consume query at offset 0 (need to go to 7296)
Closed, Resolved · Public

Description

Error

MediaWiki version: 1.35.0-wmf.3

message
RuntimeException: BUG: cannot consume query at offset 0 (need to go to 7296)

Impact

Some user search queries result in application errors, which produce a generic system error page.
The response is an HTTP 500 Internal Server Error, which cannot be cached.

Notes

Might be related:

Details

Request ID
XbHn5gpAADgAAI0wGooAAABO
Request URL
https://de.wiktionary.org/w/index.php?searchtitle=Spezial%3ASuche&go=Seite
Stack Trace
exception.trace
#0 /extensions/CirrusSearch/includes/Parser/QueryStringRegex/QueryStringRegexParser.php(342): CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser->nextToken()
#1 /extensions/CirrusSearch/includes/Parser/QueryStringRegex/QueryStringRegexParser.php(300): CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser->expression()
#2 /extensions/CirrusSearch/includes/Search/SearchQueryBuilder.php(119): CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser->parse(string)
#3 /extensions/CirrusSearch/includes/CirrusSearch.php(200): CirrusSearch\Search\SearchQueryBuilder::newFTSearchQueryBuilder(CirrusSearch\SearchConfig, string, class@anonymous
Related Gerrit Patches:
mediawiki/extensions/CirrusSearch (master): Split word detection in multiple preg_match calls

Event Timeline

Krinkle created this task. Oct 24 2019, 6:16 PM
Restricted Application added a project: Discovery-Search. Oct 24 2019, 6:16 PM
Restricted Application added a subscriber: Aklapper.
Krinkle updated the task description.
EBernhardson triaged this task as Medium priority. Oct 24 2019, 9:13 PM
EBernhardson moved this task from needs triage to Current work on the Discovery-Search board.

Logstash shows a number of other requests that produce the same stack trace. I used a random one for the "Request URL" here, but there are other triggers as well. The ones I spot-checked all had relatively long queries, so query length is a likely cause.

dcausse claimed this task. Oct 25 2019, 7:41 AM

One regex hits PREG_JIT_STACKLIMIT_ERROR on this query with PHP 7.0 and 7.2; PHP 7.3 seems to allow a bit more:

  • PHP 7.0 and PHP 7.2 on WMF servers seem to be able to consume up to ~2720 characters with this regex
  • PHP 7.3.8-1 (on my machine) can consume up to 8190 characters

The regex is:

/\G(?<negated>[-!](?=[\w]))?(?<word>(?:\\\\.|[!-](?!")|[^"!\pZ\pC-])+)/u

The problem is this part:
(?<word>(?:\\\\.|[!-](?!")|[^"!\pZ\pC-])+)
A workaround that seems to help is making the greediest alternative consume runs of contiguous characters rather than a single character per iteration:
(?<word>(?:\\\\.|[!-](?!")|[^"!\pZ\pC-]+)+)
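The difference between the two patterns can be illustrated with a rough Python port. This is a sketch, not CirrusSearch code, and it rests on several assumptions:

- Python's stdlib `re` has no `\pZ`/`\pC` property classes, so whitespace and ASCII control characters stand in for them.
- `\G` is emulated by anchoring `Pattern.match()` at a position.
- The `\\\\.` in the pattern above is read as a PHP-string-escaped `\\.`, i.e. a backslash followed by one character.

Python's engine does not reproduce the PCRE JIT stack limit. The port only shows what each pattern matches, not the crash.

```python
import re

# Assumption: stdlib `re` lacks \pZ / \pC, so whitespace and ASCII control
# characters approximate "separator" and "other/control" characters.
SEP = r'\s\x00-\x1f\x7f'

# Original pattern: the last alternative matches ONE character, so the outer
# (...)+ loop runs once per character of the word.
ORIGINAL = re.compile(
    r'(?P<negated>[-!](?=\w))?'
    r'(?P<word>(?:\\.|[!-](?!")|[^"!' + SEP + r'-])+)'
)

# Workaround: the last alternative matches a whole RUN of ordinary characters,
# so a plain word needs only one outer iteration.
WORKAROUND = re.compile(
    r'(?P<negated>[-!](?=\w))?'
    r'(?P<word>(?:\\.|[!-](?!")|[^"!' + SEP + r'-]+)+)'
)


def word_at(pattern, text, pos=0):
    """Return the 'word' group matched starting exactly at pos, or None.

    Pattern.match(text, pos) anchors at pos, which plays the role of \\G.
    """
    m = pattern.match(text, pos)
    return m.group('word') if m else None


if __name__ == '__main__':
    query = r'foo\"bar baz'
    print(word_at(ORIGINAL, query))    # foo\"bar  (the \" escape stays in the word)
    print(word_at(WORKAROUND, query))  # foo\      (the run swallows the backslash)
```

With the original pattern, the escape alternative is tried at every character, so `\"` is kept together. With the run-based workaround, the greedy run claims the backslash before the escape alternative gets a chance.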

Note that the negative lookahead does not seem to affect anything. This might not fully resolve the issue, as it is still possible to construct queries that repeatedly leave the greedy section [^"!\pZ\pC-] (for example a long run like a-a-a-a, where every other character is matched by another alternative), and each such iteration still adds to the stack.
I'll properly check the preg_match return value and log an error instead of failing the request. These queries seem pathological enough that we can simply consume them as plain bag-of-words queries without breaking any important use cases.
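Below is a minimal sketch of that degrade-instead-of-fail strategy, written in Python rather than PHP. In PHP, preg_match() signals the failure by returning false, and preg_last_error() then reports PREG_JIT_STACKLIMIT_ERROR. Python's `re` has no equivalent condition. The hypothetical `RegexLimitError` stands in for it, and `tokenize`, `match_words` and the logger name are likewise illustrative names, not CirrusSearch's.

```python
import logging

log = logging.getLogger("query_parser")  # hypothetical logger name


class RegexLimitError(Exception):
    """Hypothetical stand-in for preg_match() returning false, e.g. when
    preg_last_error() == PREG_JIT_STACKLIMIT_ERROR in PHP."""


def tokenize(query, match_words):
    """Tokenize with the full word parser, degrading to a bag of words.

    match_words is the (hypothetical) regex-based tokenizer; if it reports a
    regex failure, log it and split on whitespace instead of failing the
    whole request with an HTTP 500.
    """
    try:
        return match_words(query)
    except RegexLimitError:
        log.error("word regex failed on a %d-character query; "
                  "falling back to bag of words", len(query))
        return query.split()


if __name__ == "__main__":
    def always_fails(q):
        raise RegexLimitError()

    print(tokenize("foo bar  baz", always_fails))  # ['foo', 'bar', 'baz']
```

The design choice is to keep the search working with reduced query syntax for pathological inputs. The error is logged so that such queries stay visible.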

Change 546209 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Split word detection in multiple preg_match calls

https://gerrit.wikimedia.org/r/546209

In the end I reworked how this works: optimizing the regex like that was not really possible, as it broke escape sequences. Instead, I moved the complexity out of the regex and split it into two parts.
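One way to move the looping out of the regex is sketched below in Python. It is a hypothetical illustration, not the code from change 546209. Each small pattern is tried with its own `match()` call and the loop lives in ordinary code, so the regex engine never iterates over a whole long word in a single match. Two further assumptions apply:

- Whitespace and ASCII control characters stand in for `\pZ\pC`, since Python's stdlib `re` lacks those classes.
- The optional `negated` prefix is left out.

```python
import re

# Assumption: whitespace and ASCII control characters approximate \pZ\pC.
# Plain characters; the backslash is excluded so escapes are handled below.
RUN = re.compile(r'[^"!\\\s\x00-\x1f\x7f-]+')
# A backslash plus the (non-newline) character it escapes; the '?' also
# accepts a lone backslash, which the original pattern allowed as an
# ordinary character.
ESCAPE = re.compile(r'\\.?')
# '!' or '-' inside a word, as long as it is not followed by a double quote.
BANG_OR_DASH = re.compile(r'[!-](?!")')


def consume_word(text, pos=0):
    """Return (word, end) for the word starting at pos; word is '' if none."""
    start = pos
    while pos < len(text):
        for piece in (RUN, ESCAPE, BANG_OR_DASH):
            m = piece.match(text, pos)  # anchored at pos, like \G
            if m:
                pos = m.end()
                break
        else:
            break  # no piece matches here, so the word ends
    return text[start:pos], pos


if __name__ == '__main__':
    print(consume_word(r'foo\"bar baz'))        # ('foo\\"bar', 8)
    print(len(consume_word('a' * 100_000)[0]))  # 100000
```

A long plain word is consumed by a single `RUN` match. Escapes and inner `!`/`-` characters each cost one extra call from the loop rather than one more level of regex backtracking state.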

Change 546209 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Split word detection in multiple preg_match calls

https://gerrit.wikimedia.org/r/546209

Gehel closed this task as Resolved. Oct 29 2019, 5:51 PM