
BUG: cannot consume query at offset 0 (need to go to 7296)
Closed, Resolved · Public · PRODUCTION ERROR



MediaWiki version: 1.35.0-wmf.3

RuntimeException: BUG: cannot consume query at offset 0 (need to go to 7296)


Some user search queries result in application errors, which produce a generic system error page.
The response is an HTTP 500 Internal Server Error which cannot be cached.


Might be related:


Request ID
Request URL
Stack Trace
#0 /extensions/CirrusSearch/includes/Parser/QueryStringRegex/QueryStringRegexParser.php(342): CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser->nextToken()
#1 /extensions/CirrusSearch/includes/Parser/QueryStringRegex/QueryStringRegexParser.php(300): CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser->expression()
#2 /extensions/CirrusSearch/includes/Search/SearchQueryBuilder.php(119): CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser->parse(string)
#3 /extensions/CirrusSearch/includes/CirrusSearch.php(200): CirrusSearch\Search\SearchQueryBuilder::newFTSearchQueryBuilder(CirrusSearch\SearchConfig, string, class@anonymous

Event Timeline

Restricted Application added a subscriber: Aklapper.
EBernhardson moved this task from needs triage to Current work on the Discovery-Search board.

There are a number of other ways to reach the same stack trace, which can be found in Logstash. I've used a random one for the "Request URL" here, but there are other triggers as well. The ones I spot-checked were all relatively long, so query length is a likely cause.

One regex hits PREG_JIT_STACKLIMIT_ERROR on this query with PHP 7.0 to 7.2; PHP 7.3 seems to allow a bit more:

  • PHP 7.0 and PHP 7.2 on WMF servers seem to be able to consume up to ~2720 chars with this regex
  • PHP 7.3.8-1 (on my machine) can consume up to 8190 chars
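Limits like these can be measured by growing the input until the match fails. Below is a minimal PHP sketch of such a measurement. It assumes that failure is monotonic in input length, meaning that if N repetitions fail, any longer input also fails. `maxConsumableLength()` is a hypothetical helper and is not part of CirrusSearch. The regex itself is not reproduced in this task, so the caller supplies the pattern under test. When `preg_match()` returns `false`, `preg_last_error()` reports the cause. `PREG_JIT_STACKLIMIT_ERROR` is the error seen here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: binary-searches the largest number of repetitions of
 * $unit that $regex can scan before preg_match() returns false (for example
 * with PREG_JIT_STACKLIMIT_ERROR). Assumes an empty subject succeeds and that
 * failures are monotonic in length. Returns $limit if even the longest input
 * succeeds.
 */
function maxConsumableLength(string $regex, string $unit, int $limit): int
{
    if (preg_match($regex, str_repeat($unit, $limit)) !== false) {
        return $limit;
    }
    $lo = 0;       // largest repetition count assumed to succeed
    $hi = $limit;  // smallest repetition count known to fail
    while ($hi - $lo > 1) {
        $mid = intdiv($lo + $hi, 2);
        if (preg_match($regex, str_repeat($unit, $mid)) === false) {
            $hi = $mid;
        } else {
            $lo = $mid;
        }
    }
    return $lo;
}
```

Running the same call on each PHP version makes differences like the ones above directly comparable. After a failing `preg_match()` on `$n + 1` repetitions, checking `preg_last_error() === PREG_JIT_STACKLIMIT_ERROR` confirms that the JIT stack, and not some other limit, is the cause.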

The regex is:


The problem is this part:
A workaround that seems to help is forcing it to consume contiguous chars in the greediest section:

Note that the negative lookahead does not seem to affect anything. This might not fully resolve the issue, as there are still ways to construct queries that loop out of the greedy section [^"!\pZ\pC-] and keep feeding the stack.
I'll properly check the preg_match return value and log an error instead of failing the request. These queries seem pathological enough that we can simply consume them as bag-of-words queries without breaking any important use cases.
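Below is a minimal PHP sketch of that fallback. It assumes a word-at-a-time tokenizer driven by a regex anchored with `\G`. `tokenizeWithFallback()` and its `$log` callback are hypothetical names and are not the CirrusSearch API. Any `false` return from `preg_match()` is logged rather than thrown. That covers the JIT stack limit here, and also other PCRE errors such as the backtrack limit or invalid UTF-8. The rest of the query is then split on whitespace as a plain bag of words.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical tokenizer: reads $query one word at a time with $wordRegex,
 * which must be anchored with \G so it only matches at the current offset.
 * If preg_match() fails (returns false), the error is reported via $log and
 * the remainder of the query is treated as a plain bag of words.
 *
 * @param callable(string): void $log
 * @return list<string>
 */
function tokenizeWithFallback(string $query, string $wordRegex, callable $log): array
{
    $whitespace = " \t\r\n";
    $tokens = [];
    $offset = 0;
    $length = strlen($query);
    while (true) {
        // Skip separators between words.
        $offset += strspn($query, $whitespace, $offset);
        if ($offset >= $length) {
            return $tokens;
        }
        $ret = preg_match($wordRegex, $query, $m, 0, $offset);
        if ($ret === false) {
            // Log instead of failing the request, then degrade to a bag of words.
            $log(sprintf('preg_match failed at offset %d (preg_last_error %d)', $offset, preg_last_error()));
            $rest = preg_split('/\s+/', substr($query, $offset), -1, PREG_SPLIT_NO_EMPTY);
            return array_merge($tokens, $rest === false ? [] : $rest);
        }
        if ($ret === 1 && $m[0] !== '') {
            $tokens[] = $m[0];
            $offset += strlen($m[0]);
        } else {
            // No match here: take the next whitespace-delimited run as is,
            // which also guarantees that the loop always advances.
            $run = strcspn($query, $whitespace, $offset);
            $tokens[] = substr($query, $offset, $run);
            $offset += $run;
        }
    }
}
```

Passing the logger in as a callable keeps the sketch independent of any particular logging framework. It also makes the degraded path easy to observe in tests.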

Change 546209 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Split word detection in multiple preg_match calls

I ended up reworking how this works: optimizing the regex that way was not really possible, as it broke escape sequences. Instead, I moved the complexity out of the regex and split it into two parts.
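This task does not spell out the two parts used by Change 546209. The following is a hypothetical illustration of the general idea only. It assumes the word regex had the common shape of a repeated alternation such as `(?:[^\s"\\]|\\.)+`. That shape is a well-known way to exhaust the PCRE JIT stack, because each iteration of a repeated group with alternatives can leave a backtracking point. The sketch splits it into two small anchored patterns and moves the loop into PHP. One pattern consumes a run of plain characters, and the other consumes a single backslash escape. `consumeWord()` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: reads one word starting at $offset. Instead of one
 * regex with a repeated alternation, the loop alternates between two small
 * patterns anchored with \G:
 *   1. a run of plain characters (no whitespace, quote, or backslash),
 *   2. a single backslash escape sequence.
 * Returns [unescaped word, offset just past the word].
 *
 * @return array{0: string, 1: int}
 */
function consumeWord(string $query, int $offset): array
{
    $word = '';
    $length = strlen($query);
    while ($offset < $length) {
        if (preg_match('/\G[^\s"\\\\]+/u', $query, $m, 0, $offset) === 1) {
            // Part 1: plain characters, copied as is.
            $word .= $m[0];
        } elseif (preg_match('/\G\\\\(.)/su', $query, $m, 0, $offset) === 1) {
            // Part 2: a "\x" escape; keep only the escaped character.
            $word .= $m[1];
        } else {
            // Whitespace, a quote, a trailing lone backslash, or a PCRE
            // error (false): stop here and let the caller decide.
            break;
        }
        $offset += strlen($m[0]);
    }
    return [$word, $offset];
}
```

Each `preg_match()` call either repeats a single character class, which PCRE can scan without a per-character backtracking frame, or matches exactly two characters. The stack use of any one call therefore no longer grows with the length of the word.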

Change 546209 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Split word detection in multiple preg_match calls