Wikimedia\Assert\PostconditionException: Postcondition failed: Regex failed: 4
Open, Needs Triage | Public | PRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   Wikimedia\Assert\PostconditionException: Postcondition failed: Regex failed: 4
exception.trace
from /srv/mediawiki/php-1.41.0-wmf.13/vendor/wikimedia/assert/src/Assert.php(203)
#0 /srv/mediawiki/php-1.41.0-wmf.13/extensions/CirrusSearch/includes/Parser/QueryStringRegex/NonPhraseParser.php(99): Wikimedia\Assert\Assert::postcondition(boolean, string)
#1 /srv/mediawiki/php-1.41.0-wmf.13/extensions/CirrusSearch/includes/Parser/QueryStringRegex/QueryStringRegexParser.php(693): CirrusSearch\Parser\QueryStringRegex\NonPhraseParser->parse(string, integer, integer)
#2 /srv/mediawiki/php-1.41.0-wmf.13/extensions/CirrusSearch/includes/Parser/QueryStringRegex/QueryStringRegexParser.php(629): CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser->consumeWord(integer)
#3 /srv/mediawiki/php-1.41.0-wmf.13/extensions/CirrusSearch/includes/Parser/QueryStringRegex/QueryStringRegexParser.php(357): CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser->nextToken()
#4 /srv/mediawiki/php-1.41.0-wmf.13/extensions/CirrusSearch/includes/Parser/QueryStringRegex/QueryStringRegexParser.php(317): CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser->expression()
#5 /srv/mediawiki/php-1.41.0-wmf.13/extensions/CirrusSearch/includes/Search/SearchQueryBuilder.php(141): CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser->parse(string)
#6 /srv/mediawiki/php-1.41.0-wmf.13/extensions/CirrusSearch/includes/CirrusSearch.php(245): CirrusSearch\Search\SearchQueryBuilder::newFTSearchQueryBuilder(CirrusSearch\SearchConfig, string, class@anonymous
Notes
  • Superficially similar to T334681, but the stack trace is slightly different
  • Reproducible via GET request
  • The volume of errors is much higher than for T334681 and is obscuring legitimate errors. If it is unimportant, could we remove the error and return a message to users instead?

Details

Request URL
https://en.wikipedia.org/w/rest.php/v1/search/page?limit=*&q=*

Event Timeline

I think I found the input that causes the problem.

https://en.wikipedia.org/w/rest.php/v1/search/page?limit=20&q=%88

U+0088 = CHARACTER TABULATION SET

So, it's (probably) a weird character that gets parsed oddly somewhere internally and generates an error.

Update: It's not just that one character; nearby characters are barfing too.

(Also, it's weird that non-regex errors sometimes generate regex error messages)
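
One possible reading of the observations above, offered as an assumption rather than a confirmed diagnosis: `%88` percent-decodes to the single byte 0x88, which is a lone UTF-8 continuation byte and not a valid encoding of anything, whereas U+0088 itself would be sent as `%C2%88`. That would also fit the "nearby characters" failing too. In PHP, `preg_last_error()` returns 4 for `PREG_BAD_UTF8_ERROR`; if the "4" in the message is that code (the trace does not say so), the regex error would simply be reporting malformed UTF-8. A minimal Python sketch of the decoding step, with `is_valid_utf8` as a hypothetical helper name:

```python
from urllib.parse import unquote_to_bytes


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Compare the reported input with the UTF-8 encoding of U+0088
# and with two neighbouring single-byte inputs.
for encoded in ("%88", "%C2%88", "%87", "%89"):
    raw = unquote_to_bytes(encoded)
    print(encoded, raw, is_valid_utf8(raw))
```

If this reading is right, the fix belongs wherever the raw query string is first decoded, not in the search parser.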

I wonder whether this is related to the lack of parameter sanitization/normalization in the REST API (T340185). The same query parameter works fine with the opensearch or action API: https://en.wikipedia.org/w/api.php?action=opensearch&limit=10&search=%88

https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=%88 does warn with: The value passed for "srsearch" contains invalid or non-normalized data. Textual data should be valid, NFC-normalized Unicode without C0 control characters other than HT (\t), LF (\n), and CR (\r).
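
The rule quoted in that warning can be sketched as a standalone check. This is only an illustration of the three stated conditions (valid Unicode, NFC-normalized, no C0 controls other than HT, LF, CR). It is not MediaWiki's implementation, and `check_query_param` is a hypothetical name:

```python
import unicodedata
from typing import Optional

# C0 control characters the warning text explicitly allows.
ALLOWED_C0 = {"\t", "\n", "\r"}


def check_query_param(raw: bytes) -> Optional[str]:
    """Return the decoded text if it satisfies the quoted rule, else None."""
    try:
        text = raw.decode("utf-8")  # must be valid UTF-8
    except UnicodeDecodeError:
        return None
    if unicodedata.normalize("NFC", text) != text:  # must already be NFC
        return None
    # C0 controls are U+0000..U+001F; reject all but HT, LF, CR.
    if any(ord(ch) < 0x20 and ch not in ALLOWED_C0 for ch in text):
        return None
    return text


print(check_query_param(b"\x88"))            # the byte that %88 decodes to
print(check_query_param("café".encode("utf-8")))
```

A REST handler that ran a check like this before building the search query could return a client error for rejected input instead of letting it reach the parser. Whether to reject or to silently normalize is a design choice this sketch does not settle.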

Moving to blocked/waiting on the Discovery-Search board, as I believe this problem is a direct consequence of T340185.

daniel added subscribers: FJoseph-WMF, daniel.

Ping @FJoseph-WMF: we need some kind of intake process for this kind of thing...

Gehel subscribed.

Removing Search Platform, since this is handled at the API level.