
Wikimedia\RemexHtml\Tokenizer\TokenizerError: Wikimedia\RemexHtml\Tokenizer\Tokenizer: pcre.backtrack_limit exhausted
Closed, Resolved · Public · PRODUCTION ERROR

Description

Error
labels.normalized_message
[{reqId}] {exception_url}   Wikimedia\RemexHtml\Tokenizer\TokenizerError: Wikimedia\RemexHtml\Tokenizer\Tokenizer: pcre.backtrack_limit exhausted
error.stack_trace
from /srv/mediawiki/php-1.41.0-wmf.16/vendor/wikimedia/remex-html/src/Tokenizer/Tokenizer.php(1657)
#0 /srv/mediawiki/php-1.41.0-wmf.16/vendor/wikimedia/remex-html/src/Tokenizer/Tokenizer.php(529): Wikimedia\RemexHtml\Tokenizer\Tokenizer->throwPregError()
#1 /srv/mediawiki/php-1.41.0-wmf.16/vendor/wikimedia/remex-html/src/Tokenizer/Tokenizer.php(369): Wikimedia\RemexHtml\Tokenizer\Tokenizer->dataState(boolean)
#2 /srv/mediawiki/php-1.41.0-wmf.16/vendor/wikimedia/remex-html/src/Tokenizer/Tokenizer.php(178): Wikimedia\RemexHtml\Tokenizer\Tokenizer->executeInternal(boolean)
#3 /srv/mediawiki/php-1.41.0-wmf.16/vendor/wikimedia/parsoid/src/Utils/DOMUtils.php(61): Wikimedia\RemexHtml\Tokenizer\Tokenizer->execute(array)
#4 /srv/mediawiki/php-1.41.0-wmf.16/extensions/DiscussionTools/includes/Hooks/HookUtils.php(158): Wikimedia\Parsoid\Utils\DOMUtils::parseHTML(string)
#5 /srv/mediawiki/php-1.41.0-wmf.16/extensions/DiscussionTools/includes/ApiDiscussionToolsPageInfo.php(112): MediaWiki\Extension\DiscussionTools\Hooks\HookUtils::parseRevisionParsoidHtml(MediaWiki\Revision\RevisionStoreRecord, boolean)
#6 /srv/mediawiki/php-1.41.0-wmf.16/extensions/DiscussionTools/includes/ApiDiscussionToolsPageInfo.php(46): MediaWiki\Extension\DiscussionTools\ApiDiscussionToolsPageInfo->getThreadItemSet(array)
#7 /srv/mediawiki/php-1.41.0-wmf.16/includes/api/ApiMain.php(1915): MediaWiki\Extension\DiscussionTools\ApiDiscussionToolsPageInfo->execute()
#8 /srv/mediawiki/php-1.41.0-wmf.16/includes/api/ApiMain.php(892): ApiMain->executeAction()
#9 /srv/mediawiki/php-1.41.0-wmf.16/includes/api/ApiMain.php(863): ApiMain->executeActionWithErrorHandling()
#10 /srv/mediawiki/php-1.41.0-wmf.16/api.php(95): ApiMain->execute()
#11 /srv/mediawiki/php-1.41.0-wmf.16/api.php(48): wfApiMain()
#12 /srv/mediawiki/w/api.php(3): require(string)
#13 {main}
Impact
Notes

6 occurrences on various wikis. I picked this trace since it is a GET request and thus has all the parameters included in the URL.

Another url https://sv.wikipedia.org/w/api.php?action=discussiontoolspageinfo&format=json&formatversion=2&oldid=53382841&page=Wikipedia%3AAns%C3%B6kan_om_administrativ_beh%C3%B6righet%2FTostarpadius&uselang=sv (trace.id 0df7d28a-43c4-402e-afa7-c4f05ffa5941).

They all come from DiscussionTools.

Details

Request URL
https://ru.wikipedia.org/w/api.php?action=discussiontoolspageinfo&format=*&formatversion=*&oldid=*&page=*&uselang=*
Related Changes in Gerrit:

Event Timeline

Ooh, I would love to see the HTML involved in this one, since it looks like an "ordinary" call to ::parseHTML triggered it.

The comment in Remex is:

	 * - pcre.backtrack_limit exhausted: The backtrack limit should be at least
	 *   double the input size, the defaults are way too small. Increase it in
	 *   configuration.
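
Following that advice, the fix is a one-line ini change. The value below is the one WMF's bare-metal appservers use (mentioned elsewhere in this task); the general rule from the Remex comment is to make it at least double the largest expected input size:

```ini
; php.ini / php-fpm pool config: raise the PCRE backtrack limit well above
; PHP's default of 1000000. WMF bare-metal appservers set it to 5000000.
pcre.backtrack_limit = 5000000
```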

The page in question seems to be:
https://ru.wikipedia.org/w/index.php?title=%D0%9E%D0%B1%D1%81%D1%83%D0%B6%D0%B4%D0%B5%D0%BD%D0%B8%D0%B5:%D0%9A%D0%B0%D0%B7%D0%B8%D0%BA%D1%83%D0%BC%D1%83%D1%85%D1%81%D0%BA%D0%BE%D0%B5_%D1%88%D0%B0%D0%BC%D1%85%D0%B0%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%BE&oldid=130632402&useparsoid=1
which doesn't seem to be especially large, so I'm not sure this is 'just' an issue of increasing the backtrack limit.

This should be the HTML in question:


But I don't seem to have any trouble parsing it with DOMUtils::parseHTML() from the maintenance/run shell.php command line. Perhaps my CLI pcre.backtrack_limit is higher than the one set in production?

Hm, weird. Can't reproduce it locally even if I crank pcre.backtrack_limit all the way down to 9, although it does crash if pcre.backtrack_limit is 8 -- but the default PHP value is 1000000.
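
For reference, here is a minimal standalone sketch of how this failure mode surfaces in PHP. The pattern and limit are illustrative only (neither is what Remex actually uses); the point is that preg_match() returns false and preg_last_error() reports the exhaustion:

```php
<?php
// Illustrative only: force a backtrack-limit failure with a tiny limit and a
// catastrophically backtracking pattern (not Remex's actual pattern or limit).
ini_set('pcre.jit', '0');               // as noted below, this must happen
                                        // before the first preg_* call
ini_set('pcre.backtrack_limit', '100');

$subject = str_repeat('a', 50) . 'b';
$result  = preg_match('/(a+)+$/', $subject);

var_dump($result);                                          // bool(false) once the limit is exhausted
var_dump(preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR); // bool(true)
```

Remex wraps exactly this kind of preg_last_error() check and converts it into the TokenizerError seen in the trace.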

Hitting https://ru.wikipedia.org/w/api.php?action=discussiontoolspageinfo&format=json&formatversion=2&oldid=130632402&page=%D0%9E%D0%B1%D1%81%D1%83%D0%B6%D0%B4%D0%B5%D0%BD%D0%B8%D0%B5%3A%D0%9A%D0%B0%D0%B7%D0%B8%D0%BA%D1%83%D0%BC%D1%83%D1%85%D1%81%D0%BA%D0%BE%D0%B5_%D1%88%D0%B0%D0%BC%D1%85%D0%B0%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%BE&uselang=ru (the source URL for the exception report) seems to execute fine w/o throwing an error.

Note that the error was thrown from a Kubernetes pod. We should confirm that php.ini is correct in Kubernetes. On bare metal hosts, it is set to 5000000.

There's a 44KB HTML comment, that might be related.

I was able to reproduce the error with

php7.4 -d pcre.backtrack_limit=100000 -d pcre.jit=0 maintenance/eval.php

ini_set('pcre.jit','0') seems to have no effect unless you do it before the first run of DOMUtils::parseHTML().

$ sudo mw-debug-repl -e enwiki
Finding a mw-debug pod in eqiad...
Now running eval.php for enwiki inside pod/mw-debug.eqiad.pinkunicorn-6bcc7484cf-6jmbw...
> print ini_get('pcre.backtrack_limit');
50000

I guess two zeroes went missing somewhere?

It's been this way since 2020?

Under no circumstances is 50000 an appropriate value for pcre.backtrack_limit. It is not a memory limit, just a cap on the number of backtracking steps the regex engine may take.

| Setting | Old value | k8s value | Notes |
|---|---|---|---|
| apc.shm_size | 6096M | 768M | Hopefully there's some sort of plan for detecting whether this is enough. |
| max_execution_time | 210 | 180 | Wrong, must be 210 so that RequestTimeout can kill the request and show a pretty error message. RequestTimeout for POST requests has a limit of 200 seconds, so 210 gives a 10-second grace period. |
| memory_limit | 500M | 200M | Don't know how this is expected to work |
| opcache.interned_strings_buffer | 96 | 50 | |
| opcache.max_accelerated_files | 24000 | 32531 | How was this value derived? |
| opcache.max_wasted_percentage | 10 | unspecified (default 5) | |
| opcache.memory_consumption | 1024 | 500 | Fine |
| pcre.backtrack_limit | 5000000 | 50000 | We need our zeroes back |
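
The two values flagged as outright wrong translate to this php-fpm ini fragment (a sketch of the intended end state, restoring the bare-metal values; not the actual deployment-charts diff):

```ini
; Restore the bare-metal values on the Kubernetes php-fpm pods
pcre.backtrack_limit = 5000000
max_execution_time = 210
```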

To give some context:

  • pcre.backtrack_limit is clearly a typo: I dropped a zero, and that should be fixed.
  • max_execution_time should indeed be fixed. It was changed on physical hardware after we made the initial copy, and we forgot to copy the change over.
  • The opcache settings were simply optimized from looking at our actual usage on appservers, where we have memory to waste as we're CPU- and concurrency-bound.
  • The APC size will need some tweaking as soon as we get enough traffic; for now we don't see many evictions, but rather a higher-than-normal level of fragmentation. And yes, it will be tweaked if needed. The number comes from reducing the value to account for the smaller number of requests handled by a single pod.
  • The memory limit is irrelevant, as we set it in mediawiki-config anyway.
Clement_Goubert changed the task status from Open to In Progress.Aug 25 2023, 10:40 AM
Clement_Goubert claimed this task.
Clement_Goubert triaged this task as High priority.

Change 952349 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] mediawiki: Add missing controls for php-fpm

https://gerrit.wikimedia.org/r/952349

Change 952349 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: Add missing controls for php-fpm

https://gerrit.wikimedia.org/r/952349

Mentioned in SAL (#wikimedia-operations) [2023-08-25T10:59:43Z] <claime> Deploying mediawiki: Add missing controls for php-fpm - T341320

Regarding the value 32531 for opcache.max_accelerated_files: only numbers from a specific internal set of prime numbers are valid there. Configuring any other number results in the actual value used being the first number in that set that is greater than or equal to what was configured.
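
A hedged sketch of that rounding rule follows. The real set is a fixed, hard-coded table of primes in opcache's C source; this generic next-prime search merely illustrates the "first valid value >= configured" behaviour and will not reproduce opcache's exact table entries:

```php
<?php
// Illustrative sketch, not opcache's actual code: opcache keeps a hard-coded
// table of primes and uses the first entry >= the configured
// opcache.max_accelerated_files. A generic next-prime search mimics the rule.
function is_prime(int $k): bool {
    if ($k < 2) return false;
    for ($i = 2; $i * $i <= $k; $i++) {
        if ($k % $i === 0) return false;
    }
    return true;
}

function first_valid_at_least(int $configured): int {
    $n = $configured;
    while (!is_prime($n)) $n++;
    return $n;
}

// e.g. configuring 24000 rounds up to the next valid (prime) value
echo first_valid_at_least(24000), "\n";
```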

@tstarling I've verified that the above requests work with the correct new configurations applied for pcre.backtrack_limit and max_execution_time after deploying to mw-debug and forcing the requests there through XWD. My checks for the ini values in debug are good as well.

I've now deployed the correct configuration for these two options to all mw-on-k8s deployments.