
ConverterRule: PHP Warning: Invalid argument supplied for foreach()
Closed, Resolved | Public | PRODUCTION ERROR

Description

Error
labels.normalized_message
[{reqId}] {exception_url}   PHP Warning: Invalid argument supplied for foreach()
error.stack_trace
from /srv/mediawiki/php-1.42.0-wmf.7/includes/language/ConverterRule.php(163)
#0 /srv/mediawiki/php-1.42.0-wmf.7/includes/language/ConverterRule.php(163): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.42.0-wmf.7/includes/language/ConverterRule.php(390): ConverterRule->parseRules()
#2 /srv/mediawiki/php-1.42.0-wmf.7/includes/language/LanguageConverter.php(936): ConverterRule->parse(string)
#3 /srv/mediawiki/php-1.42.0-wmf.7/includes/language/LanguageConverter.php(869): LanguageConverter->recursiveConvertRule(string, string, integer, integer)
#4 /srv/mediawiki/php-1.42.0-wmf.7/includes/language/LanguageConverter.php(818): LanguageConverter->recursiveConvertTopLevel(string, string)
#5 /srv/mediawiki/php-1.42.0-wmf.7/includes/parser/Parsoid/LanguageVariantConverter.php(135): LanguageConverter->convertTo(string, string)
#6 /srv/mediawiki/php-1.42.0-wmf.7/includes/Rest/Handler/ParsoidHandler.php(1067): MediaWiki\Parser\Parsoid\LanguageVariantConverter->convertPageBundleVariant(Wikimedia\Parsoid\Core\PageBundle, Wikimedia\Bcp47Code\Bcp47CodeValue, NULL)
#7 /srv/mediawiki/php-1.42.0-wmf.7/includes/Rest/Handler/ParsoidHandler.php(938): MediaWiki\Rest\Handler\ParsoidHandler->languageConversion(MediaWiki\Parser\Parsoid\Config\PageConfig, array, array)
#8 /srv/mediawiki/php-1.42.0-wmf.7/includes/Rest/Handler/TransformHandler.php(155): MediaWiki\Rest\Handler\ParsoidHandler->pb2pb(array)
#9 /srv/mediawiki/php-1.42.0-wmf.7/includes/Rest/Router.php(536): MediaWiki\Rest\Handler\TransformHandler->execute()
#10 /srv/mediawiki/php-1.42.0-wmf.7/includes/Rest/Router.php(441): MediaWiki\Rest\Router->executeHandler(MWParsoid\Rest\Handler\TransformHandler)
#11 /srv/mediawiki/php-1.42.0-wmf.7/includes/Rest/EntryPoint.php(195): MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#12 /srv/mediawiki/php-1.42.0-wmf.7/includes/Rest/EntryPoint.php(135): MediaWiki\Rest\EntryPoint->execute()
#13 /srv/mediawiki/php-1.42.0-wmf.7/rest.php(31): MediaWiki\Rest\EntryPoint::main()
#14 /srv/mediawiki/w/rest.php(3): require(string)
#15 {main}
Impact
Notes

Noticed 1 of these this morning during log triage. Hard to disentangle from other instances of the foreach() warning, but it's low frequency.

Details

Request URL
https://zh.wikipedia.org/w/rest.php/zh.wikipedia.org/v3/transform/pagebundle/to/pagebundle/User%3ACewbot%2Flog%2F20170515%2F%E5%AD%98%E6%AA%9410

Event Timeline

brennen renamed this task from PHP Warning: Invalid argument supplied for foreach() to ConverterRule: PHP Warning: Invalid argument supplied for foreach(). Dec 14 2023, 4:50 PM
brennen moved this task from Backlog to Logs/Train on the User-brennen board.
brennen moved this task from Untriaged to Dec 2023 on the Wikimedia-production-error board.

The code in question is:

	private function parseRules() {
		$rules = $this->mRules;
		$bidtable = [];
		$unidtable = [];
		$varsep_pattern = $this->mConverter->getVarSeparatorPattern();

		// Split text according to $varsep_pattern, but ignore semicolons from HTML entities
		$rules = preg_replace( '/(&[#a-zA-Z0-9]+);/', "$1\x01", $rules );
		$choice = preg_split( $varsep_pattern, $rules );
		$choice = str_replace( "\x01", ';', $choice );

		foreach ( $choice as $c ) {

So if preg_split fails, it returns false. str_replace will then silently stringify the false to "" and set $choice to "" instead of an array. (Usually this code relies on the arrays-in, arrays-out mode of str_replace.) The foreach then fails because it is handed a string.
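A minimal sketch of that failure mode, plus one possible guard. The helper name `splitRuleText` and its fallback of returning an empty array are assumptions made for illustration; this is not MediaWiki's code or the fix that was actually applied.

```php
<?php
// Simulate the failure: preg_split() returns false on error.
$choice = false;

// With a non-array subject, str_replace() returns a string, so false becomes "".
$choice = str_replace( "\x01", ';', $choice );
var_dump( $choice );              // string(0) ""
var_dump( is_array( $choice ) );  // bool(false)

/**
 * Hypothetical guarded version of the splitting step.
 * Returns an empty array when either regex call fails, so that a
 * later foreach always receives an array.
 */
function splitRuleText( string $rules, string $varsepPattern ): array {
	// Protect semicolons that terminate HTML entities.
	$protected = preg_replace( '/(&[#a-zA-Z0-9]+);/', "$1\x01", $rules );
	if ( $protected === null ) {
		return [];
	}

	$choice = preg_split( $varsepPattern, $protected );
	if ( $choice === false ) {
		// preg_last_error() could be logged here to identify the cause.
		return [];
	}

	// Array in, array out: restore the protected semicolons.
	return str_replace( "\x01", ';', $choice );
}
```

Whether an empty array is the right fallback is a judgment call; returning the unsplit text as a single element would be another option.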

preg_replace can also fail, returning null in this case. That would *also* get stringified in the call to preg_split, similarly becoming "", but then preg_split would work "correctly" on the empty string to return an array [ "" ] and everything after would be happy.
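For comparison, a small sketch of the preg_replace failure path described above. The pattern `/;/` is a stand-in for the real separator pattern, and the explicit `(string)` cast stands in for the implicit coercion (passing null directly is deprecated as of PHP 8.1).

```php
<?php
// Simulate preg_replace() failing: it returns null on error.
$rules = null;

// The cast mirrors the implicit null-to-string coercion.
$choice = preg_split( '/;/', (string)$rules );
$choice = str_replace( "\x01", ';', $choice );

var_dump( $choice ); // an array holding a single empty string

$iterations = 0;
foreach ( $choice as $c ) {
	// Runs once with $c === "" and raises no warning.
	$iterations++;
}
```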

So we're looking for an input pattern that would cause preg_split to fail but *not* preg_replace. That probably rules out "bad utf-8" -- and in any case, neither of the patterns seems to have the u flag set. The pattern used for preg_split is constructed in LanguageConverter::getVarSeparatorPattern from a "bunch of stuff" but nothing looks content- or user-dependent. So it should either always fail or never fail. A low incidence of errors is a bit of a puzzle.

If the preg_split fails due to a bad regexp it should emit an E_WARNING according to the man page. I'm guessing we didn't see one of those for this request? That pretty much leaves "internal PCRE" failure as the only possible cause for preg_split to return false -- memory exhaustion, backtrack limit, something like that.
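One way to provoke an internal PCRE failure locally is to lower `pcre.backtrack_limit` and use a pattern that backtracks heavily. The sketch below borrows the pattern and subject from the PHP manual's preg_last_error example; they are unrelated to the ConverterRule pattern, and the limit of 100 is an arbitrary value chosen for illustration. On PHP versions where preg_split returns false on failure, as the manual documents, this should do so without emitting a warning.

```php
<?php
// Disable the PCRE JIT so the backtrack limit is the one that applies,
// then lower that limit so the failure is easy to trigger.
ini_set( 'pcre.jit', '0' );
ini_set( 'pcre.backtrack_limit', '100' );

// Pattern and subject taken from the PHP manual's preg_last_error() example.
$pattern = '/(?:\D+|<\d+>)*[!?]/';
$subject = 'foobar foobar foobar';

$result = preg_split( $pattern, $subject );

var_dump( $result );
var_dump( preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR );
```

Checking `preg_last_error()` right after the call is what distinguishes a backtrack-limit failure from other causes.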

The regex involved is:

> $lf = MediaWiki\MediaWikiServices::getInstance()->getLanguageFactory();
= MediaWiki\Languages\LanguageFactory {#272}
> $lcf = MediaWiki\MediaWikiServices::getInstance()->getLanguageConverterFactory();
= MediaWiki\Languages\LanguageConverterFactory {#289}
> $c = $lcf->getLanguageConverter($lf->getLanguage('zh'))
= ZhConverter {#6442}
> $c->getVarSeparatorPattern()
= "/;\s*(?=zh\s*:|[^;]*?=>\s*zh\s*:|zh-hans\s*:|[^;]*?=>\s*zh-hans\s*:|zh-Hans\s*:|[^;]*?=>\s*zh-Hans\s*:|zh-hant\s*:|[^;]*?=>\s*zh-hant\s*:|zh-Hant\s*:|[^;]*?=>\s*zh-Hant\s*:|zh-cn\s*:|[^;]*?=>\s*zh-cn\s*:|zh-Hans-CN\s*:|[^;]*?=>\s*zh-Hans-CN\s*:|zh-hk\s*:|[^;]*?=>\s*zh-hk\s*:|zh-Hant-HK\s*:|[^;]*?=>\s*zh-Hant-HK\s*:|zh-mo\s*:|[^;]*?=>\s*zh-mo\s*:|zh-Hant-MO\s*:|[^;]*?=>\s*zh-Hant-MO\s*:|zh-my\s*:|[^;]*?=>\s*zh-my\s*:|zh-Hans-MY\s*:|[^;]*?=>\s*zh-Hans-MY\s*:|zh-sg\s*:|[^;]*?=>\s*zh-sg\s*:|zh-Hans-SG\s*:|[^;]*?=>\s*zh-Hans-SG\s*:|zh-tw\s*:|[^;]*?=>\s*zh-tw\s*:|zh-Hant-TW\s*:|[^;]*?=>\s*zh-Hant-TW\s*:|\s*$)/"
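The pattern above repeats two alternatives per variant code and ends with `\s*$`. The sketch below rebuilds a pattern with that shape from a list of codes and applies it to a made-up rule string. The function `buildVarSeparatorPattern` is hypothetical and only imitates the structure of the output shown; it is not the body of `LanguageConverter::getVarSeparatorPattern`.

```php
<?php
/**
 * Hypothetical reconstruction of the separator pattern's shape.
 * Assumes variant codes contain no regex metacharacters.
 */
function buildVarSeparatorPattern( array $variants ): string {
	$alternatives = '';
	foreach ( $variants as $code ) {
		// "zh-hans:" form
		$alternatives .= $code . '\s*:|';
		// "text => zh-hans:" form
		$alternatives .= '[^;]*?=>\s*' . $code . '\s*:|';
	}
	// Final alternative: a semicolon followed only by whitespace.
	return '/;\s*(?=' . $alternatives . '\s*$)/';
}

$pattern = buildVarSeparatorPattern( [ 'zh-hans', 'zh-hant' ] );

// Made-up rule text for illustration.
$parts = preg_split( $pattern, 'zh-hans:计算机; zh-hant:電腦;' );
var_dump( $parts );
```

Because the list of variant codes is fixed per language, a pattern built this way does not change from request to request, which fits the observation that nothing in it looks content- or user-dependent.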