InvalidArgumentException: Invalid language code "<long hex string>"
Closed, Resolved · Public · PRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   InvalidArgumentException: Invalid language code "d870695a1e61c1207dc5a9ba1b93d85dcef9859e2bfaee1cd2fd773bcc9bfbde1e67ef186c66b29244b96ffa3700826f416a1bf67a71122741e8fccb4ab84c965d1e6f8386a3b127039a0f46d316224e"
exception.trace
from /srv/mediawiki/php-1.43.0-wmf.21/includes/language/LanguageFactory.php(184)
#0 /srv/mediawiki/php-1.43.0-wmf.21/includes/language/LanguageFactory.php(170): MediaWiki\Languages\LanguageFactory->newFromCode(string)
#1 /srv/mediawiki/php-1.43.0-wmf.21/includes/libs/MapCacheLRU.php(271): MediaWiki\Languages\LanguageFactory->MediaWiki\Languages\{closure}()
#2 /srv/mediawiki/php-1.43.0-wmf.21/includes/language/LanguageFactory.php(171): MapCacheLRU->getWithSetCallback(string, Closure)
#3 /srv/mediawiki/php-1.43.0-wmf.21/includes/language/LanguageFactory.php(152): MediaWiki\Languages\LanguageFactory->getRawLanguage(string)
#4 /srv/mediawiki/php-1.43.0-wmf.21/includes/Message/Message.php(893): MediaWiki\Languages\LanguageFactory->getLanguage(string)
#5 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/MessageBlobStore.php(234): MediaWiki\Message\Message->inLanguage(string)
#6 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/MessageBlobStore.php(257): MediaWiki\ResourceLoader\MessageBlobStore->fetchMessage(string, string)
#7 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/MessageBlobStore.php(174): MediaWiki\ResourceLoader\MessageBlobStore->generateMessageBlob(MediaWiki\ResourceLoader\FileModule, string)
#8 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/MessageBlobStore.php(121): MediaWiki\ResourceLoader\MessageBlobStore->recacheMessageBlob(string, MediaWiki\ResourceLoader\FileModule, string)
#9 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/MessageBlobStore.php(89): MediaWiki\ResourceLoader\MessageBlobStore->getBlobs(array, string)
#10 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/Module.php(695): MediaWiki\ResourceLoader\MessageBlobStore->getBlob(MediaWiki\ResourceLoader\FileModule, string)
#11 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/FileModule.php(649): MediaWiki\ResourceLoader\Module->getMessageBlob(MediaWiki\ResourceLoader\Context)
#12 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/Module.php(958): MediaWiki\ResourceLoader\FileModule->getDefinitionSummary(MediaWiki\ResourceLoader\Context)
#13 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/StartUpModule.php(221): MediaWiki\ResourceLoader\Module->getVersionHash(MediaWiki\ResourceLoader\Context)
#14 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/StartUpModule.php(430): MediaWiki\ResourceLoader\StartUpModule->getModuleRegistrations(MediaWiki\ResourceLoader\Context)
#15 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/Module.php(843): MediaWiki\ResourceLoader\StartUpModule->getScript(MediaWiki\ResourceLoader\Context)
#16 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/Module.php(812): MediaWiki\ResourceLoader\Module->buildContent(MediaWiki\ResourceLoader\Context)
#17 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/Module.php(955): MediaWiki\ResourceLoader\Module->getModuleContent(MediaWiki\ResourceLoader\Context)
#18 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/ResourceLoader.php(686): MediaWiki\ResourceLoader\Module->getVersionHash(MediaWiki\ResourceLoader\Context)
#19 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/ResourceLoader.php(786): MediaWiki\ResourceLoader\ResourceLoader->getCombinedVersion(MediaWiki\ResourceLoader\Context, array)
#20 /srv/mediawiki/php-1.43.0-wmf.21/includes/ResourceLoader/ResourceLoaderEntryPoint.php(54): MediaWiki\ResourceLoader\ResourceLoader->respond(MediaWiki\ResourceLoader\Context)
#21 /srv/mediawiki/php-1.43.0-wmf.21/includes/MediaWikiEntryPoint.php(200): MediaWiki\ResourceLoader\ResourceLoaderEntryPoint->execute()
#22 /srv/mediawiki/php-1.43.0-wmf.21/load.php(42): MediaWiki\MediaWikiEntryPoint->run()
#23 /srv/mediawiki/w/load.php(3): require(string)
#24 {main}
Impact

A batch of 10,407 of these showed up in a ~2-minute period, each using one of a series of different hex strings as the language code. It looks like each code was tried 280 times. Because the code is embedded in the message, each one produced a unique error string. This made logspam-watch useless for monitoring: these errors filled the display and masked lower-frequency errors.

https://logstash.wikimedia.org/goto/d0b37dd20d98343526a3b469700ce69c

Details

Request URL
https://he.wikibooks.org/w/load.php?lang=*&modules=*&only=*&raw=*&skin=*

Event Timeline

LanguageNameUtils::isValidBuiltInCode() is badly named. It does not actually do what its name says, and ResourceLoader relies on it too heavily to prevent issues like this (cf. T64849).

Change #1071036 had a related patch set uploaded (by Ammarpad; author: Ammarpad):

[mediawiki/core@master] language: make isValidBuiltInCode() more robust

https://gerrit.wikimedia.org/r/1071036

Change #1071036 merged by jenkins-bot:

[mediawiki/core@master] language: make isValidBuiltInCode() more robust

https://gerrit.wikimedia.org/r/1071036

Krinkle closed this task as Resolved. (Oct 16 2024, 7:19 PM)
Krinkle claimed this task.
Krinkle subscribed.

I think the above patch actually fixed it, contrary to what we previously thought.

When I revert the above patch locally, I can reproduce the original issue, which happens as follows:

  • The query param contains an a-z string longer than 128 characters (e.g. "a" repeated 130 times). Prior to the above patch, this passed isValidBuiltInCode, which validates with /^[a-z0-9-]{2,}$/. Notice the lack of a maximum length.
  • Later on, the value reaches Message->inLanguage(str), which ends up calling LanguageFactory->newFromCode(str).
  • LanguageFactory->newFromCode checks isValidCode, which is more tolerant in some ways (it allows various special characters outside the a-z0-9- range, as long as they aren't invalid in a wiki page title and don't look like path traversal or HTML), except that it enforces a 128-character maximum length.
  • Hence a 129-character a-z string passes the first check but fails the second, and newFromCode throws an InvalidArgumentException.
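The mismatch in the steps above can be modelled with a small Python sketch. This is not MediaWiki's PHP code: `is_valid_builtin_code_old` uses the pre-patch regex quoted above, while `is_valid_code` only models the two properties described here (a 128-character cap plus a rough, assumed set of forbidden path/HTML characters). `new_from_code`, `handle_request`, and the `"en"` default for rejected values are hypothetical stand-ins.

```python
import re

# Rough stand-in (assumption) for the characters isValidCode rejects:
# path separators, NUL, and HTML-significant characters.
FORBIDDEN = set(":/\\\0&<>'\"")

def is_valid_builtin_code_old(code: str) -> bool:
    # Pre-patch regex /^[a-z0-9-]{2,}$/: note there is no upper bound.
    return re.fullmatch(r"[a-z0-9-]{2,}", code) is not None

def is_valid_code(code: str) -> bool:
    # Simplified model: tolerant of more characters, but capped at 128.
    return len(code) <= 128 and not any(c in FORBIDDEN for c in code)

def new_from_code(code: str) -> str:
    # Hypothetical stand-in for LanguageFactory->newFromCode, which throws.
    if not is_valid_code(code):
        raise ValueError(f'Invalid language code "{code}"')
    return code

def handle_request(lang: str) -> str:
    # Hypothetical pre-patch flow: validate with the built-in check,
    # then pass the value on towards newFromCode.
    if not is_valid_builtin_code_old(lang):
        return "en"  # hypothetical default for rejected values
    return new_from_code(lang)

print(handle_request("a" * 128) == "a" * 128)  # True
try:
    handle_request("a" * 129)
except ValueError as exc:
    print("rejected by the second check:", type(exc).__name__)
```

The 129-character value slips through the first check only because that regex has no upper bound; the second check is what actually throws.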

I'd say the way this is meant to work is that the codes accepted by isValidBuiltInCode should be a strict subset of those accepted by isValidCode. With the above patch in place, this is now the case. Valid codes include both built-in codes and various other fake language codes we use in the ecosystem via the int-uselang hack.
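The subset property can be checked mechanically. The sketch below is a Python model, not the merged patch: it assumes, for illustration, that the fix bounds the built-in regex at 128 characters (the actual change in Gerrit may differ), and it reuses a simplified, assumed model of isValidCode. It tests lengths around the 128 boundary plus random strings and collects any code accepted by the built-in check but rejected by the tolerant one.

```python
import random
import re
import string

MAX_LEN = 128  # assumption: the patched check caps length where isValidCode does

# Rough stand-in (assumption) for characters isValidCode rejects.
FORBIDDEN = set(":/\\\0&<>'\"")

def is_valid_builtin_code(code: str) -> bool:
    # Hypothetical patched check: same character class, now bounded.
    return re.fullmatch(rf"[a-z0-9-]{{2,{MAX_LEN}}}", code) is not None

def is_valid_code(code: str) -> bool:
    # Simplified model of the more tolerant check.
    return len(code) <= MAX_LEN and not any(c in FORBIDDEN for c in code)

def candidates(seed: int = 0, n_random: int = 2000):
    # Deterministic lengths around the 128 boundary...
    yield from ("a" * n for n in range(1, 201))
    # ...plus random strings mixing allowed and disallowed characters.
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits + "-_.+<>/:"
    for _ in range(n_random):
        yield "".join(rng.choice(alphabet) for _ in range(rng.randint(1, 200)))

def subset_violations() -> list:
    # Codes accepted by the built-in check but rejected by isValidCode.
    return [c for c in candidates() if is_valid_builtin_code(c) and not is_valid_code(c)]

print(subset_violations())  # []
```

Changing the quantifier back to `{2,}` makes every `"a" * n` with `n > 128` show up as a violation, which is the pre-patch situation. The subset is strict because values such as `"de.x"` pass the tolerant model but not the built-in regex.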

I tried numerous ways, but I can't construct any lang= value that would cause this exception. And this isn't just a coincidence: there isn't some dangerous code path that merely happens to be unreachable, waiting to surprise us in the future. The right thing to do before calling inLanguage/newFromCode is to validate the value either as any language code or as a built-in language code, and ResourceLoader has done that correctly for years. It's just that isValidBuiltInCode wasn't strict enough.
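The "validate before constructing" rule can be sketched as a guard in front of the constructor. This is a Python model under stated assumptions: the 128-character cap on the built-in check is assumed, `new_from_code` is a hypothetical stand-in for LanguageFactory->newFromCode, and replacing rejected input with `"en"` is a hypothetical policy chosen for the example, not necessarily what ResourceLoader does.

```python
import re

MAX_LEN = 128  # assumption: patched built-in check caps length at 128

def is_valid_builtin_code(code: str) -> bool:
    # Hypothetical patched built-in check.
    return re.fullmatch(rf"[a-z0-9-]{{2,{MAX_LEN}}}", code) is not None

def new_from_code(code: str) -> str:
    # Hypothetical stand-in for LanguageFactory->newFromCode:
    # it throws on codes longer than the cap.
    if len(code) > MAX_LEN:
        raise ValueError(f'Invalid language code "{code}"')
    return code

def language_for_request(raw_lang: str, default: str = "en") -> str:
    # Validate *before* constructing the language object; anything that
    # fails the strict check is replaced by a default (hypothetical policy).
    code = raw_lang if is_valid_builtin_code(raw_lang) else default
    return new_from_code(code)

print(language_for_request("he"))         # he
print(language_for_request("a" * 130))    # en
print(language_for_request("<script>"))   # en
```

Because the guard's accepted set is contained in what the constructor accepts, no request value can reach the throwing branch.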

Values like aaaa-foo-xxxx, which are not known language codes but are formatted as valid built-in codes, are accepted by newFromCode and naturally fall back to en, since no such language exists.
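The difference between "well-formed" and "known" can be shown with a flattened Python model. The message tables, the `greeting` key, and the single-step fallback to `en` are hypothetical simplifications; MediaWiki's real lookup uses per-language i18n files and fallback chains, and the 128-character cap on the format check is assumed.

```python
import re

# Hypothetical, flattened message tables; MediaWiki's real lookup uses
# per-language i18n files and fallback chains.
MESSAGES = {
    "en": {"greeting": "Hello"},
    "de": {"greeting": "Hallo"},
}
FALLBACK = "en"

def looks_like_builtin_code(code: str) -> bool:
    # Format check only (assumed patched regex with a 128-char cap);
    # it says nothing about whether the language actually exists.
    return re.fullmatch(r"[a-z0-9-]{2,128}", code) is not None

def message_in_language(key: str, code: str) -> str:
    if not looks_like_builtin_code(code):
        raise ValueError(f'Invalid language code "{code}"')
    # Well-formed but unknown codes have no table of their own,
    # so the lookup falls back to English.
    table = MESSAGES.get(code, {})
    return table.get(key, MESSAGES[FALLBACK][key])

print(message_in_language("greeting", "aaaa-foo-xxxx"))  # Hello
print(message_in_language("greeting", "de"))             # Hallo
```

Only malformed input is an error here; an unknown but well-formed code is handled as an ordinary missing translation.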