Wikimedia\Assert\InvariantException: Invariant failed: Bad UTF-8 at end of string (3 byte sequence)
Closed, Resolved | Public | PRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   Wikimedia\Assert\InvariantException: Invariant failed: Bad UTF-8 at end of string (3 byte sequence)
exception.trace
from /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/assert/src/Assert.php(224)
#0 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Utils/PHPUtils.php(218): Wikimedia\Assert\Assert::invariant(boolean, string)
#1 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/PP/Processors/WrapTemplates.php(963): Wikimedia\Parsoid\Utils\PHPUtils::safeSubstr(string, integer, integer)
#2 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/PP/Processors/WrapTemplates.php(1226): Wikimedia\Parsoid\Wt2Html\PP\Processors\WrapTemplates::encapsulateTemplates(DOMDocument, Wikimedia\Parsoid\Wt2Html\PageConfigFrame, array, array)
#3 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/PP/Processors/WrapTemplates.php(1239): Wikimedia\Parsoid\Wt2Html\PP\Processors\WrapTemplates::wrapTemplatesInTree(DOMDocument, Wikimedia\Parsoid\Wt2Html\PageConfigFrame, DOMElement)
#4 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(158): Wikimedia\Parsoid\Wt2Html\PP\Processors\WrapTemplates->run(Wikimedia\Parsoid\Config\Env, DOMElement, array, boolean)
#5 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(853): Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->Wikimedia\Parsoid\Wt2Html\{closure}(DOMElement, array, boolean)
#6 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(903): Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->doPostProcess(DOMElement)
#7 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(920): Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->process(DOMElement)
#8 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/ParserPipeline.php(178): Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->processChunkily(string, array)
#9 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/ParserPipelineFactory.php(307): Wikimedia\Parsoid\Wt2Html\ParserPipeline->parseChunkily(string, array)
#10 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Core/WikitextContentModelHandler.php(106): Wikimedia\Parsoid\Wt2Html\ParserPipelineFactory->parse(string)
#11 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Parsoid.php(162): Wikimedia\Parsoid\Core\WikitextContentModelHandler->toDOM(Wikimedia\Parsoid\Config\Env)
#12 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Parsoid.php(194): Wikimedia\Parsoid\Parsoid->parseWikitext(MWParsoid\Config\PageConfig, array)
#13 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/extension/src/Rest/Handler/ParsoidHandler.php(589): Wikimedia\Parsoid\Parsoid->wikitext2html(MWParsoid\Config\PageConfig, array, NULL)
#14 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/extension/src/Rest/Handler/PageHandler.php(88): MWParsoid\Rest\Handler\ParsoidHandler->wt2html(MWParsoid\Config\PageConfig, array)
#15 /srv/mediawiki/php-1.37.0-wmf.7/includes/Rest/Router.php(395): MWParsoid\Rest\Handler\PageHandler->execute()
#16 /srv/mediawiki/php-1.37.0-wmf.7/includes/Rest/Router.php(322): MediaWiki\Rest\Router->executeHandler(MWParsoid\Rest\Handler\PageHandler)
#17 /srv/mediawiki/php-1.37.0-wmf.7/includes/Rest/EntryPoint.php(165): MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#18 /srv/mediawiki/php-1.37.0-wmf.7/includes/Rest/EntryPoint.php(130): MediaWiki\Rest\EntryPoint->execute()
#19 /srv/mediawiki/php-1.37.0-wmf.7/rest.php(31): MediaWiki\Rest\EntryPoint::main()
#20 /srv/mediawiki/w/rest.php(3): require(string)
#21 {main}
Impact
  • ~50 times in the last 30 min
Notes

Details

Request URL
https://ml.wikipedia.org/w/rest.php/ml.wikipedia.org/v3/page/pagebundle/%E0%B4%B5%E0%B4%BF%E0%B4%95%E0%B5%8D%E0%B4%95%E0%B4%BF%E0%B4%AA%E0%B5%80%E0%B4%A1%E0%B4%BF%E0%B4%AF%3A%E0%B4%B5%E0%B4%BF%E0%B4%95%E0%B5%8D%E0%B4%95%E0%B4%BF%E0%B4%B8%E0%B4%82%E0%B4%97%E0%B4%AE%E0%B5%8B%E0%B4%A4%E0%B5%8D%E0%B4%B8%E0%B4%B5%E0%B4%82_-_2013%2F%E0%B4%AA%E0%B4%99%E0%B5%8D%E0%B4%95%E0%B5%86%E0%B4%9F%E0%B5%81%E0%B4%95%E0%B5%8D%E0%B4%95%E0%B5%81%E0%B4%B5%E0%B4%BE%E0%B5%BB/1882310

Event Timeline

For the benefit of train engineers, I wanted to note that these UTF-8 errors seen in Parsoid are expected to come up in spurts (lots of errors on the same page OR repeated errors because of retries). So, unless there is a significant spike, this shouldn't be a concern for deployments.

Arlolra triaged this task as Medium priority. Jun 10 2021, 6:51 PM
Arlolra moved this task from Needs Triage to Bugs & Crashers on the Parsoid board.

@ssastry I don't know much about these errors, but it sounds like they have to do with bad user input. Are these generally solved by improving the parser to support more input, or also by categorically letting these fail in a different way? It seems like a categorical change may be appropriate here, such that these kinds of issues more generally won't result in an HTTP 5xx and an exception log entry, but are instead sent to, e.g., a Parsoid-specific log channel with an INFO or WARNING severity for your team to monitor. Possibly also a 4xx status response, but that's a different question.

Deployers and people outside the team have not, and imho should not, try to memorise internals of extensions and what kinds of fatals are "real" fatals. These should be fixed on the producer side instead. Usually that means fixing the root cause if it's easy, but if that's non-trivial or if it's a sustained category of errors like this one, then maybe a different mitigation could take place first to ensure they get recategorised. That might change nothing for end-users, but would mean Parsoid plays along better with how other MediaWiki components express themselves in Logstash.
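To make the recategorisation idea concrete, here is a minimal Python sketch, not Parsoid code. The channel name `parsoid.utf8`, the `InvariantError` class, and the `parse_with_recategorised_errors` wrapper are all hypothetical names chosen for illustration. It shows one way a known category of failure could be logged at WARNING on a dedicated channel instead of surfacing as an uncaught exception, while anything else still propagates.

```python
import logging


class InvariantError(Exception):
    """Hypothetical stand-in for an invariant failure raised inside a parser."""


# Hypothetical dedicated channel; the name is an assumption for this sketch.
utf8_log = logging.getLogger("parsoid.utf8")


def parse_with_recategorised_errors(parse, wikitext):
    """Run parse(wikitext); downgrade known UTF-8 invariant failures to warnings.

    Returns None when the failure was recategorised, so the caller can decide
    what response to send. Any other exception is re-raised unchanged.
    """
    try:
        return parse(wikitext)
    except InvariantError as exc:
        if "Bad UTF-8" in str(exc):
            # Known category: record it on the dedicated channel at WARNING.
            utf8_log.warning("UTF-8 invariant failed: %s", exc)
            return None
        # Unknown invariant failures remain real errors.
        raise
```

Whether the caller then answers with a 4xx or something else is a separate decision, as noted above; the sketch only covers where the log entry ends up.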

These UTF-8 errors are caused by a combination of things: bad encoding stored in the db, "bad markup", or real bugs in Parsoid. Logging them helped us find and fix most of those causes. The remaining ones are lingering cases we haven't investigated because there are far fewer of them now. But yes, I agree that it is probably time to migrate these remaining errors to a different log channel. We'll chat about this in the coming week.
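For readers unfamiliar with this class of error, here is a minimal Python sketch of how a byte-offset substring can end in the middle of a multi-byte character. It is an illustration only, not the implementation of `PHPUtils::safeSubstr`; the function `safe_substr` and its exact checks are assumptions. The sample bytes are the first percent-encoded characters of the page title in the Request URL above, each of which takes 3 bytes in UTF-8.

```python
from urllib.parse import unquote_to_bytes


def safe_substr(data: bytes, start: int, length: int) -> bytes:
    """Slice by byte offsets and refuse to cut a UTF-8 character in half.

    Only the two ends of the slice are checked; this is not a full
    UTF-8 validator.
    """
    chunk = data[start:start + length]
    # A continuation byte (10xxxxxx) at the front means we started mid-character.
    if chunk and (chunk[0] & 0xC0) == 0x80:
        raise ValueError("Bad UTF-8 at start of string")
    # Walk back over at most 3 trailing continuation bytes to find the lead byte.
    i = len(chunk) - 1
    trailing = 0
    while i >= 0 and trailing < 3 and (chunk[i] & 0xC0) == 0x80:
        i -= 1
        trailing += 1
    if i >= 0:
        lead = chunk[i]
        if lead >= 0xF0:
            need = 4
        elif lead >= 0xE0:
            need = 3
        elif lead >= 0xC0:
            need = 2
        else:
            need = 1
        # The last character is incomplete if fewer bytes are present than needed.
        if trailing + 1 < need:
            raise ValueError(f"Bad UTF-8 at end of string ({need} byte sequence)")
    return chunk


# First three characters of the page title from the Request URL (9 bytes).
title = unquote_to_bytes("%E0%B4%B5%E0%B4%BF%E0%B4%95")

safe_substr(title, 0, 6)      # fine: ends on a character boundary
try:
    safe_substr(title, 0, 4)  # ends one byte into the second character
except ValueError as err:
    print(err)
```

An offset that is wrong by even one byte, for example because of an inaccurate source-range computation, is enough to trigger the failure on text like this, whereas the same mistake on ASCII-only text would go unnoticed.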

Change 841598 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/services/parsoid@master] Ensure DSR computation is accurate if an unclosed comment is present

https://gerrit.wikimedia.org/r/841598

Change 841598 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Ensure DSR computation is accurate if an unclosed comment is present

https://gerrit.wikimedia.org/r/841598

Change 851140 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump parsoid to 0.17.0-a5

https://gerrit.wikimedia.org/r/851140

Change 851140 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.17.0-a5

https://gerrit.wikimedia.org/r/851140

I am going to resolve this instance of the phab task. There are now instances that purport to be on main pages of ruwiki, bewiki, ukwiki, which I'll look at separately, but they may be parses of posted wikitext rather than page wikitext. It is a bit hard to track those without logging the posted wikitext.

On my end, this looks better now. Thx!