Error
InvariantException: Invariant failed: Bad UTF-8 at start of string
Impact
Pagebundle data unavailable due to an internal problem with a bad UTF-8 string.
Notes
The same error also shows up for a number of other requests, on zhwiki and other wikis.
| ssastry | |
| Oct 30 2019, 3:15 AM |
#0 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Utils/PHPUtils.php(210): Wikimedia\Assert\Assert::invariant(boolean, string)
#1 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/PP/Processors/WrapSections.php(33): Parsoid\Utils\PHPUtils::safeSubstr(string, integer, integer)
#2 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/PP/Processors/WrapSections.php(381): Parsoid\Wt2Html\PP\Processors\WrapSections->getSrc(Parsoid\Wt2Html\PageConfigFrame, integer, integer)
#3 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/PP/Processors/WrapSections.php(447): Parsoid\Wt2Html\PP\Processors\WrapSections->resolveTplExtSectionConflicts(array)
#4 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/DOMPostProcessor.php(151): Parsoid\Wt2Html\PP\Processors\WrapSections->run(DOMElement, Parsoid\Config\Env, array, boolean)
#5 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/DOMPostProcessor.php(831): Parsoid\Wt2Html\DOMPostProcessor->Parsoid\Wt2Html\{closure}(DOMElement, Parsoid\Config\Env, array, boolean)
#6 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/DOMPostProcessor.php(882): Parsoid\Wt2Html\DOMPostProcessor->doPostProcess(DOMDocument)
#7 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/DOMPostProcessor.php(899): Parsoid\Wt2Html\DOMPostProcessor->process(DOMDocument)
#8 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/ParserPipeline.php(148): Parsoid\Wt2Html\DOMPostProcessor->processChunkily(string, array)
#9 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/ParserPipeline.php(198): Parsoid\Wt2Html\ParserPipeline->parseChunkily(string, array)
#10 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/ParserPipelineFactory.php(308): Parsoid\Wt2Html\ParserPipeline->parseToplevelDoc(string, array)
#11 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/WikitextContentModelHandler.php(78): Parsoid\Wt2Html\ParserPipelineFactory->parse(string)
#12 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Parsoid.php(86): Parsoid\WikitextContentModelHandler->toHTML(Parsoid\Config\Env)
#13 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Parsoid.php(113): Parsoid\Parsoid->parseWikitext(MWParsoid\Config\PageConfig, array)
#14 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/extension/src/Rest/Handler/ParsoidHandler.php(591): Parsoid\Parsoid->wikitext2html(MWParsoid\Config\PageConfig, array, NULL)
#15 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/extension/src/Rest/Handler/PageHandler.php(47): MWParsoid\Rest\Handler\ParsoidHandler->wt2html(Parsoid\Config\Env, array)
#16 /includes/Rest/Router.php(315): MWParsoid\Rest\Handler\PageHandler->execute()
#17 /includes/Rest/Router.php(285): MediaWiki\Rest\Router->executeHandler(MWParsoid\Rest\Handler\PageHandler)
#18 /includes/Rest/EntryPoint.php(116): MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#19 /includes/Rest/EntryPoint.php(83): MediaWiki\Rest\EntryPoint->execute()
#20 /rest.php(31): MediaWiki\Rest\EntryPoint::main()
#21 /srv/mediawiki/w/rest.php(3): require(string)
#22 {main}

| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | ssastry | T229015 Tracking: Direct live production traffic at Parsoid/PHP |
| Resolved | PRODUCTION ERROR | ssastry | T236866 InvariantException: Invariant failed: Bad UTF-8 at start of string |
Hm, another one I can't seem to reproduce:
$ tools/fetch-wt.js --domain fr.wikipedia.org 139698506 > T236866.wt
cananian@skiffserv:~/Projects/Wikimedia/Parsoid$ php bin/parse.php --pageName 'Modèle:Hiérarchie fin nénètse' --domain fr.wikipedia.org --offsetType ucs2 < T236866.wt
No problems. But this is another page which is mostly a template inclusion; it is highly likely that the triggering template has changed since.
Noticing a scattering of these today while keeping an eye on post-deploy logs:
brennen@mwlog1001:~$ logspam | grep Invariant
17 Invariant wmf.28 v/w/a/s/Assert.php:224 Invariant failed: Bad UTF-8 at end of string (2 byte sequence)
2 Invariant wmf.28 v/w/a/s/Assert.php:224 Invariant failed: Bad UTF-8 at start of string
16 Invariant wmf.28 v/w/a/s/Assert.php:224 Invariant failed: Bad UTF-8 at end of string (3 byte sequence)
brennen@mwlog1001:/srv/mw-log$ grep -c 'Bad UTF-8 at.*of string' ./exception.log
1922
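The single `grep -c` total above lumps all variants of the message together. As a rough sketch, a per-variant breakdown could be computed as follows. It assumes that each log entry sits on one line and contains the message text verbatim; the actual layout of `exception.log` is not shown in this task, so `tally` and the sample lines are purely illustrative.

```python
import re
from collections import Counter

# Matches the message variants seen in the logspam output above.
PATTERN = re.compile(
    r"Bad UTF-8 at (?:start|end) of string(?: \(\d byte sequence\))?"
)


def tally(lines):
    """Count log lines per 'Bad UTF-8' message variant (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


# Made-up sample lines; real entries carry more context than this.
sample = [
    "Invariant failed: Bad UTF-8 at end of string (2 byte sequence)",
    "Invariant failed: Bad UTF-8 at start of string",
    "Invariant failed: Bad UTF-8 at end of string (3 byte sequence)",
    "Invariant failed: Bad UTF-8 at end of string (2 byte sequence)",
    "Some unrelated exception",
]

for message, count in tally(sample).most_common():
    print(count, message)
```

On a real log file, `tally(open(path, encoding="utf-8", errors="replace"))` would do the same line by line.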
OK, will take a look. T240642 (Production crashers in WrapSections code because of missing properties ($dsr, $pi, $parts)) might be related to this: both are probably manifestations of the same underlying problem that surface as different crashers.
Using the test page from T260180, I was able to reduce the failure to the following snippet, which reproduces the crasher locally on my laptop:
<!-- 내 -->
{{위키백과:사랑방 (기술)/{{Y-M|0}}}}
Removing that comment, or changing the Korean character to an ASCII character, eliminates the crasher. To be continued ...
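Why the Korean character matters can be illustrated with a small sketch. In UTF-8, 내 takes three bytes, so a byte offset that is slightly wrong can land inside the character, whereas on pure ASCII text the same wrong offset would still point at a valid character boundary and go unnoticed. The `safe_substr` helper below is hypothetical: its checks are modelled on the invariant messages in the logs, not copied from Parsoid's `PHPUtils::safeSubstr`.

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def safe_substr(data: bytes, start: int, length: int) -> bytes:
    """Slice `data` by byte offsets, refusing to cut a character in half.

    Hypothetical helper, not Parsoid's implementation.
    """
    end = start + length
    chunk = data[start:end]
    # A chunk must not begin on a continuation byte.
    if chunk and is_continuation(chunk[0]):
        raise ValueError("Bad UTF-8 at start of string")
    # If the byte right after the chunk is a continuation byte, the slice
    # ended in the middle of a multi-byte character.
    if end < len(data) and is_continuation(data[end]):
        raise ValueError("Bad UTF-8 at end of string")
    return chunk


# The reduced test case from this task, as UTF-8 bytes.
# Bytes 0-4 are "<!-- ", bytes 5-7 are the three bytes of the Korean character.
source = "<!-- 내 -->\n{{위키백과:사랑방 (기술)/{{Y-M|0}}}}".encode("utf-8")

for start, length in [(0, 12), (6, 6), (0, 6)]:
    try:
        print(start, length, safe_substr(source, start, length).decode("utf-8"))
    except ValueError as err:
        print(start, length, "rejected:", err)
```

Only the first slice, which covers the whole comment, passes; the other two start or end inside the three-byte character and are rejected.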
Change 619873 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] WrapSections::getDSR: Don't assume all non-elements are text nodes!!
Change 619873 merged by jenkins-bot:
[mediawiki/services/parsoid@master] WrapSections::getDSR: Don't assume all non-elements are text nodes!!
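The patch title points at a class of bug that is easy to illustrate outside Parsoid: a comment node is neither an element nor a text node, and it occupies more source bytes than its data alone. The sketch below uses Python's `xml.dom.minidom` as a stand-in DOM. `source_width` and `naive_width` are hypothetical names, and the actual logic in `WrapSections::getDSR` is not shown in this task, so this only illustrates the general idea.

```python
from xml.dom import Node, minidom


def naive_width(node) -> int:
    """Buggy variant: treats every non-element node as a text node."""
    return len(node.data.encode("utf-8"))


def source_width(node) -> int:
    """Width in UTF-8 bytes that a text or comment node occupies in the source."""
    if node.nodeType == Node.TEXT_NODE:
        return len(node.data.encode("utf-8"))
    if node.nodeType == Node.COMMENT_NODE:
        # A comment also contributes its "<!--" and "-->" delimiters.
        return len("<!--") + len(node.data.encode("utf-8")) + len("-->")
    raise TypeError(f"unhandled node type {node.nodeType}")


doc = minidom.parseString("<body><!-- 내 -->text</body>")
comment, text = doc.documentElement.childNodes

print("naive comment width:", naive_width(comment))
print("actual comment width:", source_width(comment))
print("text width:", source_width(text))
```

With the naive width, every offset computed after the comment is short by the length of the delimiters, which is the kind of error that can place a later byte offset inside a multi-byte character.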
Change 620755 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump wikimedia/parsoid to v0.13.0-a6
Change 620755 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to v0.13.0-a6
Change 621120 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] Add parser test for T236866
Change 621120 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Add parser test for T236866
Logstash shows these errors have dropped to zero since wmf.5. T237467 might also have been resolved by this fix. I will wait a few days and verify in Logstash.
Thank you for fixing this!
Out of curiosity, I checked this myself; unfortunately, it doesn't seem to be resolved.
| "Bad UTF-8 at start of string" | ✔ |
| "Bad UTF-8 (full string verification)" | ❌ |