
InvariantException: Invariant failed: Bad UTF-8 at start of string
Closed, Resolved · Public · PRODUCTION ERROR

Description

Error

Message: InvariantException: Invariant failed: Bad UTF-8 at start of string

Impact

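Pagebundle data is unavailable for affected pages. An internal substring operation (PHPUtils::safeSubstr) produces a string that is not valid UTF-8, which fails an invariant check.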
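The trace shows the failure inside PHPUtils::safeSubstr. The message is consistent with a substring taken by byte offsets that begins in the middle of a multi-byte UTF-8 character, so its first byte is a continuation byte (bit pattern 10xxxxxx). Below is a minimal Python sketch of such a start-of-string check. safe_substr and BadUtf8Error are hypothetical names chosen to mirror the trace, not Parsoid's actual code, and the offsets are assumed to be byte offsets into UTF-8 source.

```python
class BadUtf8Error(ValueError):
    """Hypothetical error type for a substring that splits a UTF-8 character."""


def is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF).
    return byte & 0xC0 == 0x80


def safe_substr(src: bytes, start: int, length: int) -> bytes:
    """Hypothetical sketch: slice src by byte offsets and reject a slice
    whose first byte is a continuation byte (i.e. it starts mid-character)."""
    sub = src[start:start + length]
    if sub and is_continuation(sub[0]):
        raise BadUtf8Error("Bad UTF-8 at start of string")
    return sub


src = "Modèle".encode("utf-8")  # 'è' is two bytes: 0xC3 0xA8
print(safe_substr(src, 0, 3))   # b'Mod'
try:
    safe_substr(src, 4, 2)      # byte 4 is 0xA8, the second half of 'è'
except BadUtf8Error as err:
    print(err)                  # Bad UTF-8 at start of string
```

Checking only the first byte is cheap. It catches a bad start offset without decoding the whole substring.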

Notes

Several other requests, on zhwiki and elsewhere, hit this error as well.

Details

Request ID: XXeF3QpAIDAAAKpv-esAAABJ
Request URL: /w/rest.php/fr.wikipedia.org/v3/page/pagebundle/Mod%C3%A8le%3AHi%C3%A9rarchie_fin_n%C3%A9n%C3%A8tse/139698506
Stack trace (exception.trace):
#0 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Utils/PHPUtils.php(210): Wikimedia\Assert\Assert::invariant(boolean, string)
#1 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/PP/Processors/WrapSections.php(33): Parsoid\Utils\PHPUtils::safeSubstr(string, integer, integer)
#2 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/PP/Processors/WrapSections.php(381): Parsoid\Wt2Html\PP\Processors\WrapSections->getSrc(Parsoid\Wt2Html\PageConfigFrame, integer, integer)
#3 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/PP/Processors/WrapSections.php(447): Parsoid\Wt2Html\PP\Processors\WrapSections->resolveTplExtSectionConflicts(array)
#4 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/DOMPostProcessor.php(151): Parsoid\Wt2Html\PP\Processors\WrapSections->run(DOMElement, Parsoid\Config\Env, array, boolean)
#5 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/DOMPostProcessor.php(831): Parsoid\Wt2Html\DOMPostProcessor->Parsoid\Wt2Html\{closure}(DOMElement, Parsoid\Config\Env, array, boolean)
#6 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/DOMPostProcessor.php(882): Parsoid\Wt2Html\DOMPostProcessor->doPostProcess(DOMDocument)
#7 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/DOMPostProcessor.php(899): Parsoid\Wt2Html\DOMPostProcessor->process(DOMDocument)
#8 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/ParserPipeline.php(148): Parsoid\Wt2Html\DOMPostProcessor->processChunkily(string, array)
#9 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/ParserPipeline.php(198): Parsoid\Wt2Html\ParserPipeline->parseChunkily(string, array)
#10 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Wt2Html/ParserPipelineFactory.php(308): Parsoid\Wt2Html\ParserPipeline->parseToplevelDoc(string, array)
#11 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/WikitextContentModelHandler.php(78): Parsoid\Wt2Html\ParserPipelineFactory->parse(string)
#12 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Parsoid.php(86): Parsoid\WikitextContentModelHandler->toHTML(Parsoid\Config\Env)
#13 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Parsoid.php(113): Parsoid\Parsoid->parseWikitext(MWParsoid\Config\PageConfig, array)
#14 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/extension/src/Rest/Handler/ParsoidHandler.php(591): Parsoid\Parsoid->wikitext2html(MWParsoid\Config\PageConfig, array, NULL)
#15 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/extension/src/Rest/Handler/PageHandler.php(47): MWParsoid\Rest\Handler\ParsoidHandler->wt2html(Parsoid\Config\Env, array)
#16 /includes/Rest/Router.php(315): MWParsoid\Rest\Handler\PageHandler->execute()
#17 /includes/Rest/Router.php(285): MediaWiki\Rest\Router->executeHandler(MWParsoid\Rest\Handler\PageHandler)
#18 /includes/Rest/EntryPoint.php(116): MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#19 /includes/Rest/EntryPoint.php(83): MediaWiki\Rest\EntryPoint->execute()
#20 /rest.php(31): MediaWiki\Rest\EntryPoint::main()
#21 /srv/mediawiki/w/rest.php(3): require(string)
#22 {main}

Event Timeline

ssastry created this task. Oct 30 2019, 3:15 AM
Restricted Application added subscribers: Cosine02, Aklapper. Oct 30 2019, 3:15 AM
ssastry triaged this task as Medium priority. Oct 30 2019, 3:15 AM
ssastry moved this task from Backlog to Bugs, Notices, Crashers on the Parsoid-PHP board.

Hm, another one I can't seem to reproduce:

$ tools/fetch-wt.js --domain fr.wikipedia.org 139698506 > T236866.wt
cananian@skiffserv:~/Projects/Wikimedia/Parsoid$ php bin/parse.php --pageName 'Modèle:Hiérarchie fin nénètse' --domain fr.wikipedia.org --offsetType ucs2 < T236866.wt 

No problems. But this is another page that is mostly a template inclusion, so the triggering template has most likely changed since the error was logged.

Aklapper edited projects, added Parsoid; removed Parsoid-PHP. Apr 10 2020, 4:27 PM
ssastry moved this task from Needs Triage to Bugs & Crashers on the Parsoid board. Apr 10 2020, 4:50 PM
brennen added a subscriber: brennen.

Noticing a scattering of these today while keeping an eye on post-deploy logs:

brennen@mwlog1001:~$ logspam | grep Invariant
17                 Invariant    wmf.28 v/w/a/s/Assert.php:224  Invariant failed: Bad UTF-8 at end of string (2 byte sequence)
2                  Invariant    wmf.28 v/w/a/s/Assert.php:224  Invariant failed: Bad UTF-8 at start of string
16                 Invariant    wmf.28 v/w/a/s/Assert.php:224  Invariant failed: Bad UTF-8 at end of string (3 byte sequence)
brennen@mwlog1001:/srv/mw-log$ grep -c 'Bad UTF-8 at.*of string' ./exception.log 
1922
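The log lines above also report "Bad UTF-8 at end of string (2 byte sequence)" and "(3 byte sequence)". Those messages fit a substring whose end offset cut off a 2-byte character (such as è in the French title) or a 3-byte character (such as Korean or Chinese text) before its last byte. Below is a hedged Python sketch of an end-of-string check that reports the length of the truncated sequence. check_end is a hypothetical helper, not Parsoid's implementation.

```python
from typing import Optional


def expected_len(lead: int) -> int:
    """Length of the UTF-8 sequence introduced by a lead byte (0 if invalid)."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: ASCII
    if lead & 0xE0 == 0xC0:
        return 2          # 110xxxxx
    if lead & 0xF0 == 0xE0:
        return 3          # 1110xxxx
    if lead & 0xF8 == 0xF0:
        return 4          # 11110xxx
    return 0              # continuation or invalid byte


def check_end(sub: bytes) -> Optional[str]:
    """Hypothetical sketch: return an error message if sub ends with an
    incomplete multi-byte sequence, else None."""
    # Step back over at most 3 trailing continuation bytes to find the lead byte.
    i = len(sub) - 1
    while i >= 0 and len(sub) - i <= 3 and sub[i] & 0xC0 == 0x80:
        i -= 1
    if i < 0:
        return None       # empty or all continuation bytes: not an end-of-string issue
    need = expected_len(sub[i])
    have = len(sub) - i
    if need > have:
        return f"Bad UTF-8 at end of string ({need} byte sequence)"
    return None


modele = "Modèle".encode("utf-8")
print(check_end(modele[:4]))  # Bad UTF-8 at end of string (2 byte sequence)
nae = "내".encode("utf-8")
print(check_end(nae[:2]))     # Bad UTF-8 at end of string (3 byte sequence)
print(check_end(nae))         # None
```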
brennen moved this task from Backlog to Logs/Train on the User-brennen board. Apr 28 2020, 11:42 PM
Krinkle renamed this task from Invariant failed: Bad UTF-8 at start of string to InvariantException: Invariant failed: Bad UTF-8 at start of string. Jul 22 2020, 7:21 PM
Krinkle updated the task description.
Krinkle updated the task description. Jul 22 2020, 7:35 PM

This is causing us a lot of trouble with DiscussionTools on ko.wp (see the merged task). @cscott @ssastry, would you be able to prioritize it?

OK, will take a look. T240642: Production crashers in WrapSections code because of missing properties ($dsr, $pi, $parts) might be related to this; both are probably manifestations of the same underlying problem, surfacing as different crashers.

On the test page from T260180, I was able to reduce the failure to the following snippet, which reproduces the crasher locally on my laptop:

<!-- 내 -->
{{위키백과:사랑방 (기술)/{{Y-M|0}}}}

Removing that comment, or changing the Korean character to an ASCII character, eliminates the crasher. To be continued...
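A plausible reason only a non-ASCII character triggers the crash: an ASCII character is one byte in UTF-8 and one unit in UCS-2/UTF-16, so any offset lands on a character boundary. 내 is one UTF-16 code unit but three UTF-8 bytes, so an offset computed in the wrong unit, or off by a few positions, can fall inside it. The Python illustration below is generic. It does not model Parsoid's actual DSR computation, and the UCS-2-reused-as-bytes case at the end is a hypothetical mix-up, not a confirmed diagnosis.

```python
def lands_inside_char(src: str, byte_offset: int) -> bool:
    """True if byte_offset points strictly inside a multi-byte UTF-8 character."""
    data = src.encode("utf-8")
    return 0 < byte_offset < len(data) and data[byte_offset] & 0xC0 == 0x80


def utf16_len(s: str) -> int:
    """Length in UTF-16 code units, the unit counted by 'ucs2' offsets."""
    return len(s.encode("utf-16-le")) // 2


ascii_src = "<!-- a -->\n"
korean_src = "<!-- 내 -->\n"

print(len(ascii_src.encode("utf-8")), utf16_len(ascii_src))    # 11 11
print(len(korean_src.encode("utf-8")), utf16_len(korean_src))  # 13 11

# No byte offset into pure ASCII can split a character...
print(any(lands_inside_char(ascii_src, i) for i in range(12)))     # False
# ...but two byte offsets into the Korean comment fall inside '내'.
print([i for i in range(14) if lands_inside_char(korean_src, i)])  # [6, 7]

# Hypothetical mix-up: a UCS-2 offset reused as a byte offset.
ucs2_offset = utf16_len("<!-- 내")  # position just after '내', in UTF-16 units
print(ucs2_offset, lands_inside_char(korean_src, ucs2_offset))     # 6 True
```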

Change 619873 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] WrapSections::getDSR: Don't assume all non-elements are text nodes!!

https://gerrit.wikimedia.org/r/619873

Change 619873 merged by jenkins-bot:
[mediawiki/services/parsoid@master] WrapSections::getDSR: Don't assume all non-elements are text nodes!!

https://gerrit.wikimedia.org/r/619873
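Change 619873 is titled "WrapSections::getDSR: Don't assume all non-elements are text nodes!!". The Python sketch below illustrates only the general pattern that title describes. A fallback branch that measures every non-element node as text gives the wrong source width for a comment, because a comment's wikitext form also includes the <!-- and --> delimiters. The node classes and width functions are hypothetical, the widths are UTF-8 byte counts, and this is neither Parsoid's getDSR nor a reproduction of the exact arithmetic that crashed.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Element:
    dsr_start: int  # hypothetical: source range already recorded for the element
    dsr_end: int


@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


Node = Union[Element, Text, Comment]


def naive_width(node: Node) -> int:
    """Buggy pattern: anything that is not an element is measured as text."""
    if isinstance(node, Element):
        return node.dsr_end - node.dsr_start
    return len(node.data.encode("utf-8"))


def source_width(node: Node) -> int:
    """Fixed pattern: handle each non-element node type explicitly."""
    if isinstance(node, Element):
        return node.dsr_end - node.dsr_start
    if isinstance(node, Text):
        return len(node.data.encode("utf-8"))
    if isinstance(node, Comment):
        # The wikitext source is '<!--' + data + '-->'.
        return len(("<!--" + node.data + "-->").encode("utf-8"))
    raise TypeError(f"unexpected node type: {type(node).__name__}")


comment = Comment(" 내 ")
print(naive_width(comment), source_width(comment))  # 5 12
```

Raising on an unknown node type, rather than silently treating it as text, makes a future unhandled case fail loudly at the point of the mistake instead of producing a bad offset later.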

Change 620755 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump wikimedia/parsoid to v0.13.0-a6

https://gerrit.wikimedia.org/r/620755

Change 620755 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to v0.13.0-a6

https://gerrit.wikimedia.org/r/620755

Change 621120 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] Add parser test for T236866

https://gerrit.wikimedia.org/r/621120

Change 621120 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Add parser test for T236866

https://gerrit.wikimedia.org/r/621120

ssastry closed this task as Resolved. Aug 21 2020, 3:35 AM
ssastry claimed this task.
ssastry added a project: Parsing-Active-Work.
ssastry added a subscriber: cscott.

Logstash shows these errors have dropped to zero since wmf.5. T237467 might also have been resolved by this fix. Will wait a few days and verify in Logstash.

Thank you for fixing this!

> Logstash shows these errors have dropped to zero since wmf.5. T237467 might also have been resolved by this fix. Will wait a few days and verify in Logstash.

I checked myself out of curiosity; unfortunately, it doesn't seem to be resolved. I still see:

"Bad UTF-8 at start of string"
"Bad UTF-8 (full string verification)"
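The second message, "Bad UTF-8 (full string verification)", suggests that, in addition to the cheap start and end checks, a check runs over the entire string and can catch invalid bytes in the middle. Below is a minimal Python sketch of such a full check, using strict decoding with the standard codec. That this is what the check does is an assumption, not a description of Parsoid's implementation.

```python
from typing import Optional


def is_valid_utf8(data: bytes) -> bool:
    """Full verification: decode the whole buffer strictly."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


def first_invalid_offset(data: bytes) -> Optional[int]:
    """Byte offset of the first invalid sequence, or None if data is valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as e:
        return e.start
    return None


ok = "<!-- 내 -->".encode("utf-8")
broken = b"ab\xffcd"  # both ends are ASCII, so edge-only checks would pass

print(is_valid_utf8(ok), is_valid_utf8(broken))  # True False
print(first_invalid_offset(broken))              # 2
```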