Invariant failed: Bad UTF-8 at end of string (2 byte sequence)
Closed, Resolved (Public)

Description

message
Invariant failed: Bad UTF-8 at end of string (2 byte sequence)

There are many instances of this assertion failure in Logstash, but I didn't check whether they all have the same stack trace.
See T236866 (Invariant failed: Bad UTF-8 at start of string) for the corresponding start-of-string assertion failures.
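
For context, the message most likely means that the string returned by the byte-offset substring stops partway through a multi-byte character: the last lead byte announces a 2-byte sequence, but its continuation byte has been cut off. Below is a minimal sketch of that kind of check in JavaScript, operating on a Buffer of UTF-8 bytes. The helper name `truncatedUtf8AtEnd` is hypothetical and this is not the code in PHPUtils.php; it only detects truncation, not every form of malformed UTF-8.

```javascript
// Hypothetical helper, not Parsoid's implementation: returns a message if the
// byte string ends in the middle of a UTF-8 sequence, otherwise null.
function truncatedUtf8AtEnd(buf) {
  // Step back over at most 3 trailing continuation bytes (10xxxxxx).
  let i = buf.length - 1;
  let continuation = 0;
  while (i >= 0 && continuation < 3 && (buf[i] & 0xC0) === 0x80) {
    i--;
    continuation++;
  }
  if (i < 0) return null; // empty, or nothing but continuation bytes

  // The lead byte announces how long the sequence should be.
  const lead = buf[i];
  let expected;
  if ((lead & 0x80) === 0x00) expected = 1;
  else if ((lead & 0xE0) === 0xC0) expected = 2;
  else if ((lead & 0xF0) === 0xE0) expected = 3;
  else if ((lead & 0xF8) === 0xF0) expected = 4;
  else return null; // not a lead byte; out of scope for this sketch

  const actual = continuation + 1;
  return actual < expected
    ? `Bad UTF-8 at end of string (${expected} byte sequence)`
    : null;
}

// Each Cyrillic letter in this title takes 2 bytes in UTF-8.
const title = Buffer.from('Монумент', 'utf8');
console.log(truncatedUtf8AtEnd(title.subarray(0, 4))); // two whole letters
console.log(truncatedUtf8AtEnd(title.subarray(0, 3))); // cuts the second letter in half
```

A slice that ends on a character boundary passes; a slice that ends one byte into a 2-byte character produces the message from this task.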

Details

Request ID
XcBKpgpAMFcAAG@rtkMAAADN
Request URL
/w/rest.php/vi.wikipedia.org/v3/page/pagebundle/Wikipedia%3ABi%E1%BB%83u_quy%E1%BA%BFt_%C4%91%C3%A1nh_gi%C3%A1_b%C3%A0i_vi%E1%BA%BFt_d%E1%BB%8Bch_thu%E1%BA%ADt/56190196
Stack Trace
exception.trace
#0 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Utils/PHPUtils.php(232): Wikimedia\Assert\Assert::invariant(boolean, string)
#1 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/PP/Processors/WrapSections.php(33): Parsoid\Utils\PHPUtils::safeSubstr(string, integer, integer)
#2 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/PP/Processors/WrapSections.php(382): Parsoid\Wt2Html\PP\Processors\WrapSections->getSrc(Parsoid\Wt2Html\PageConfigFrame, integer, integer)
#3 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/PP/Processors/WrapSections.php(447): Parsoid\Wt2Html\PP\Processors\WrapSections->resolveTplExtSectionConflicts(array)
#4 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/DOMPostProcessor.php(151): Parsoid\Wt2Html\PP\Processors\WrapSections->run(DOMElement, Parsoid\Config\Env, array, boolean)
#5 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/DOMPostProcessor.php(831): Parsoid\Wt2Html\DOMPostProcessor->Parsoid\Wt2Html\{closure}(DOMElement, Parsoid\Config\Env, array, boolean)
#6 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/DOMPostProcessor.php(882): Parsoid\Wt2Html\DOMPostProcessor->doPostProcess(DOMDocument)
#7 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/DOMPostProcessor.php(899): Parsoid\Wt2Html\DOMPostProcessor->process(DOMDocument)
#8 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/ParserPipeline.php(148): Parsoid\Wt2Html\DOMPostProcessor->processChunkily(string, array)
#9 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/ParserPipeline.php(198): Parsoid\Wt2Html\ParserPipeline->parseChunkily(string, array)
#10 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Wt2Html/ParserPipelineFactory.php(308): Parsoid\Wt2Html\ParserPipeline->parseToplevelDoc(string, array)
#11 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/WikitextContentModelHandler.php(78): Parsoid\Wt2Html\ParserPipelineFactory->parse(string)
#12 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Parsoid.php(86): Parsoid\WikitextContentModelHandler->toHTML(Parsoid\Config\Env)
#13 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/src/Parsoid.php(113): Parsoid\Parsoid->parseWikitext(MWParsoid\Config\PageConfig, array)
#14 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/extension/src/Rest/Handler/ParsoidHandler.php(543): Parsoid\Parsoid->wikitext2html(MWParsoid\Config\PageConfig, array, NULL)
#15 /srv/deployment/parsoid/deploy-cache/revs/a69ec92e21cc4be117daaadef4a8fc5bf5813fcf/src/extension/src/Rest/Handler/PageHandler.php(55): MWParsoid\Rest\Handler\ParsoidHandler->wt2html(Parsoid\Config\Env, array)
#16 /includes/Rest/Router.php(315): MWParsoid\Rest\Handler\PageHandler->execute()
#17 /includes/Rest/Router.php(285): MediaWiki\Rest\Router->executeHandler(MWParsoid\Rest\Handler\PageHandler)
#18 /includes/Rest/EntryPoint.php(116): MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#19 /includes/Rest/EntryPoint.php(83): MediaWiki\Rest\EntryPoint->execute()
#20 /rest.php(31): MediaWiki\Rest\EntryPoint::main()
#21 /srv/mediawiki/w/rest.php(3): require(string)
#22 {main}
Event Timeline

ssastry created this task. Nov 4 2019, 9:00 PM
Restricted Application added a subscriber: Aklapper. Nov 4 2019, 9:00 PM
ssastry triaged this task as Medium priority. Nov 4 2019, 9:01 PM
ssastry assigned this task to cscott. Nov 4 2019, 9:04 PM
ssastry updated the task description.
ssastry moved this task from Backlog to Bugs, Notices, Crashers on the Parsoid-PHP board.

I can't reproduce this with either the latest version of that page or revision ID 56190196:

$ php bin/parse.php --pageName 'Wikipedia:Biểu_quyết_đánh_giá_bài_viết_dịch_thuật' --domain vi.wikipedia.org < /dev/null
$ tools/fetch-wt.js --domain vi.wikipedia.org 56190196 > T237318.wt
$ php bin/parse.php --pageName 'Wikipedia:Biểu_quyết_đánh_giá_bài_viết_dịch_thuật' --domain vi.wikipedia.org < T237318.wt

Perhaps the problem was in a template that has since been updated? Proper template time travel would really be useful here.

DSR inconsistency warnings are emitted, but there are no PHP/JS diffs in the output:

$ bin/parse.js --pageName 'Wikipedia:Biểu_quyết_đánh_giá_bài_viết_dịch_thuật' --domain vi.wikipedia.org --useBatchAPI < T237318.wt > js.out
$ php bin/parse.php --pageName 'Wikipedia:Biểu_quyết_đánh_giá_bài_viết_dịch_thuật' --domain vi.wikipedia.org --offsetType ucs2 < T237318.wt > php.out
$ node
> d = require('./bin/diff.html');
{ htmlDiff: [Function: htmlDiff],
  fileDiff: [Function: fileDiff],
  displayResult: [Function: displayResult] }
> diffs = d.fileDiff('js.out', 'php.out');
[]
> d.displayResult(diffs)
<unknown>:<unknown>: NO HTML DIFFS FOUND!

(using I61c3e9dbb380dab8447bee8ad758a763a7f12447).
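
A note on the `--offsetType ucs2` flag used above: as far as I understand it, the JS implementation counts offsets in UTF-16 code units while the PHP port works natively in UTF-8 bytes, so the offsets have to be expressed in the same unit before the two outputs can be compared. The sketch below shows how far the two units drift apart on non-ASCII text. The helper name `ucs2ToByteOffset` is hypothetical, and it assumes the offset never falls between the two halves of a surrogate pair.

```javascript
// Hypothetical helper: convert an offset counted in UTF-16 code units
// (what JavaScript string indexing uses) to a UTF-8 byte offset.
// Assumes the offset does not split a surrogate pair.
function ucs2ToByteOffset(str, ucs2Offset) {
  return Buffer.byteLength(str.slice(0, ucs2Offset), 'utf8');
}

const line = 'Файл:Denkmal';
// "Файл:" is 5 code units, but 4 * 2 + 1 = 9 bytes in UTF-8.
console.log(ucs2ToByteOffset(line, 5));
```

Using a code-unit offset directly as a byte offset on text like this would point into the middle of a Cyrillic character, which is the kind of slice the invariant rejects.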

Reproducible with I02c7fdd34d6808497a6aebf2d35e7073f49a0286:

$ php bin/parse.php --restURL "/w/rest.php/uk.wikipedia.org/v3/page/pagebundle/%D0%9C%D0%BE%D0%BD%D1%83%D0%BC%D0%B5%D0%BD%D1%82/26794587"
Wikimedia\Assert\InvariantException from line 217 of /home/cananian/Projects/Wikimedia/Parsoid/vendor/wikimedia/assert/src/Assert.php: Invariant failed: Bad UTF-8 at end of string (2 byte sequence)
#0 /home/cananian/Projects/Wikimedia/Parsoid/src/Utils/PHPUtils.php(238): Wikimedia\Assert\Assert::invariant(false, 'Bad UTF-8 at en...')
#1 /home/cananian/Projects/Wikimedia/Parsoid/src/Tokens/SourceRange.php(82): Parsoid\Utils\PHPUtils::safeSubstr(': {{otheruses|\xD0...', 410, 5)
#2 /home/cananian/Projects/Wikimedia/Parsoid/src/Tokens/Token.php(254): Parsoid\Tokens\SourceRange->substr(': {{otheruses|\xD0...')
#3 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TT/Sanitizer.php(1292): Parsoid\Tokens\Token->getWTSource(Object(Parsoid\Wt2Html\PageConfigFrame))
#4 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TT/Sanitizer.php(1514): Parsoid\Wt2Html\TT\Sanitizer::sanitizeToken(Object(Parsoid\Config\Env), Object(Parsoid\Tokens\TagTk), false)
#5 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TT/TokenHandler.php(239): Parsoid\Wt2Html\TT\Sanitizer->onAny(Object(Parsoid\Tokens\TagTk))
#6 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TokenTransformManager.php(120): Parsoid\Wt2Html\TT\TokenHandler->process(Array)
#7 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TokenTransformManager.php(182): Parsoid\Wt2Html\TokenTransformManager->processChunk(Array)
#8 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/ParserPipeline.php(127): Parsoid\Wt2Html\TokenTransformManager->process(Array, Array)
#9 /home/cananian/Projects/Wikimedia/Parsoid/src/Utils/PipelineUtils.php(110): Parsoid\Wt2Html\ParserPipeline->parse(Array, Array)
#10 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TT/DOMFragmentBuilder.php(99): Parsoid\Utils\PipelineUtils::processContentInPipeline(Object(Parsoid\Config\Env), Object(Parsoid\Wt2Html\Frame), Array, Array)
#11 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TT/DOMFragmentBuilder.php(122): Parsoid\Wt2Html\TT\DOMFragmentBuilder->buildDOMFragment(Object(Parsoid\Tokens\SelfclosingTagTk))
#12 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TT/TokenHandler.php(211): Parsoid\Wt2Html\TT\DOMFragmentBuilder->onTag(Object(Parsoid\Tokens\SelfclosingTagTk))
#13 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TokenTransformManager.php(120): Parsoid\Wt2Html\TT\TokenHandler->process(Array)
#14 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TokenTransformManager.php(182): Parsoid\Wt2Html\TokenTransformManager->processChunk(Array)
#15 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/ParserPipeline.php(127): Parsoid\Wt2Html\TokenTransformManager->process(Array, Array)
#16 /home/cananian/Projects/Wikimedia/Parsoid/src/Utils/PipelineUtils.php(110): Parsoid\Wt2Html\ParserPipeline->parse('[[\xD0\xA4\xD0\xB0\xD0\xB9\xD0\xBB:\xD0\x9F\xD0\xB0...', Array)
#17 /home/cananian/Projects/Wikimedia/Parsoid/src/Config/ParsoidExtensionAPI.php(158): Parsoid\Utils\PipelineUtils::processContentInPipeline(Object(Parsoid\Config\Env), Object(Parsoid\Wt2Html\Frame), '[[\xD0\xA4\xD0\xB0\xD0\xB9\xD0\xBB:\xD0\x9F\xD0\xB0...', Array)
#18 /home/cananian/Projects/Wikimedia/Parsoid/src/Ext/Gallery/Gallery.php(181): Parsoid\Config\ParsoidExtensionAPI->parseWikitextToDOM('[[\xD0\xA4\xD0\xB0\xD0\xB9\xD0\xBB:\xD0\x9F\xD0\xB0...', Array, true)
#19 /home/cananian/Projects/Wikimedia/Parsoid/src/Ext/Gallery/Gallery.php(276): Parsoid\Ext\Gallery\Gallery::pLine(Object(Parsoid\Config\ParsoidExtensionAPI), '\xD0\xA4\xD0\xB0\xD0\xB9\xD0\xBB:\xD0\x9F\xD0\xB0\xD0\xBC...', 11753, Object(Parsoid\Ext\Gallery\Opts))
#20 [internal function]: Parsoid\Ext\Gallery\Gallery->Parsoid\Ext\Gallery\{closure}(Array)
#21 /home/cananian/Projects/Wikimedia/Parsoid/src/Ext/Gallery/Gallery.php(274): array_map(Object(Closure), Array)
#22 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TT/ExtensionHandler.php(132): Parsoid\Ext\Gallery\Gallery->toDOM(Object(Parsoid\Config\ParsoidExtensionAPI), '\n\xD0\xA4\xD0\xB0\xD0\xB9\xD0\xBB:Denkm...', Array)
#23 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TT/ExtensionHandler.php(258): Parsoid\Wt2Html\TT\ExtensionHandler->onExtension(Object(Parsoid\Tokens\SelfclosingTagTk))
#24 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TT/TokenHandler.php(211): Parsoid\Wt2Html\TT\ExtensionHandler->onTag(Object(Parsoid\Tokens\SelfclosingTagTk))
#25 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TokenTransformManager.php(120): Parsoid\Wt2Html\TT\TokenHandler->process(Array)
#26 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TokenTransformManager.php(192): Parsoid\Wt2Html\TokenTransformManager->processChunk(Array)
#27 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/TokenTransformManager.php(190): Parsoid\Wt2Html\TokenTransformManager->processChunkily(': {{otheruses|\xD0...', Array)
#28 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/HTML5TreeBuilder.php(430): Parsoid\Wt2Html\TokenTransformManager->processChunkily(': {{otheruses|\xD0...', Array)
#29 [internal function]: Parsoid\Wt2Html\HTML5TreeBuilder->processChunkily(': {{otheruses|\xD0...', Array)
#30 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/DOMPostProcessor.php(894): Generator->current()
#31 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/ParserPipeline.php(148): Parsoid\Wt2Html\DOMPostProcessor->processChunkily(': {{otheruses|\xD0...', Array)
#32 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/ParserPipeline.php(198): Parsoid\Wt2Html\ParserPipeline->parseChunkily(': {{otheruses|\xD0...', Array)
#33 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/ParserPipelineFactory.php(299): Parsoid\Wt2Html\ParserPipeline->parseToplevelDoc(': {{otheruses|\xD0...', Array)
#34 /home/cananian/Projects/Wikimedia/Parsoid/src/WikitextContentModelHandler.php(78): Parsoid\Wt2Html\ParserPipelineFactory->parse(': {{otheruses|\xD0...')
#35 /home/cananian/Projects/Wikimedia/Parsoid/src/Parsoid.php(93): Parsoid\WikitextContentModelHandler->toHTML(Object(Parsoid\Config\Env))
#36 /home/cananian/Projects/Wikimedia/Parsoid/src/Parsoid.php(123): Parsoid\Parsoid->parseWikitext(Object(Parsoid\Config\Api\PageConfig), Array)
#37 /home/cananian/Projects/Wikimedia/Parsoid/bin/parse.php(255): Parsoid\Parsoid->wikitext2html(Object(Parsoid\Config\Api\PageConfig), Array)
#38 /home/cananian/Projects/Wikimedia/Parsoid/bin/parse.php(460): Parse->wt2Html(Array, Array, NULL)
#39 /home/cananian/Projects/Wikimedia/Parsoid/tools/doMaintenance.php(53): Parse->execute()
#40 /home/cananian/Projects/Wikimedia/Parsoid/bin/parse.php(476): require_once('/home/cananian/...')
#41 {main}

Change 559197 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Ensure Sanitizer::sanitizeToken uses correct frame source text

https://gerrit.wikimedia.org/r/559197

Change 559197 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Ensure Sanitizer::sanitizeToken uses correct frame source text

https://gerrit.wikimedia.org/r/559197
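
My reading of the patch title and the trace above, offered as an inference rather than a description of the patch: the token is created while a gallery line is parsed in a nested pipeline, but the string passed to safeSubstr is the top-level page source (it starts with ': {{otheruses|'). A source range is only meaningful against the text it was computed from, so applying it to another frame's text can land inside a multi-byte character. The sketch below illustrates this with two made-up source strings; the variable names, the file name and the `isCharBoundary` helper are all hypothetical.

```javascript
// Hypothetical miniature of two frames, each with its own UTF-8 source text.
const topLevelSrc = Buffer.from(': {{otheruses|Монумент}}', 'utf8');
const galleryLineSrc = Buffer.from('[[Файл:A.jpg|Опис]]', 'utf8');

// A byte range computed against the gallery line's source text.
const start = galleryLineSrc.indexOf('Опис');
const end = start + Buffer.byteLength('Опис', 'utf8');

// True if byte offset i does not point into the middle of a UTF-8 sequence.
function isCharBoundary(buf, i) {
  return i === 0 || i >= buf.length || (buf[i] & 0xC0) !== 0x80;
}

// Right frame: the range covers whole characters and yields the caption.
console.log(galleryLineSrc.subarray(start, end).toString('utf8'));
console.log(isCharBoundary(galleryLineSrc, start), isCharBoundary(galleryLineSrc, end));

// Wrong frame: the same numbers land inside multi-byte characters.
console.log(isCharBoundary(topLevelSrc, start), isCharBoundary(topLevelSrc, end));
```

Slicing the nested frame's own text gives the expected caption, while the same range applied to the top-level text starts and ends mid-character.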

Arlolra closed this task as Resolved. Wed, Jan 8, 10:04 PM