
Text w/o a context crasher in Parsoid/PHP LanguageConverter
Closed, Resolved, Public

Description

Seen in Kibana logs:

$ php bin/parse.php --restURL "/w/rest.php/zh.wikipedia.org/v3/transform/pagebundle/to/pagebundle/%E8%B6%85%E4%BA%BA%E5%8A%9B%E9%9C%B8%E7%8E%8B%E9%A6%AC%E5%85%8B%E6%96%AF" --htmlVariantLanguage zh-cn
[warn/dsr/inconsistent] DSR inconsistency: cs/s mismatch for node: tr s:1948 ; cs:2018
PHP Notice:  Undefined index: /html/body/table[3]/tbody/tr[6]/td[4]/span[4]/p in /home/cananian/Projects/Wikimedia/Parsoid/src/Language/MachineLanguageGuesser.php on line 94
Wikimedia\Assert\InvariantException from line 217 of /home/cananian/Projects/Wikimedia/Parsoid/vendor/wikimedia/assert/src/Assert.php: Invariant failed: Text w/o a context
#0 /home/cananian/Projects/Wikimedia/Parsoid/src/Language/ConversionTraverser.php(107): Wikimedia\Assert\Assert::invariant(false, 'Text w/o a cont...')
#1 /home/cananian/Projects/Wikimedia/Parsoid/src/Language/ConversionTraverser.php(66): Parsoid\Language\ConversionTraverser->textHandler(Object(DOMText), Object(Parsoid\Config\Env), true, Object(stdClass))
#2 [internal function]: Parsoid\Language\ConversionTraverser->Parsoid\Language\{closure}(Object(DOMText), Object(Parsoid\Config\Env), true, Object(stdClass))
#3 /home/cananian/Projects/Wikimedia/Parsoid/src/Utils/DOMTraverser.php(66): call_user_func(Object(Closure), Object(DOMText), Object(Parsoid\Config\Env), true, Object(stdClass))
#4 /home/cananian/Projects/Wikimedia/Parsoid/src/Utils/DOMTraverser.php(133): Parsoid\Utils\DOMTraverser->callHandlers(Object(DOMText), Object(Parsoid\Config\Env), true, Object(stdClass))
#5 /home/cananian/Projects/Wikimedia/Parsoid/src/Utils/DOMTraverser.php(144): Parsoid\Utils\DOMTraverser->traverse(Object(DOMText), Object(Parsoid\Config\Env), Array, true, Object(stdClass))
#6 /home/cananian/Projects/Wikimedia/Parsoid/src/Utils/DOMTraverser.php(144): Parsoid\Utils\DOMTraverser->traverse(Object(DOMElement), Object(Parsoid\Config\Env), Array, true, Object(stdClass))
#7 /home/cananian/Projects/Wikimedia/Parsoid/src/Utils/DOMTraverser.php(144): Parsoid\Utils\DOMTraverser->traverse(Object(DOMElement), Object(Parsoid\Config\Env), Array, true, Object(stdClass))
#8 /home/cananian/Projects/Wikimedia/Parsoid/src/Utils/DOMTraverser.php(144): Parsoid\Utils\DOMTraverser->traverse(Object(DOMElement), Object(Parsoid\Config\Env), Array, true, NULL)
#9 /home/cananian/Projects/Wikimedia/Parsoid/src/Utils/DOMTraverser.php(144): Parsoid\Utils\DOMTraverser->traverse(Object(DOMElement), Object(Parsoid\Config\Env), Array, true, NULL)
#10 /home/cananian/Projects/Wikimedia/Parsoid/src/Utils/DOMTraverser.php(144): Parsoid\Utils\DOMTraverser->traverse(Object(DOMElement), Object(Parsoid\Config\Env), Array, true, NULL)
#11 /home/cananian/Projects/Wikimedia/Parsoid/src/Utils/DOMTraverser.php(144): Parsoid\Utils\DOMTraverser->traverse(Object(DOMElement), Object(Parsoid\Config\Env), Array, true, NULL)
#12 /home/cananian/Projects/Wikimedia/Parsoid/src/Language/LanguageConverter.php(288): Parsoid\Utils\DOMTraverser->traverse(Object(DOMElement), Object(Parsoid\Config\Env), Array, true)
#13 /home/cananian/Projects/Wikimedia/Parsoid/src/Language/LanguageConverter.php(222): Parsoid\Language\LanguageConverter::baseToVariant(Object(Parsoid\Config\Env), Object(DOMElement), 'zh-cn', NULL)
#14 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/PP/Processors/LangConverter.php(22): Parsoid\Language\LanguageConverter::maybeConvert(Object(Parsoid\Config\Env), Object(DOMDocument), 'zh-cn', NULL)
#15 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/DOMPostProcessor.php(151): Parsoid\Wt2Html\PP\Processors\LangConverter->run(Object(DOMElement), Object(Parsoid\Config\Env), Array, true)
#16 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/DOMPostProcessor.php(828): Parsoid\Wt2Html\DOMPostProcessor->Parsoid\Wt2Html\{closure}(Object(DOMElement), Object(Parsoid\Config\Env), Array, true)
#17 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/DOMPostProcessor.php(881): Parsoid\Wt2Html\DOMPostProcessor->doPostProcess(Object(DOMDocument))
#18 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/DOMPostProcessor.php(898): Parsoid\Wt2Html\DOMPostProcessor->process(Object(DOMDocument))
#19 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/ParserPipeline.php(148): Parsoid\Wt2Html\DOMPostProcessor->processChunkily('{{Expand|time=2...', Array)
#20 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/ParserPipeline.php(198): Parsoid\Wt2Html\ParserPipeline->parseChunkily('{{Expand|time=2...', Array)
#21 /home/cananian/Projects/Wikimedia/Parsoid/src/Wt2Html/ParserPipelineFactory.php(299): Parsoid\Wt2Html\ParserPipeline->parseToplevelDoc('{{Expand|time=2...', Array)
#22 /home/cananian/Projects/Wikimedia/Parsoid/src/WikitextContentModelHandler.php(78): Parsoid\Wt2Html\ParserPipelineFactory->parse('{{Expand|time=2...')
#23 /home/cananian/Projects/Wikimedia/Parsoid/src/Parsoid.php(93): Parsoid\WikitextContentModelHandler->toHTML(Object(Parsoid\Config\Env))
#24 /home/cananian/Projects/Wikimedia/Parsoid/src/Parsoid.php(123): Parsoid\Parsoid->parseWikitext(Object(Parsoid\Config\Api\PageConfig), Array)
#25 /home/cananian/Projects/Wikimedia/Parsoid/bin/parse.php(255): Parsoid\Parsoid->wikitext2html(Object(Parsoid\Config\Api\PageConfig), Array)
#26 /home/cananian/Projects/Wikimedia/Parsoid/bin/parse.php(460): Parse->wt2Html(Array, Array, NULL)
#27 /home/cananian/Projects/Wikimedia/Parsoid/tools/doMaintenance.php(53): Parse->execute()
#28 /home/cananian/Projects/Wikimedia/Parsoid/bin/parse.php(476): require_once('/home/cananian/...')
#29 {main}

Event Timeline

Change 559493 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] WIP: Use DOMDataUtils::getNodeData in MachineLanguageGuesser

https://gerrit.wikimedia.org/r/559493

Change 559493 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Use DOMDataUtils::getNodeData in MachineLanguageGuesser

https://gerrit.wikimedia.org/r/559493
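The PHP notice in the trace reports an `Undefined index` whose key is an XPath-style path (`/html/body/table[3]/...`), and the merged fix switches MachineLanguageGuesser to `DOMDataUtils::getNodeData`. One plausible reading, which is an assumption and not confirmed from the patch itself, is that per-node results were kept in an array keyed by node path, and a later DOM change moved the node so its path no longer matched. The minimal sketch below illustrates only that failure mode with plain `DOMDocument`. The sample document and variable names are hypothetical. `SplObjectStorage` is a stand-in for node-attached storage and is not how Parsoid's `DOMDataUtils` is implemented.

```php
<?php
// Hypothetical sketch: why a side table keyed by XPath strings can miss.
// Assumption: per-node results were stored under $node->getNodePath(); this
// is inferred from the "Undefined index: /html/body/..." notice, not taken
// from MachineLanguageGuesser.php.

$doc = new DOMDocument();
$doc->loadXML('<body><p>a</p><p>b</p></body>');
$second = $doc->getElementsByTagName('p')->item(1);

// 1. Record a result keyed by the node's path at analysis time.
$byPath = [$second->getNodePath() => 'zh-cn'];   // key is "/body/p[2]"

// 2. Later processing inserts a sibling, shifting the node's position.
$body = $doc->documentElement;
$body->insertBefore($doc->createElement('p', 'new'), $body->firstChild);

// 3. The same node now has a different path, so the lookup misses.
$pathNow = $second->getNodePath();               // now "/body/p[3]"
$missed = !array_key_exists($pathNow, $byPath);

// Alternative: associate the data with the node object itself (here via
// SplObjectStorage, a stand-in), so it no longer depends on position.
$byNode = new SplObjectStorage();
$byNode[$second] = 'zh-cn';
$body->insertBefore($doc->createElement('p', 'newer'), $body->firstChild);
$found = $byNode[$second];                       // still "zh-cn"

echo $missed ? "path lookup missed\n" : "path lookup hit\n";
echo "node lookup: $found\n";
```

Run with `php sketch.php`; it prints `path lookup missed` followed by `node lookup: zh-cn`, showing that the position-based key goes stale after a sibling insertion while the node-keyed entry survives.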

Reproducing:

# production
$ curl -X GET --header 'Accept-Language: zh-cn' 'https://zh.wikipedia.org/api/rest_v1/page/html/User:Cscott%2FT241146/57725518'
{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/unknown_error","method":"get","uri":"/zh.wikipedia.org/v1/page/html/User%3ACscott%2FT241146/57725518"}
# beta, after ssh to deployment-parsoid09.deployment-prep.eqiad.wmflabs
$ curl -H'Accept-Language: zh-cn' -x deployment-mediawiki-parsoid10:80 http://zh.wikipedia.beta.wmflabs.org/w/rest.php/zh.wikipedia.beta.wmflabs.org/v3/page/html/User:Cscott%2FZhTest/12308
# locally
$ php bin/parse.php --restURL "/w/rest.php/zh.wikipedia.org/v3/transform/pagebundle/to/pagebundle/User:Cscott%2FT241146" --htmlVariantLanguage zh-cn