Page MenuHomePhabricator

AttributeExpander sets up bogus data-mw in some multi-template infobox scenarios
Closed, ResolvedPublic

Description

Reduced Test Case:

{|{{Infobox aircraft begin
}}{{Infobox aircraft type|number built=unknown, 300 ordered<ref>{{cite web|title=The Cannon Pioneers}}</ref>}}
|}

The following commandline on scandium causes the crasher. The "--pageName" argument is necessary.

php bin/parse.php  --pageName 'SPAD_S.XII' < /tmp/wt

See the comments in the body of this phab task for analysis. This bug is present in the JS code as well, just that these assertions aren't present there to be triggered. The pagename argument is what ensures that the expansion of the first template adds the [[Category:No local image but image on Wikidata]] wikitext at the bottom. The presence of fostered content exposes the bug by ensuring that the data-mw on the table representing the expanded attribute isn't replaced with information about the template wrapping. With fostered content, the fostered content gets the overall template wrapper and the table retains its bogus data-mw.


Request URLs:

  • /w/rest.php/en.wikipedia.org/v3/page/pagebundle/User%3ARlandmann%2FB-17_Flying_Fortress/887607268
  • /w/rest.php/en.wikipedia.org/v3/page/pagebundle/SPAD_S.XII/933888325
  • /w/rest.php/sl.wikipedia.org/v3/transform/wikitext/to/pagebundle/RMS_Olympic

Exception Trace:

#0 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMDataUtils.php(69): Wikimedia\Assert\Assert::invariant(boolean, string)
#1 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMDataUtils.php(91): Wikimedia\Parsoid\Utils\DOMDataUtils::getNodeData(DOMElement)
#2 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Wt2Html/PP/Handlers/CleanUp.php(164): Wikimedia\Parsoid\Utils\DOMDataUtils::getDataParsoid(DOMElement)
#3 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(471): Wikimedia\Parsoid\Wt2Html\PP\Handlers\CleanUp::cleanupAndSaveDataParsoid(array, DOMElement, Wikimedia\Parsoid\Config\Env, boolean, stdClass)
#4 [internal function]: Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->Wikimedia\Parsoid\Wt2Html\{closure}(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#5 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(77): call_user_func(Closure, DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#6 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(146): Wikimedia\Parsoid\Utils\DOMTraverser->callHandlers(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#7 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#8 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#9 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#10 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#11 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#12 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#13 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#14 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#15 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#16 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#17 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#18 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#19 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, stdClass)
#20 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Utils/DOMTraverser.php(158): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, NULL)
#21 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(145): Wikimedia\Parsoid\Utils\DOMTraverser->traverse(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean, NULL)
#22 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(829): Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->Wikimedia\Parsoid\Wt2Html\{closure}(DOMElement, Wikimedia\Parsoid\Config\Env, array, boolean)
#23 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(886): Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->doPostProcess(DOMDocument)
#24 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(903): Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->process(DOMDocument)
#25 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Wt2Html/ParserPipeline.php(148): Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->processChunkily(string, array)
#26 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Wt2Html/ParserPipeline.php(198): Wikimedia\Parsoid\Wt2Html\ParserPipeline->parseChunkily(string, array)
#27 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Wt2Html/ParserPipelineFactory.php(290): Wikimedia\Parsoid\Wt2Html\ParserPipeline->parseToplevelDoc(string, array)
#28 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Core/WikitextContentModelHandler.php(78): Wikimedia\Parsoid\Wt2Html\ParserPipelineFactory->parse(string)
#29 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Parsoid.php(148): Wikimedia\Parsoid\Core\WikitextContentModelHandler->toDOM(Wikimedia\Parsoid\Config\Env)
#30 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/src/Parsoid.php(180): Wikimedia\Parsoid\Parsoid->parseWikitext(MWParsoid\Config\PageConfig, array)
#31 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/extension/src/Rest/Handler/ParsoidHandler.php(529): Wikimedia\Parsoid\Parsoid->wikitext2html(MWParsoid\Config\PageConfig, array, NULL)
#32 /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/parsoid/extension/src/Rest/Handler/PageHandler.php(66): MWParsoid\Rest\Handler\ParsoidHandler->wt2html(MWParsoid\Config\PageConfig, array)
#33 /srv/mediawiki/php-1.35.0-wmf.26/includes/Rest/Router.php(353): MWParsoid\Rest\Handler\PageHandler->execute()
#34 /srv/mediawiki/php-1.35.0-wmf.26/includes/Rest/Router.php(308): MediaWiki\Rest\Router->executeHandler(MWParsoid\Rest\Handler\PageHandler)
#35 /srv/mediawiki/php-1.35.0-wmf.26/includes/Rest/EntryPoint.php(138): MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#36 /srv/mediawiki/php-1.35.0-wmf.26/includes/Rest/EntryPoint.php(105): MediaWiki\Rest\EntryPoint->execute()
#37 /srv/mediawiki/php-1.35.0-wmf.26/rest.php(31): MediaWiki\Rest\EntryPoint::main()
#38 /srv/mediawiki/w/rest.php(3): require(string)
#39 {main}

Event Timeline

ssastry created this task.Apr 8 2020, 5:49 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 8 2020, 5:49 PM
ssastry triaged this task as Medium priority.Apr 8 2020, 7:47 PM
ssastry moved this task from Needs Triage to Bugs & Crashers on the Parsoid board.

This looks like a shared data-parsoid object that didn't get cloned somewhere. I had to hot-patch debugging / tracing calls on scandium to see what is going on.

ssastry updated the task description. (Show Details)Apr 9 2020, 3:02 AM
Restricted Application added a subscriber: Liuxinyu970226. · View Herald TranscriptApr 9 2020, 3:02 AM

If you dump the DOM, you see that the HTML for the embedded data-mw looks pretty convoluted. But, I suspect the Parsoid Exension API code to inspect embedded HTML from Cite is causing the brokenness. There is some sharing going on instead of cloning. That is a bug in Cite / ParsoidExtensionAPI. But, that apart, the data-mw for the top-level table with mw:ExpandedAttrs type needs fixing as well. That whole code needs another look to see what we can do to simplify it.

Here is the HTML snippet that is causing problems.

<link rel="mw:PageProp/Category" href="./Category:No_local_image_but_image_on_Wikidata" about="#mwt2" typeof="mw:Transclusion" data-parsoid='{"stx":"simple","a":{"href":"./Category:No_local_image_but_image_on_Wikidata"},"sa":{"href":"Category:No local image but image on Wikidata"},"dsr":[0,140,null,null],"src":"{|{{Infobox aircraft begin\n}}{{Infobox aircraft type|number built=unknown, 300 ordered&lt;ref>{{cite web|title=The Cannon Pioneers}}&lt;/ref>}}\n|}","pi":[[],[{"k":"number built","named":true}]],"firstWikitextNode":"TABLE"}' data-mw='{"parts":["{|",{"template":{"target":{"wt":"Infobox aircraft begin\n","href":"./Template:Infobox_aircraft_begin"},"params":{},"i":0}},{"template":{"target":{"wt":"Infobox aircraft type","href":"./Template:Infobox_aircraft_type"},"params":{"number built":{"wt":"unknown, 300 ordered&lt;ref>{{cite web|title=The Cannon Pioneers}}&lt;/ref>"}},"i":1}},"\n|}"]}'/><table class="infobox" style="width:25.5em;border-spacing:2px;" about="#mwt2" typeof="mw:ExpandedAttrs" data-parsoid='{"tsr":[0,137],"dsr":[0,140,137,2]}' data-mw='{"attribs":[[{"txt":"class","html":"&lt;span about=\"#mwt2\" typeof=\"mw:Transclusion\" data-parsoid=&apos;{\"pi\":[[]],\"firstWikitextNode\":\"TABLE\",\"dsr\":[0,29,null,null]}&apos; data-mw=&apos;{\"parts\":[\"{|\",{\"template\":{\"target\":{\"wt\":\"Infobox aircraft begin\\n\",\"href\":\"./Template:Infobox_aircraft_begin\"},\"params\":{},\"i\":0}}]}&apos;>class=\"infobox\" style=\"width:25.5em;border-spacing:2px;\"\n! \n|-\n\n|-\n&lt;/span>&lt;link rel=\"mw:PageProp/Category\" href=\"./Category:No_local_image_but_image_on_Wikidata\" about=\"#mwt2\" data-parsoid=&apos;{\"stx\":\"simple\",\"a\":{\"href\":\"./Category:No_local_image_but_image_on_Wikidata\"},\"sa\":{\"href\":\"Category:No local image but image on Wikidata\"}}&apos;/>&lt;span about=\"#mwt3\" typeof=\"mw:Transclusion\" data-parsoid=&apos;{\"pi\":[[{\"k\":\"number built\",\"named\":true}]],\"dsr\":[29,137,null,null]}&apos; data-mw=&apos;{\"parts\":[{\"template\":{\"target\":{\"wt\":\"Infobox aircraft type\",\"href\":\"./Template:Infobox_aircraft_type\"},\"params\":{\"number built\":{\"wt\":\"unknown, 300 ordered&amp;lt;ref>{{cite web|title=The Cannon Pioneers}}&amp;lt;/ref>\"}},\"i\":0}}]}&apos;>\n\n! Role\n| {{{type}}}\n|-\n\n\n\n\n\n\n\n\n\n\n\n\n! &lt;/span>&lt;span class=\"nowrap\" about=\"#mwt3\" data-parsoid=&apos;{\"stx\":\"html\"}&apos;>Number built&lt;/span>&lt;span about=\"#mwt3\" data-parsoid=\"{}\">\n| unknown, 300 ordered&lt;/span>&lt;sup about=\"#mwt3\" class=\"mw-ref\" id=\"cite_ref-1\" rel=\"dc:references\" typeof=\"mw:Extension/ref\" data-parsoid=&apos;{\"src\":\"&amp;lt;ref>{{cite web|title=The Cannon Pioneers}}&amp;lt;/ref>\"}&apos; data-mw=&apos;{\"name\":\"ref\",\"attrs\":{},\"body\":{\"id\":\"mw-reference-text-cite_note-1\"}}&apos;>&lt;a href=\"./SPAD_S.XII#cite_note-1\" style=\"counter-reset: mw-Ref 1;\" data-parsoid=\"{}\">&lt;span class=\"mw-reflink-text\" data-parsoid=\"{}\">[1]&lt;/span>&lt;/a>&lt;/sup>&lt;span about=\"#mwt3\" data-parsoid=\"{}\">\n|-\n\n\n\n\n&lt;/span>"},{"html":""}]]}'>

See how the <link... is duplicated in the data-mw attributed of the <table .. tag with mw:ExpandedAttrs typeof. You will also see the mw:Extension/ref for the <ref> tag in the second infobox wikitext above. But, this is a bug. That second template should not have been part of the data-mw at all. See below for what the two templates expand to.

================================================================================
TEMPLATE:Template:Infobox_aircraft_begin; TRANSCLUSION:"{{Infobox aircraft begin\n}}"
--------------------------------------------------------------------------------
class="infobox" style="width:25.5em;border-spacing:2px;"
! colspan="2" style="text-align: center; font-size: large; padding-bottom: 0.3em;" |
|-
|-
[[Category:No local image but image on Wikidata]]
--------------------------------------------------------------------------------
================================================================================
TEMPLATE:Template:Infobox_aircraft_type; TRANSCLUSION:"{{Infobox aircraft type|number built=unknown, 300 ordered<ref>{{cite web|title=The Cannon Pioneers}}</ref>}}"
--------------------------------------------------------------------------------
! Role
| {{{type}}}
|-
! <span class="nowrap">Number built</span>
| unknown, 300 ordered<ref>{{cite web|title=The Cannon Pioneers}}</ref>
|-

So, only the first template is really contributing to the HTML attributes of the <table> tag and so that should have been the only component that should have been pulled into the data-mw. The second template is generate content, not attributes. By pulling in the content-producing template into data-mw, we are effectively duplicating the <ref> tag - once in the data-mw of the <table> and once in the body of the table. This duplicate with identical shared DOMs that are stored in the fragment maps in the $env object is the source of the downstream problem. When the fragment is processed for the first time, the data attributes for this fragment get one set of data-object-ids. When it is processed a second time, the same shared DOM fragment from the map is pulled up and the data attributes get a different set of data-object-ids (backed by the same data-parsoid object). So, during the cleanup pass, the first copy proceeds normally and the the data-parsoid object gets stored. But, when we hit the second copy, we hit this assertion.

Anyway, the real bug is in the AttributeExpander code. But, separately, we should consider if the 'getContentDOM' should actually be a 'retrieveContentDOM' (or something) that actually removes the fragment from the map. There should really be no reason for fragments to be shared. But, I don't know if there will be usecases in the future where extensions might decide to share fragments and do fancy things with them.

ssastry renamed this task from Invariant failed: Trying to fetch node data without loading! to AttributeExpander sets up bogus data-mw in some multi-template infobox scenarios.Apr 9 2020, 5:41 PM
ssastry updated the task description. (Show Details)

But, separately, we should consider if the 'getContentDOM' should actually be a 'retrieveContentDOM' (or something) that actually removes the fragment from the map. There should really be no reason for fragments to be shared.

So, if I hack the getDOMFragment call in ParsoidExtensionAPI to reset the dom fragment for the $id to an empty array, as expected, the parse crashes much earlier in the pipeline, alerting us to improper sharing of a DOM fragment.

ssastry@scandium:/srv/parsoid-testing$ sudo -u www-data php /srv/mediawiki/multiversion/MWScript.php /srv/parsoid-testing/bin/parse.php --wiki=enwiki --integrated --pageName 'SPAD_S.XII' --dump tplsrc < /tmp/wt
Wikimedia\Assert\InvariantException from line 224 of /srv/mediawiki/php-1.35.0-wmf.26/vendor/wikimedia/assert/src/Assert.php: Invariant failed: Expected an element
#0 /srv/parsoid-testing/src/Utils/DOMUtils.php(127): Wikimedia\Assert\Assert::invariant(false, 'Expected an ele...')
#1 /srv/parsoid-testing/src/Ext/ParsoidExtensionAPI.php(222): Wikimedia\Parsoid\Utils\DOMUtils::assertElt(NULL)
#2 /srv/parsoid-testing/src/Ext/Cite/References.php(115): Wikimedia\Parsoid\Ext\ParsoidExtensionAPI->getContentDOM('mwf2')
#3 /srv/parsoid-testing/src/Ext/Cite/References.php(373): Wikimedia\Parsoid\Ext\Cite\References::extractRefFromNode(Object(Wikimedia\Parsoid\Ext\ParsoidExtensionAPI), Object(DOMElement), Object(Wikimedia\Parsoid\Ext\Cite\ReferencesData))
....
ssastry updated the task description. (Show Details)Apr 9 2020, 9:08 PM
ssastry renamed this task from AttributeExpander sets up bogus data-mw in some multi-template infobox scenarios to AttributeExpander sets up bogus data-mw in multi-template infobox scenarios (where the first template that produces table styles also has content in fosterable position).Apr 9 2020, 9:12 PM
ssastry updated the task description. (Show Details)
ssastry updated the task description. (Show Details)
ssastry renamed this task from AttributeExpander sets up bogus data-mw in multi-template infobox scenarios (where the first template that produces table styles also has content in fosterable position) to AttributeExpander sets up bogus data-mw in multi-template infobox scenarios.Apr 10 2020, 1:33 AM
ssastry updated the task description. (Show Details)

Change 589157 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] WIP: Bug fix in complex mixed-attr-content multi-template scenarios

https://gerrit.wikimedia.org/r/589157

ssastry claimed this task.Apr 15 2020, 11:48 PM

Change 589157 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Bug fix in complex mixed-attr-content multi-template scenarios

https://gerrit.wikimedia.org/r/589157

Change 594256 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump Parsoid to 0.12.0-a12

https://gerrit.wikimedia.org/r/594256

Change 594256 merged by jenkins-bot:
[mediawiki/vendor@master] Bump Parsoid to 0.12.0-a12

https://gerrit.wikimedia.org/r/594256

ssastry renamed this task from AttributeExpander sets up bogus data-mw in multi-template infobox scenarios to AttributeExpander sets up bogus data-mw in some multi-template infobox scenarios.May 4 2020, 10:50 PM
ssastry closed this task as Resolved.May 5 2020, 5:33 AM