Currently, Parsoid provides minimal information about the contents of a template. It gives comprehensive information about the template's name and the parameters passed to it, but nothing about what Parsoid generated beyond the rendered markup itself, and the spec is silent on how far that markup can be trusted to contain Parsoid attributes.
My goal for this information is to support references that are defined or used inside templates, which VE currently cannot see. I've done some speculative work on extracting it by assuming that the template's internal markup can be trusted. This works to some extent, but it rests on some potentially fragile assumptions.
From VisualEditor's perspective, data on nodes nested inside another Parsoid node is a pain to extract, because our converter assumes it can iterate through the document and discard a node entirely once it has identified it for handling. We're not set up for nodes that are identified and handled, but whose contents then need a separate conversion pass.
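To make that constraint concrete, here is a deliberately simplified model of a single-pass converter. It is not VE's actual converter: the `El` type, `singlePass`, and the tree are hypothetical stand-ins for the DOM and for VE's handling logic. Once the transclusion wrapper is recognized, its subtree is never visited, so the two refs inside it stay invisible.

```typescript
// Hypothetical minimal stand-in for a DOM element.
interface El {
  attrs: Record<string, string>;
  children: El[];
}

// Simplified single-pass converter: once a node's typeof is recognized, the
// node is recorded and consumed whole, and its children are never visited.
function singlePass(root: El, handled: string[]): string[] {
  const recognized: string[] = [];
  const visit = (el: El): void => {
    const type = el.attrs["typeof"];
    if (type !== undefined && handled.indexOf(type) !== -1) {
      recognized.push(type);
      return; // contents are discarded along with the node
    }
    for (const child of el.children) visit(child);
  };
  visit(root);
  return recognized;
}

// The infobox example, reduced to a transclusion wrapper holding two refs.
const makeRef = (about: string): El => ({
  attrs: { typeof: "mw:Extension/ref", about },
  children: [],
});
const pageBody: El = {
  attrs: {},
  children: [
    {
      attrs: { typeof: "mw:Transclusion", about: "#mwt1" },
      children: [makeRef("#mwt2"), makeRef("#mwt3")],
    },
  ],
};

const found = singlePass(pageBody, ["mw:Transclusion", "mw:Extension/ref"]);
// found holds only "mw:Transclusion"; neither ref is ever reached.
```

Getting at the refs in this model means either a second pass over the handled node's contents or having the data already present on the wrapper.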
So I'd like to suggest hoisting the data Parsoid already has up onto the template node, as part of its data-mw attribute.
Here is a simplified infobox template invocation:
{{Infobox|foo=This is a reference in the reflist<ref name="infobox-used"/>, and this is defined right here<ref name="infobox-defined">I am a referenced defined inside the reflist</ref>}}
As you can see, the parameter creates two <ref> usages: one that reuses the reference named infobox-used, and one that defines infobox-defined right there.
Parsoid turns this into the following markup:
<table class="toccolours tpl-infobox" about="#mwt1" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"Infobox","href":"./Template:Infobox"},"params":{"foo":{"wt":"This is a reference in the reflist<ref name=\"infobox-used\"/>, and this is defined right here<ref name=\"infobox-defined\">I am a referenced defined inside the reflist</ref>"}},"i":0}}]}' id="mwAg"> <caption style="font-size: 125%;"><strong> SandboxReferences </strong></caption> <tbody><tr><th>Foo</th><td>This is a reference in the reflist<sup about="#mwt2" class="mw-ref reference" id="cite_ref-infobox-used_1-0" rel="dc:references" typeof="mw:Extension/ref" data-mw='{"name":"ref","attrs":{"name":"infobox-used"}}'><a href="./SandboxReferences#cite_note-infobox-used-1" id="mwAw"><span class="mw-reflink-text" id="mwBA"><span class="cite-bracket" id="mwBQ">[</span>1<span class="cite-bracket" id="mwBg">]</span></span></a></sup>, and this is defined right here<sup about="#mwt3" class="mw-ref reference" id="cite_ref-infobox-defined_2-0" rel="dc:references" typeof="mw:Extension/ref" data-mw='{"name":"ref","attrs":{"name":"infobox-defined"},"body":{"id":"mw-reference-text-cite_note-infobox-defined-2"}}'><a href="./SandboxReferences#cite_note-infobox-defined-2" id="mwBw"><span class="mw-reflink-text" id="mwCA"><span class="cite-bracket" id="mwCQ">[</span>2<span class="cite-bracket" id="mwCg">]</span></span></a></sup></td></tr> </tbody></table>
...and here is its data-mw, pulled out for easier viewing:
{
  "parts": [
    {
      "template": {
        "target": { "wt": "Infobox", "href": "./Template:Infobox" },
        "params": {
          "foo": {
            "wt": "This is a reference in the reflist<ref name=\"infobox-used\"/>, and this is defined right here<ref name=\"infobox-defined\">I am a referenced defined inside the reflist</ref>"
          }
        },
        "i": 0
      }
    }
  ]
}
However, the generated <ref> markup inside the template output contains useful information that isn't reflected here. It could be pulled up into the wrapper's data-mw, for example as a new "contains" array:
{
  "parts": [
    {
      "template": {
        "target": { "wt": "Infobox", "href": "./Template:Infobox" },
        "params": {
          "foo": {
            "wt": "This is a reference in the reflist<ref name=\"infobox-used\"/>, and this is defined right here<ref name=\"infobox-defined\">I am a referenced defined inside the reflist</ref>"
          }
        },
        "i": 0
      }
    }
  ],
  "contains": [
    { "name": "ref", "attrs": { "name": "infobox-used" }, "about": "#mwt2" },
    {
      "name": "ref",
      "attrs": { "name": "infobox-defined" },
      "body": { "id": "mw-reference-text-cite_note-infobox-defined-2" },
      "about": "#mwt3"
    }
  ]
}
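A minimal sketch of how the hoisting could be done on the producer side, assuming a simplified tree (the hypothetical `El` type) instead of a real DOM. `hoistContains` is a hypothetical name: it collects the data-mw of every `mw:Extension/*` node inside the wrapper, adds that node's `about`, and stores the list under `contains`. It assumes the transclusion renders to a single wrapper element, whereas Parsoid can also emit several sibling nodes sharing one `about`.

```typescript
// Hypothetical minimal stand-in for a DOM element.
interface El {
  attrs: Record<string, string>;
  children: El[];
}

// One entry in the proposed "contains" array: the nested node's own data-mw
// plus its about attribute.
interface ContainsEntry {
  about: string;
  [key: string]: unknown;
}

// Hypothetical hoisting step; ignores about-linked sibling nodes.
function hoistContains(wrapper: El): void {
  const contains: ContainsEntry[] = [];
  const visit = (el: El): void => {
    const type = el.attrs["typeof"] || "";
    const dataMw = el.attrs["data-mw"];
    if (type.indexOf("mw:Extension/") === 0 && dataMw !== undefined) {
      const own = JSON.parse(dataMw) as Record<string, unknown>;
      contains.push({ ...own, about: el.attrs["about"] });
    }
    for (const child of el.children) visit(child);
  };
  for (const child of wrapper.children) visit(child);

  const wrapperMw = JSON.parse(wrapper.attrs["data-mw"]) as Record<string, unknown>;
  wrapperMw["contains"] = contains;
  wrapper.attrs["data-mw"] = JSON.stringify(wrapperMw);
}

// The infobox example, reduced to the nodes that matter ("parts" elided).
const makeRef = (about: string, dataMw: string): El => ({
  attrs: { typeof: "mw:Extension/ref", about, "data-mw": dataMw },
  children: [],
});
const infobox: El = {
  attrs: { typeof: "mw:Transclusion", about: "#mwt1", "data-mw": '{"parts":[]}' },
  children: [
    {
      attrs: {}, // stands in for the <td> holding both refs
      children: [
        makeRef("#mwt2", '{"name":"ref","attrs":{"name":"infobox-used"}}'),
        makeRef(
          "#mwt3",
          '{"name":"ref","attrs":{"name":"infobox-defined"},"body":{"id":"mw-reference-text-cite_note-infobox-defined-2"}}'
        ),
      ],
    },
  ],
};

hoistContains(infobox);
const hoisted = JSON.parse(infobox.attrs["data-mw"]);
```

For these two refs, the resulting `contains` entries match the ones in the proposed data-mw.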
That would be enough information for our current needs, and adding each node's about attribute to the data would make it trivial to extract more from the markup when needed, without requiring a full conversion pass.
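On the consuming side, a sketch of what VE could do with `contains` (all names here are hypothetical): split the hoisted refs into definitions and reuses, and, only when more detail is needed, use an entry's `about` value to jump straight to the corresponding node. Treating a ref without a `body` as a reuse is a simplifying assumption.

```typescript
// Hypothetical minimal stand-in for a DOM element.
interface El {
  attrs: Record<string, string>;
  children: El[];
}

// Shape of a hoisted ref entry, following the proposed "contains" example.
interface ContainedRef {
  name: string;
  attrs: { name?: string };
  body?: { id?: string };
  about: string;
}

// Split hoisted refs into definitions (entries with a body) and reuses
// (entries without one). Treating "no body" as "reuse" is a simplification.
function classifyRefs(contains: ContainedRef[]): { defined: string[]; used: string[] } {
  const defined: string[] = [];
  const used: string[] = [];
  for (const entry of contains) {
    if (entry.name !== "ref") continue;
    (entry.body ? defined : used).push(entry.attrs.name || "");
  }
  return { defined, used };
}

// Return the first node in document order whose about matches, so extra
// detail can be read from the markup without converting everything.
function findByAbout(root: El, about: string): El | null {
  if (root.attrs["about"] === about) return root;
  for (const child of root.children) {
    const hit = findByAbout(child, about);
    if (hit !== null) return hit;
  }
  return null;
}

const contains: ContainedRef[] = [
  { name: "ref", attrs: { name: "infobox-used" }, about: "#mwt2" },
  {
    name: "ref",
    attrs: { name: "infobox-defined" },
    body: { id: "mw-reference-text-cite_note-infobox-defined-2" },
    about: "#mwt3",
  },
];
const { defined, used } = classifyRefs(contains);

// Reduced template output: the wrapper and the two ref <sup> nodes.
const wrapper: El = {
  attrs: { about: "#mwt1" },
  children: [
    { attrs: { about: "#mwt2", id: "cite_ref-infobox-used_1-0" }, children: [] },
    { attrs: { about: "#mwt3", id: "cite_ref-infobox-defined_2-0" }, children: [] },
  ],
};
const node = findByAbout(wrapper, "#mwt3");
```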
The potential drawback is duplication. I used a very simple example above; in complicated templates (e.g. the average enwiki infobox), pulling all the nested data-mw up like this could duplicate a lot of data. This could simply be accepted as the cost of the improvement, or it could be mitigated by exploiting the fact that the spec currently makes no guarantees about the contents of a template: strip the internal data-mw attributes, and specify that anyone who actually parses the contents may need to reconstruct them from the wrapper.
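A sketch of that mitigation, under the same hypothetical tree model: `stripNested` drops nested data-mw that has already been hoisted, and `reconstructNested` is the counterpart a consumer that does parse template contents would run. Both names are hypothetical, and keying on `about` is an assumption; since Parsoid can give several sibling nodes the same `about` value, a real scheme might need a more precise key.

```typescript
// Hypothetical minimal stand-in for a DOM element.
interface El {
  attrs: Record<string, string>;
  children: El[];
}

// Visit every node below (not including) the wrapper.
function forEachNested(wrapper: El, fn: (el: El) => void): void {
  const visit = (el: El): void => {
    fn(el);
    for (const child of el.children) visit(child);
  };
  for (const child of wrapper.children) visit(child);
}

// Producer side: drop nested data-mw that already appears in "contains".
function stripNested(wrapper: El): void {
  const mw = JSON.parse(wrapper.attrs["data-mw"]);
  const hoisted: Record<string, boolean> = {};
  for (const entry of mw.contains || []) hoisted[entry.about] = true;
  forEachNested(wrapper, (el) => {
    if (hoisted[el.attrs["about"]]) delete el.attrs["data-mw"];
  });
}

// Consumer side: put each hoisted entry (minus its "about") back onto the
// nested node with the matching about value.
function reconstructNested(wrapper: El): void {
  const mw = JSON.parse(wrapper.attrs["data-mw"]);
  const byAbout: Record<string, unknown> = {};
  for (const entry of mw.contains || []) {
    const { about, ...rest } = entry;
    byAbout[about] = rest;
  }
  forEachNested(wrapper, (el) => {
    const data = byAbout[el.attrs["about"]];
    if (data !== undefined) el.attrs["data-mw"] = JSON.stringify(data);
  });
}

const refUsed = '{"name":"ref","attrs":{"name":"infobox-used"}}';
const wrapper: El = {
  attrs: {
    about: "#mwt1",
    "data-mw": JSON.stringify({
      parts: [], // elided for brevity
      contains: [{ name: "ref", attrs: { name: "infobox-used" }, about: "#mwt2" }],
    }),
  },
  children: [{ attrs: { about: "#mwt2", "data-mw": refUsed }, children: [] }],
};

stripNested(wrapper);
const strippedHasMw = "data-mw" in wrapper.children[0].attrs; // false
reconstructNested(wrapper);
const restored = wrapper.children[0].attrs["data-mw"]; // equal to refUsed again
```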
There are also open questions about how nested templates should be represented: should everything be pulled up into the top-level element's contains, or would it be expected to recurse? (I'd hope for the former, but the latter might be simpler to implement.)
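The two options can be compared with a small sketch. In a hypothetical recursive form, an entry for a nested transclusion carries its own `contains`; `flatten` converts that into a single top-level list. The entry shape, the `flatten` name, and the `#mwt4`/`#mwt5` about values are invented for illustration.

```typescript
// Hypothetical shape of a "contains" entry. In the recursive option, an entry
// for a nested transclusion carries its own "contains"; in the flat option,
// no entry does.
interface ContainsEntry {
  name?: string; // e.g. "ref" for extension nodes
  about: string;
  contains?: ContainsEntry[];
}

// Turn the recursive form into the flat one: every descendant entry is
// pulled up to the top level, depth-first, in document order.
function flatten(entries: ContainsEntry[]): ContainsEntry[] {
  const out: ContainsEntry[] = [];
  for (const entry of entries) {
    const { contains, ...own } = entry;
    out.push(own);
    if (contains) out.push(...flatten(contains));
  }
  return out;
}

const recursive: ContainsEntry[] = [
  { name: "ref", about: "#mwt2" },
  // A nested transclusion with a ref of its own (about values invented).
  { about: "#mwt4", contains: [{ name: "ref", about: "#mwt5" }] },
];
const flat = flatten(recursive);
const abouts = flat.map((e) => e.about); // ["#mwt2", "#mwt4", "#mwt5"]
```

Since flattening is a single depth-first walk, one option would be to build the recursive form and flatten it before emitting the wrapper's data-mw.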