
Parsoid allows nested ref tags
Closed, Resolved (Public)

Description

As documented in T3310, MW does not allow extension tags with the same tag name to be nested. An extension tag is treated as containing unstructured plain text up to the next terminating tag. This was a deliberate design choice, because many extensions (math, timeline, score, etc.) do not use XML-style markup and have no concept of nesting. The bug has been open since 2005, with mixed feelings expressed by several participants on whether it should be fixed.

However, Parsoid allows nesting of extension tags: <ref><ref></ref></ref> etc. So, that is a compatibility break.

Nesting should be allowed either for both parsers or for neither. I'm not aware of any technical reason why they should differ; it's just that the Parsoid implementors had a different opinion on T3310 than the core maintainers.

Event Timeline

tstarling updated the task description.
tstarling raised the priority of this task from to Needs Triage.
tstarling added a project: Parsoid.
tstarling added a subscriber: tstarling.
Restricted Application added a subscriber: Aklapper. Jul 3 2015, 12:04 AM

However, Parsoid allows nesting of extension tags: <ref><ref></ref></ref> etc. So, that is a compatibility break.

I don't think so.

[subbu@earth tests] echo "<ref>foo<ref>bar</ref>baz</ref>" | node parse --normalize=parsoid

<p><span class="mw-ref" id="cite_ref-1" rel="dc:references" typeof="mw:Extension/ref" data-mw='{"name":"ref","body":{"id":"mw-reference-text-cite_note-1"},"attrs":{}}'><a href="#cite_note-1"><span class="mw-reflink-text">[1]</span></a></span></p>
<ol class="mw-references" typeof="mw:Extension/references" data-mw='{"name":"references","attrs":{}}'>
<li id="cite_note-1"><a href="#cite_ref-1" rel="mw:referencedBy"><span class="mw-linkback-text">↑ </span></a> <span id="mw-reference-text-cite_note-1" class="mw-reference-text">foo&lt;ref>bar&lt;/ref>baz</span></li>
</ol>

There is actually an explicit test that verifies this expectation that Parsoid passes.
"Ref: 14. A nested ref-tag should be emitted as plain text"

Also, for the math tag:

[subbu@earth tests] echo "<math>1+2<math>3+4</math>5+6</math>" | node parse --normalize=parsoid

<p><img class="mwe-math-fallback-image-inline tex" alt="1+2&lt;math>3+4" src="//upload.wikimedia.org/math/0/e/6/0e6150ee7d8744ec95592b1c93e1e023.png" typeof="mw:Extension/math" data-mw='{"name":"math","attrs":{},"body":{"extsrc":"1+2&lt;math>3+4"}}'/>5+6&lt;/math></p>

However, Parsoid allows nesting of extension tags: <ref><ref></ref></ref> etc. So, that is a compatibility break.

I don't think so.

[subbu@earth tests] echo "<ref>foo<ref>bar</ref>baz</ref>" | node parse --normalize=parsoid

<p><span class="mw-ref" id="cite_ref-1" rel="dc:references" typeof="mw:Extension/ref" data-mw='{"name":"ref","body":{"id":"mw-reference-text-cite_note-1"},"attrs":{}}'><a href="#cite_note-1"><span class="mw-reflink-text">[1]</span></a></span></p>
<ol class="mw-references" typeof="mw:Extension/references" data-mw='{"name":"references","attrs":{}}'>
<li id="cite_note-1"><a href="#cite_ref-1" rel="mw:referencedBy"><span class="mw-linkback-text">↑ </span></a> <span id="mw-reference-text-cite_note-1" class="mw-reference-text">foo&lt;ref>bar&lt;/ref>baz</span></li>
</ol>

There is actually an explicit test that verifies this expectation that Parsoid passes.

"Ref: 14. A nested ref-tag should be emitted as plain text"

This test is wrong; that is my point. The "baz" should not be included in the text passed to the extension handler. It should end up in the first paragraph, not in the references section. The MW output for this input is:

<p><strong class="error mw-ext-cite-error">Cite error: Closing <code>&lt;/ref&gt;</code> missing for <code>&lt;ref&gt;</code> tag</strong>baz&lt;/ref&gt;
</p>

i.e. the error message cite_error_included_ref, which is due to the extension handler being called with the string "foo<ref>bar".
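To make the expected behaviour concrete, here is a minimal sketch of the "no nesting" rule from T3310: the body of an extension tag is everything up to the first terminating tag, and nested opening tags are ignored. The function name extractExtTag is hypothetical and this is not MediaWiki's or Parsoid's actual code; it also assumes tags without attributes.

```javascript
// Hypothetical helper for illustration only; not MediaWiki or Parsoid code.
// Finds the first <tagName> and treats everything up to the NEXT </tagName>
// as opaque text, without looking at nested opening tags.
// Assumption: the opening tag carries no attributes.
function extractExtTag(wikitext, tagName) {
    const open = '<' + tagName + '>';
    const close = '</' + tagName + '>';
    const start = wikitext.indexOf(open);
    if (start === -1) {
        return null; // no opening tag at all
    }
    const bodyStart = start + open.length;
    const end = wikitext.indexOf(close, bodyStart);
    if (end === -1) {
        return null; // unterminated tag
    }
    return {
        before: wikitext.slice(0, start),
        body: wikitext.slice(bodyStart, end),      // passed to the extension handler
        rest: wikitext.slice(end + close.length),  // parsed as ordinary wikitext
    };
}

const ref = extractExtTag('<ref>foo<ref>bar</ref>baz</ref>', 'ref');
console.log(JSON.stringify(ref.body)); // "foo<ref>bar"
console.log(JSON.stringify(ref.rest)); // "baz</ref>"
```

Under this rule the handler sees "foo<ref>bar", and "baz</ref>" is left in the surrounding paragraph, which is consistent with the MW output quoted above.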

Also, for the math tag:

[subbu@earth tests] echo "<math>1+2<math>3+4</math>5+6</math>" | node parse --normalize=parsoid

<p><img class="mwe-math-fallback-image-inline tex" alt="1+2&lt;math>3+4" src="//upload.wikimedia.org/math/0/e/6/0e6150ee7d8744ec95592b1c93e1e023.png" typeof="mw:Extension/math" data-mw='{"name":"math","attrs":{},"body":{"extsrc":"1+2&lt;math>3+4"}}'/>5+6&lt;/math></p>

Yeah, that output is correct. It looks like it is broken for <ref> because of a special case in pegTokenizer.pegjs.txt:

if ( tagName === 'ref' ) {
    // Support 1-level nesting of <ref> tags during tokenizing.
    // <ref> tags are the exception to the rule (no nesting of ext tags)

Updating summary.
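As a rough illustration of why a one-level nesting exception changes the result, the sketch below extends the match to the second closing tag when a nested <ref> opens before the first one. The function name extractRefOneLevel is hypothetical, and this is only an approximation of the behaviour described in the comment above, not the actual tokenizer rule.

```javascript
// Hypothetical sketch; NOT the actual pegTokenizer code.
// Emulates "1-level nesting" for <ref>: if a nested <ref> opens before the
// first </ref>, that first </ref> is taken to close the nested tag, and the
// outer body runs up to the following </ref> instead.
// Assumption: tags carry no attributes.
function extractRefOneLevel(wikitext) {
    const open = '<ref>';
    const close = '</ref>';
    const start = wikitext.indexOf(open);
    if (start === -1) {
        return null;
    }
    const bodyStart = start + open.length;
    let end = wikitext.indexOf(close, bodyStart);
    if (end === -1) {
        return null;
    }
    const nestedOpen = wikitext.indexOf(open, bodyStart);
    if (nestedOpen !== -1 && nestedOpen < end) {
        // Skip the close that belongs to the nested tag, if another follows.
        const outerEnd = wikitext.indexOf(close, end + close.length);
        if (outerEnd !== -1) {
            end = outerEnd;
        }
    }
    return {
        body: wikitext.slice(bodyStart, end),
        rest: wikitext.slice(end + close.length),
    };
}

const nested = extractRefOneLevel('<ref>foo<ref>bar</ref>baz</ref>');
console.log(JSON.stringify(nested.body)); // "foo<ref>bar</ref>baz"
console.log(JSON.stringify(nested.rest)); // ""
```

With this exception the whole string "foo<ref>bar</ref>baz" becomes the reference body, which matches the Parsoid output shown earlier, where "baz" ends up in the references section.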

tstarling renamed this task from Parsoid allows nested extension tags to Parsoid allows nested ref tags. Jul 3 2015, 3:27 AM
tstarling set Security to None.

Ah .. I see. I vaguely remember that we had to add the exception to allow {{#tag:ref}} nesting in <ref> and this was the unintended side effect of it.

Arlolra triaged this task as High priority. Jul 7 2015, 2:39 AM
Arlolra added a subscriber: Arlolra.
Arlolra moved this task from Backlog to In Progress on the Parsoid board. Jul 7 2015, 3:16 AM
ssastry lowered the priority of this task from High to Low. Jul 7 2015, 3:36 AM

We should fix this, but it is unlikely to be a common use case that matters on production wikis, hence the priority lowered to Low.

Yeah, unlikely to be a common case, but the fix would probably touch the same few lines of code as T104523, which is more significant.

Yup, if this gets fixed as part of T104523, it would be a good bonus.

ssastry moved this task from In Progress to Backlog on the Parsoid board. Aug 10 2015, 4:50 PM
Arlolra claimed this task. Apr 8 2016, 6:37 PM

Change 282394 had a related patch set uploaded (by Arlolra):
WIP: Remove <ref> hack from the tokenizer

https://gerrit.wikimedia.org/r/282394

Arlolra raised the priority of this task from Low to Normal. Jun 18 2016, 6:01 PM

Change 282394 had a related patch set uploaded (by Arlolra):
T104662: Allow nested ref tags only in templates

https://gerrit.wikimedia.org/r/282394

Change 282394 merged by jenkins-bot:
T104662: Allow nested ref tags only in templates

https://gerrit.wikimedia.org/r/282394

Arlolra closed this task as Resolved. Dec 12 2016, 6:26 PM