Template(s) in prominent article are parsed incorrectly
Closed, Duplicate (Public)

Description

In the article [[España]] on eswiki, the lead paragraph appears mangled because of a parsing issue with one or more of the notes/references templates.

Here is the article in mobile web: https://es.m.wikipedia.org/wiki/Espa%C3%B1a

And the article via mobile-html: https://es.wikipedia.org/api/rest_v1/page/mobile-html/Espa%C3%B1a

Parsoid: https://es.wikipedia.org/api/rest_v1/page/html/Espa%C3%B1a

(The incorrect content appears after the [nota 1] reference in the first sentence.)

Event Timeline

Dbrant triaged this task as High priority. May 25 2020, 11:57 PM

Reproducible with this snippet:

[subbu@earth:~/work/wmf/parsoid] echo "{{Refn|FOO <ref name='X 1'>bar</ref>}}" | php bin/parse.php --normalize

<p><sup id="cite_ref-2"><a href="Main_Page#cite_note-2" style="counter-reset: mw-Ref 2;"><span>[2]</span></a></sup></p>
<div>
<ol>
<li id="cite_note-X_1-1"><a href="Main_Page#cite_ref-X_1_1-0"><span>↑ </span></a> <span>bar</span></li>
<li id="cite_note-2"><a href="Main_Page#cite_ref-2"><span>↑ </span></a> <span>FOO <sup id="cite_ref-X_1_1-0"><a href="Main_Page#cite_note-X_1-1" style="counter-reset: mw-Ref 1;"><span>[1]</span></a></sup></span></li>
</ol>
</div>

[subbu@earth:~/work/wmf/parsoid] echo "{{Refn|FOO <ref name='X/1'>bar</ref>}}" | php bin/parse.php --normalize

<p><sup id="cite_ref-1"><a href="Main_Page#cite_note-1" style="counter-reset: mw-Ref 1;"><span>[1]</span></a></sup><span>&lt;/ref></span></p>
<div>
<ol>
<li id="cite_note-1"><a href="Main_Page#cite_ref-1"><span>↑ </span></a> <span>FOO &lt;ref name='X/1'>bar</span></li>
</ol>
</div>

The / character in the ref name, when the ref is embedded in the Refn template, seems to throw off Parsoid's tokenization. It is probably an edge case in our PEG grammar, since the following is handled just fine:

[subbu@earth:~/work/wmf/parsoid] echo "{{1x|FOO <ref name='X/1'>bar</ref>}}" | php bin/parse.php --normalize

<p>FOO <sup id="cite_ref-X/1_1-0"><a href="Main_Page#cite_note-X/1-1" style="counter-reset: mw-Ref 1;"><span>[1]</span></a></sup></p>
<div>
<ol>
<li id="cite_note-X/1-1"><a href="Main_Page#cite_ref-X/1_1-0"><span>↑ </span></a> <span id="mw-reference-text-cite_note-X/1-1">bar</span></li>
</ol>
</div>
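
The failing output above has a recognizable symptom: the ref tag shows up as escaped literal text (`&lt;ref name='X/1'>` and `&lt;/ref>`) instead of being turned into a citation. A minimal sketch for spotting that symptom in rendered HTML follows. The function name `has_leaked_ref_markup` and the regular expression are hypothetical helpers written for illustration and are not part of Parsoid; the sample strings are fragments copied from the outputs in this task.

```python
import re

# Hypothetical pattern: an escaped "<ref" or "</ref" in the rendered HTML
# suggests the tag was emitted as plain text rather than tokenized.
# The \b keeps "&lt;references" from matching.
LEAKED_REF = re.compile(r"&lt;/?ref\b")


def has_leaked_ref_markup(html: str) -> bool:
    """Return True if the HTML contains an escaped <ref> or </ref> tag."""
    return LEAKED_REF.search(html) is not None


# Fragments from the Refn + 'X/1' output (the failing case).
broken_close = "<span>&lt;/ref></span>"
broken_open = "<span>FOO &lt;ref name='X/1'>bar</span>"

# Fragment from the 1x + 'X/1' output (the working case).
working = '<span id="mw-reference-text-cite_note-X/1-1">bar</span>'

if __name__ == "__main__":
    for label, fragment in [
        ("Refn, closing tag", broken_close),
        ("Refn, opening tag", broken_open),
        ("1x", working),
    ]:
        print(label, has_leaked_ref_markup(fragment))
```

A check like this only detects the leaked markup after the fact. It says nothing about where in the grammar the tokenization goes wrong.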