Parsoid Cite ref tag parser emits spurious duplicate error when external links are present
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Create a document with the following wikitext:
<ref name="a">[http://foo.invalid/]</ref>
<ref name="a">[http://foo.invalid/]</ref>

For example, https://en.wikipedia.beta.wmflabs.org/w/index.php?title=User:Adamw/sandbox/Cite-Parsoid-dups&action=edit

What happens?:
Parsoid generates errors for the ref tags with external links:

<sup about="#mwt4" class="mw-ref reference" id="cite_ref-a_1-1" rel="dc:references" typeof="mw:Extension/ref mw:Error" data-mw='{"name":"ref","attrs":{"name":"a"},"body":{"html":"&lt;a rel=\"mw:ExtLink\" href=\"http://foo.invalid/\" data-parsoid=&apos;{\"dsr\":[56,77,20,1]}&apos;>&lt;/a>"},"errors":[{"key":"cite_error_references_duplicate_key","params":["a"]}]}'><a href="./User:Adamw/sandbox/Cite-Parsoid-dups#cite_note-a-1" style="counter-reset: mw-Ref 1;" id="mwBQ"><span class="mw-reflink-text" id="mwBg">[1]</span></a></sup></p>

Example: https://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/html/User:Adamw%2Fsandbox%2FCite-Parsoid-dups

Debugging Parsoid shows that the difference is caused by the "dsr" data. Perhaps this should be stripped before comparing the HTML of each ref tag?

What should have happened instead?:
A clean parse with no errors, just like the second half of the example, where the ref tags contain no external links.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Event Timeline

Arlolra subscribed.

Debugging Parsoid shows that the difference is caused by the "dsr" data. Perhaps this should be stripped before comparing the HTML of each ref tag?

Yeah, that seems fine. Code comments suggest some other normalization that should be done before the comparison as well:
https://github.com/wikimedia/mediawiki-services-parsoid/blob/master/src/Ext/Cite/References.php#L216-L218
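
To illustrate the normalization idea discussed above, here is a minimal Python sketch. It is not Parsoid's implementation (Parsoid is PHP, and the real comparison lives in References.php). The names `normalize` and `same_ref_content` are hypothetical, and dropping the whole `data-parsoid` attribute rather than only its `dsr` key is a simplifying assumption. The offsets in the first fragment are made up for the example; the second fragment's offsets come from the error output in the description.

```python
from html import escape
from html.parser import HTMLParser


class _StripDataParsoid(HTMLParser):
    """Re-serialize an HTML fragment, dropping data-parsoid attributes.

    Assumption: source offsets ("dsr") only appear inside data-parsoid,
    so removing that attribute is enough to make otherwise identical
    fragments compare equal. HTML comments are dropped as a side effect.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _open(self, tag, attrs, close=""):
        # Keep every attribute except the position-dependent one.
        kept = [(k, v) for k, v in attrs if k != "data-parsoid"]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        self.out.append(f"<{tag}{rendered}{close}>")

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, close="/")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def normalize(fragment: str) -> str:
    """Return the fragment without any data-parsoid attributes."""
    parser = _StripDataParsoid()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


def same_ref_content(a: str, b: str) -> bool:
    """Compare two ref bodies while ignoring source-offset metadata."""
    return normalize(a) == normalize(b)


# Two bodies that differ only in their "dsr" offsets.
first = '<a rel="mw:ExtLink" href="http://foo.invalid/" data-parsoid=\'{"dsr":[14,35,20,1]}\'></a>'
second = '<a rel="mw:ExtLink" href="http://foo.invalid/" data-parsoid=\'{"dsr":[56,77,20,1]}\'></a>'
print(first == second)                  # raw comparison sees a difference
print(same_ref_content(first, second))  # normalized comparison does not
```

With a comparison along these lines, two same-named refs whose bodies differ only by source position would no longer be flagged with cite_error_references_duplicate_key, while refs with genuinely different content still would be.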

ABreault-WMF claimed this task.
ABreault-WMF subscribed.

This is no longer the case:

$ echo "<ref name="a">[http://foo.invalid/]</ref>\n<ref name="a">[http://foo.invalid/]</ref>" | php bin/parse.php --integrated
<p data-parsoid='{"dsr":[0,79,0,0]}'><sup about="#mwt1" class="mw-ref reference" id="cite_ref-a_1-0" rel="dc:references" typeof="mw:Extension/ref" data-parsoid='{"dsr":[0,39,12,6]}' data-mw='{"name":"ref","attrs":{"name":"a"},"body":{"id":"mw-reference-text-cite_note-a-1"}}'><a href="./Main_Page#cite_note-a-1" data-parsoid="{}"><span class="mw-reflink-text" data-parsoid="{}"><span class="cite-bracket" data-parsoid="{}">[</span>1<span class="cite-bracket" data-parsoid="{}">]</span></span></a></sup>
<sup about="#mwt2" class="mw-ref reference" id="cite_ref-a_1-1" rel="dc:references" typeof="mw:Extension/ref" data-parsoid='{"dsr":[40,79,12,6]}' data-mw='{"name":"ref","attrs":{"name":"a"},"body":{"id":"mw-reference-text-cite_note-a-1"}}'><a href="./Main_Page#cite_note-a-1" data-parsoid="{}"><span class="mw-reflink-text" data-parsoid="{}"><span class="cite-bracket" data-parsoid="{}">[</span>1<span class="cite-bracket" data-parsoid="{}">]</span></span></a></sup></p>

<div class="mw-references-wrap" typeof="mw:Extension/references" about="#mwt3" data-parsoid='{"dsr":[80,80,0,0]}' data-mw='{"name":"references","attrs":{},"autoGenerated":true}'><ol class="mw-references references" data-parsoid="{}"><li about="#cite_note-a-1" id="cite_note-a-1" data-mw-footnote-number="1" data-parsoid="{}"><span rel="mw:referencedBy" class="mw-cite-backlink" data-parsoid="{}"><a href="./Main_Page#cite_ref-a_1-0" data-parsoid="{}"><span class="mw-linkback-text" data-parsoid="{}">1 </span></a><a href="./Main_Page#cite_ref-a_1-1" data-parsoid="{}"><span class="mw-linkback-text" data-parsoid="{}">2 </span></a></span> <span id="mw-reference-text-cite_note-a-1" class="mw-reference-text reference-text" data-parsoid="{}"><a rel="mw:ExtLink nofollow" href="http://foo.invalid/" class="external autonumber" data-parsoid='{"dsr":[12,33,20,1]}'></a></span></li>
</ol></div>

Fixed by T380152