
Table bug, rows getting merged
Open, Medium, Public

Description

Per Kerry at en.wiki,
"I did an edit to add a row into the table in List of Ministers of Public Works (Queensland). As you will see in the (diff), it broke the table syntax (two rows are merged into one very long row). In case it is relevant, the citation in the last column of the new row was copied and pasted from one of the later rows (at this stage, all the rows in the table have the same source). I have made a number of similar edits to this article (adding rows and copying the citation) without problems, so I am not sure what happened on this occasion. The workaround was to add a newline in the appropriate spot using the source editor; the workaround in the VE was far too much work to even contemplate (I presume I would have had to fix it by deleting each of the extended columns individually)."
Another example here.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Can't reproduce in Chrome or FF, but it shouldn't be possible to generate the wikitext seen here, so likely a Parsoid bug.

ssastry triaged this task as Medium priority. Apr 9 2017, 9:59 PM

Looks like Parsoid forgot to add a newline before the |-, in some situation involving a <ref> tag.

ssastry moved this task from Needs Investigation to Bugs & Crashers on the Parsoid board.
ssastry added a subscriber: ssastry.

History of that page shows 4 instances of the same error all involving a ref tag. This clearly looks like a Parsoid (probably selser source reuse) bug.

I can reproduce this on the command-line:

[subbu@earth:~/work/wmf/parsoid] cat /tmp/wt
x <ref name="foo">foo</ref>
{|
|-
|a
|b <ref name="foo" />
|}

I copied the parsed HTML from /tmp/old.html to /tmp/new.html and added a new row (cells c and d) ahead of the existing one. To mimic the copy-paste of the named reference, which preserves the data-parsoid id, I left the data-parsoid attribute on the pasted ref with its DSR property intact, so it duplicates the DSR of the ref it was copied from. In addition, there is no newline in the HTML string between the </tr> and the following <tr>. It turns out both of these conditions are required to reproduce the bug.

[subbu@earth:~/work/wmf/parsoid] cat /tmp/new.html
<p data-parsoid='{"dsr":[0,27,0,0]}'>x <sup about="#mwt3" class="mw-ref" id="cite_ref-foo_1-0" rel="dc:references" typeof="mw:Extension/ref" data-parsoid='{"dsr":[2,27,16,6]}' data-mw='{"name":"ref","attrs":{"name":"foo"},"body":{"id":"mw-reference-text-cite_note-foo-1"}}'><a href="./Main_Page#cite_note-foo-1" style="counter-reset: mw-Ref 1;" data-parsoid="{}"><span class="mw-reflink-text" data-parsoid="{}">[1]</span></a></sup></p>
<table data-parsoid='{"dsr":[28,61,2,2]}'>
<tbody data-parsoid='{"dsr":[31,59,0,0]}'><tr>
<td>c</td>
<td>d <sup about="#mwt6" class="mw-ref" id="cite_ref-foo_1-1" rel="dc:references" typeof="mw:Extension/ref" data-parsoid='{"dsr":[40,58,18,0]}' data-mw='{"name":"ref","attrs":{"name":"foo"}}'><a href="./Main_Page#cite_note-foo-1" style="counter-reset: mw-Ref 1;" data-parsoid="{}"><span class="mw-reflink-text" data-parsoid="{}">[1]</span></a></sup></td></tr><tr data-parsoid='{"startTagSrc":"|-","dsr":[31,58,2,0]}'>
<td data-parsoid='{"dsr":[34,36,1,0]}'>a</td>
<td data-parsoid='{"dsr":[37,58,1,0]}'>b <sup about="#mwt6" class="mw-ref" id="cite_ref-foo_1-1" rel="dc:references" typeof="mw:Extension/ref" data-parsoid='{"dsr":[40,58,18,0]}' data-mw='{"name":"ref","attrs":{"name":"foo"}}'><a href="./Main_Page#cite_note-foo-1" style="counter-reset: mw-Ref 1;" data-parsoid="{}"><span class="mw-reflink-text" data-parsoid="{}">[1]</span></a></sup></td></tr>
</tbody></table>

<div class="mw-references-wrap" typeof="mw:Extension/references" about="#mwt7" data-parsoid='{"dsr":[62,62,0,0]}' data-mw='{"name":"references","attrs":{},"autoGenerated":true}'><ol class="mw-references references" data-parsoid="{}"><li about="#cite_note-foo-1" id="cite_note-foo-1" data-parsoid="{}"><span rel="mw:referencedBy" data-parsoid="{}"><a href="./Main_Page#cite_ref-foo_1-0" data-parsoid="{}"><span class="mw-linkback-text" data-parsoid="{}">1 </span></a><a href="./Main_Page#cite_ref-foo_1-1" data-parsoid="{}"><span class="mw-linkback-text" data-parsoid="{}">2 </span></a></span> <span id="mw-reference-text-cite_note-foo-1" class="mw-reference-text" data-parsoid="{}">foo</span></li></ol></div>
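For anyone reading the DSR values in the HTML above: they appear to be [start, end, open-width, close-width] character offsets into the original wikitext. The following Python sketch is not Parsoid code; it only slices the /tmp/wt text by those offsets to show what source each node maps back to. The helper name `dsr_slice` is made up for illustration, and the string assumes /tmp/wt has no trailing newline.

```python
# Wikitext from /tmp/wt in this report (assumed to have no trailing newline).
WT = 'x <ref name="foo">foo</ref>\n{|\n|-\n|a\n|b <ref name="foo" />\n|}'


def dsr_slice(wikitext, dsr):
    """Return the source substring covered by a DSR [start, end, ...] entry.

    Hypothetical helper for illustration; only the first two values are used.
    """
    start, end = dsr[0], dsr[1]
    return wikitext[start:end]


# DSR values copied from the data-parsoid attributes in /tmp/new.html.
ref_src = dsr_slice(WT, [40, 58, 18, 0])   # the self-closing named ref
row_src = dsr_slice(WT, [31, 58, 2, 0])    # the original <tr>
table_src = dsr_slice(WT, [28, 61, 2, 2])  # the whole table

print(repr(ref_src))
print(repr(row_src))
print(repr(table_src))
```

Note that the pasted ref in the new row carries the same [40,58] range as the original ref, and that the original row's range ends at offset 58, just before the newline. If that reading is right, the newline separating rows is not part of any reused source span and has to be emitted by the serializer itself.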

Now look at the non-selser and selser output below.

[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --html2wt < /tmp/new.html
x <ref name="foo">foo</ref>
{|
|c
|d <ref name="foo" />
|-
|a
|b <ref name="foo" />
|}

<references />

[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --selser --oldtextfile /tmp/wt --oldhtmlfile /tmp/old.html < /tmp/new.html 
x <ref name="foo">foo</ref>
{|
|c
|d <ref name="foo" />|-
|a
|b <ref name="foo" />
|}
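To check other pages or test outputs for the same symptom, a rough heuristic is to look for a row marker `|-` that follows the end of a ref tag on the same line. This Python sketch is an assumption about what the breakage looks like based on the two outputs above, not a description of Parsoid's logic; the function name `find_merged_rows` and the regular expression are illustrative only.

```python
import re

# Heuristic (an assumption, not Parsoid's logic): a row marker "|-" that
# follows the end of a <ref> tag on the same line suggests a lost newline.
MERGED_ROW = re.compile(r'(?:</ref>|/>)[ \t]*\|-')


def find_merged_rows(wikitext):
    """Return 1-based line numbers where a row marker trails a ref tag."""
    return [
        number
        for number, line in enumerate(wikitext.splitlines(), start=1)
        if MERGED_ROW.search(line)
    ]


# The two serializer outputs from this report, copied verbatim.
NON_SELSER_OUT = (
    'x <ref name="foo">foo</ref>\n{|\n|c\n|d <ref name="foo" />\n'
    '|-\n|a\n|b <ref name="foo" />\n|}\n\n<references />\n'
)
SELSER_OUT = (
    'x <ref name="foo">foo</ref>\n{|\n|c\n|d <ref name="foo" />|-\n'
    '|a\n|b <ref name="foo" />\n|}\n'
)

print(find_merged_rows(NON_SELSER_OUT))
print(find_merged_rows(SELSER_OUT))
```

The selser output is flagged at its fourth line and the non-selser output is not. The pattern will miss merged rows that do not involve a ref tag.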

Leaving this info here so it can help with debugging and fixing the bug.