Page MenuHomePhabricator

Copy-pasting a cell with a wiki link to the cell before it doesn't add a newline (breaks the table)
Closed, ResolvedPublicBUG REPORT

Description

Steps to replicate the issue (include links if applicable):

Before:

{| class="wikitable"
|-
|[[Eau]]
|[[Terre]]
|}

What happens?:

{| class="wikitable"
|-
|[[Eau]]|[[Eau]]
|[[Terre]]
|}

What should have happened instead?:

{| class="wikitable"
|-
|[[Eau]]
|[[Eau]]
|[[Terre]]
|}

or (less idiomatic, but still technically correct):

{| class="wikitable"
|-
|[[Eau]]||[[Eau]]
|[[Terre]]
|}

Other information (browser name/version, screenshots, etc.):

Reproduced with Firefox 105 and Chrome 106.

Event Timeline

@JMcLeod_WMF, can you please check with the Editing team if they can take a look at this?

This is the markup we submit to parsoid:

<table class="wikitable" id="mwAg">
<tbody id="mwAw"><tr id="mwBA"><td><a href="./Eau" rel="mw:WikiLink" title="Eau" id="mwBg">Eau</a></td><td id="mwBQ"><a href="./Eau" rel="mw:WikiLink" title="Eau" id="mwBg">Eau</a></td>
<td id="mwBw"><a href="./Terre" rel="mw:WikiLink" title="Terre" id="mwCA">Terre</a></td></tr>
</tbody></table>

This doesn't look obviously incorrect to me -- is Parsoid just preserving the newlines inside the table during the wikitext conversion maybe?

I suspect a selser issue, where Parsoid is reusing the "|[[Eau]]" which was the wikitext corresponding to the DOM subtree when that subtree is pasted, because the IDs for the first and second <a> tag are identical, although that *shouldn't* be happening because the first and second td tags have different IDs. Needs investigation.

ssastry triaged this task as Medium priority.Nov 3 2022, 2:25 PM
ssastry moved this task from Needs Triage to Bugs & Crashers on the Parsoid board.

Current state of the investigation.

  • The issue is not reproducible on a "regular" local installation
  • It is reproducible on a local installation with RESTBase
  • It is also reproducible on enwiki and mediawiki but not on officewiki
  • there are some RESTBase-related things in flux (see T320529) that tend to eliminate RESTBase *per se* as a source of the issue, but there's still something happening in that case that does not happen on a private install.

With that in mind, I dug to try to get a workable reproduction of the issue which, despite requiring a tiny bit of code modification, is what I believe triggers the issue.

  • Edit DOMDiff.php to add 'id' to IGNORE_ATTRIBUTES (line 30-41)
  • Create the following local files:

simple-before.wt - initial wikitext

{|
|[[Eau]]
|}

simple-before.html - initial HTML, before edit

<body id="mwAA" lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr" data-parsoid='{"dsr":[0,14,0,0]}'><table id="mwAg" data-parsoid='{"dsr":[0,14,2,2]}'>
<tbody id="mwAw" data-parsoid='{"dsr":[3,12,0,0]}'><tr id="mwBA" data-parsoid='{"autoInsertedStart":true,"dsr":[3,11,0,0]}'><td id="mwBQ" data-parsoid='{"dsr":[3,11,1,0]}'><a rel="mw:WikiLink" href="./index.php?title=Eau" title="Eau" id="mwBg" data-parsoid='{"stx":"simple","a":{"href":"./index.php?title=Eau"},"sa":{"href":"Eau"},"dsr":[4,11,2,2]}'>Eau</a></td></tr>
</tbody></table></body>

simple-after.html - HTML after edit

<body lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr" data-parsoid='{"dsr":[0,14,0,0]}'><table data-parsoid='{"dsr":[0,14,2,2]}'>
<tbody data-parsoid='{"dsr":[3,12,0,0]}'><tr data-parsoid='{"autoInsertedStart":true,"dsr":[3,11,0,0]}'><td data-parsoid="{}"><a href="./index.php?title=Eau" rel="mw:WikiLink" title="Eau" data-parsoid='{"stx":"simple","a":{"href":"./index.php?title=Eau"},"sa":{"href":"Eau"},"dsr":[4,11,2,2]}'>Eau</a></td><td data-parsoid='{"dsr":[3,11,1,0]}'><a href="./index.php?title=Eau" rel="mw:WikiLink" title="Eau" data-parsoid='{"stx":"simple","a":{"href":"./index.php?title=Eau"},"sa":{"href":"Eau"},"dsr":[4,11,2,2]}'>Eau</a></td></tr>
</tbody></table></body>

Then the issue can be reproduced with

$ php ./bin/parse.php --html2wt --selser --inputfile ../samples/T319143/simple-after.html --oldtextfile ../samples/T319143/simple-before.wt --oldhtmlfile ../samples/T319143/simple-before.html 
{|
|[[Eau]]|[[Eau]]
|}

The issue does not show up if parse.php is not provided with the old html file.
As a side note: the issue still does reproduce with the "|-" as start of line in the wikitext (which would remove the autoInsertedStart attribute in the tr - that's irrelevant.)

Change 866623 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/services/parsoid@master] WIP - don't reuse separator between newly inserted cell and cell

https://gerrit.wikimedia.org/r/866623

Change 866623 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Don't reuse separator between newly inserted cell and cell

https://gerrit.wikimedia.org/r/866623

Change 868452 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/services/parsoid@master] Adding tests for copy-paste of table cells in rows

https://gerrit.wikimedia.org/r/868452

Change 877256 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump parsoid to 0.17.0-a10

https://gerrit.wikimedia.org/r/877256

Change 877256 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.17.0-a10

https://gerrit.wikimedia.org/r/877256

Change 868452 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Adding tests for copy-paste of table cells in rows

https://gerrit.wikimedia.org/r/868452

Just checked on mediawiki that this was solved. Should deploy to frwiki with the train later today.

Change 880501 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump parsoid to 0.17.0-a11

https://gerrit.wikimedia.org/r/880501

Change 880501 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.17.0-a11

https://gerrit.wikimedia.org/r/880501