Page MenuHomePhabricator

Deleting column after header column causes all columns to become header columns
Closed, ResolvedPublic0 Estimated Story Points

Description

If you use the VisualEditor to delete a column after a header column, it consolidates all the subsequent columns onto the same line of WikiText turning them all into header columns.

Before and after diff: https://test.wikipedia.org/w/index.php?title=Table_Test_3&diff=220262&oldid=220261
Before: https://test.wikipedia.org/w/index.php?title=Table_Test_3&oldid=220261
After: https://test.wikipedia.org/w/index.php?title=Table_Test_3&direction=next&oldid=220261

Event Timeline

kaldari raised the priority of this task from to Needs Triage.
kaldari updated the task description. (Show Details)
kaldari added a project: VisualEditor.
kaldari set Security to None.
kaldari added a subscriber: Jdforrester-WMF.
kaldari subscribed.

This is a bug in the Parsoid serializer.

Table syntax supports the following convenient syntax for multiple heading cells in a row:

! heading1 || heading2 || heading3

If you only wanted the first cell to be a heading cell, you have to break the line:

! heading1
| regular1 || regular2

The Parsoid HTML output for the latter is simply:

<table><tbody><th>heading1</th>
<td>regular1</td><td>regular2</td></tbody></table>

Note the newline between </th> and <td>. This newline corresponds to the newline in the wikitext. If you edit the HTML to remove the newline:

<table><tbody><th>heading1</th><td>regular1</td><td>regular2</td></tbody></table>

And then feed that back through Parsoid, you get:

{|
! heading1 || regular2 || regular3
|}

Which is wrong, because when you parse it back to HTML, you get:

<table><tbody><th>heading1</th><th>heading2</th><th>heading3</th></tbody></table>

(If you literally run the HTML samples above through Parsoid, it actually serializes each cell on its own line, and so this bug doesn't happen. But if there are data-parsoid attributes saying that the original wikitext had the cells all on one line, they stay on one line, and this bug happens.)

@Catrope: I cannot reproduce this in plain html2wt mode(I tried a few different things so far) .. I wonder if this is specific to selser mode .. since you can repro. this in VE.

@Catrope: I cannot reproduce this in plain html2wt mode(I tried a few different things so far) .. I wonder if this is specific to selser mode .. since you can repro. this in VE.

It's not selser, it's data-parsoid. See the last paragraph of my previous comment:

(If you literally run the HTML samples above through Parsoid, it actually serializes each cell on its own line, and so this bug doesn't happen. But if there are data-parsoid attributes saying that the original wikitext had the cells all on one line, they stay on one line, and this bug happens.)

I realize that was confusing, sorry.

So this is the unmodified parsoid output:

<table>
<tbody><tr data-parsoid='{"autoInsertedEnd":true,"autoInsertedStart":true}'><th data-parsoid='{"autoInsertedEnd":true}'> heading1</th>
<td data-parsoid='{"autoInsertedEnd":true}'> regular </td><td data-parsoid='{"stx_v":"row","autoInsertedEnd":true}'> heading2 </td><td data-parsoid='{"stx_v":"row","autoInsertedEnd":true}'> heading3</td></tr>
</tbody></table>

Removing the first <td> produces:

$ cat t85782-3b.html<table>
<tbody><tr data-parsoid='{"autoInsertedEnd":true,"autoInsertedStart":true}'><th data-parsoid='{"autoInsertedEnd":true}'> heading1</th>
<td data-parsoid='{"stx_v":"row","autoInsertedEnd":true}'> heading2 </td><td data-parsoid='{"stx_v":"row","autoInsertedEnd":true}'> heading3</td></tr>
</tbody></table>
$ tests/parse.js --html2wt < t85782-3b.html
{|
! heading1
|| heading2 || heading3
|}

Bug!

The problem seems to lie in the stx_v annotation, which should really be ignored if it occurs in SOL position --- or if the preceding cell was modified, but careful: in this case the preceding cell was deleted! But there should be a mw:Diffmarker there, so look for that.

This ought to fix the problem:

$ cat t85782-4b.html<table>
<tbody><tr data-parsoid='{"autoInsertedEnd":true,"autoInsertedStart":true}'><th data-parsoid='{"autoInsertedEnd":true}'> heading1</th>
<td data-parsoid='{"NOT-HERE-ANYMORE-stx_v":"row","autoInsertedEnd":true}'> heading2 </td><td data-parsoid='{"stx_v":"row","autoInsertedEnd":true}'> heading3</td></tr>
</tbody></table>
$ tests/parse.js --html2wt < t85782-4b.html
{|
! heading1
| heading2 || heading3
|}

Can anyone see a problem with this proposed fix?

Change 193424 had a related patch set uploaded (by Subramanya Sastry):
WIP: Deal with chameleon nodes that change output based on surroundings

https://gerrit.wikimedia.org/r/193424

Change 193424 merged by jenkins-bot:
Deal with chameleon nodes that change output based on surroundings

https://gerrit.wikimedia.org/r/193424

ssastry added a subscriber: Ryasmeen.

This has been merged and has been through successful RT testing. @Catrope tested an early version of this patch against VE and confirmed it working. I am going to close this now -- this will be deployed tomorrow. Given that this is now in betalabs, @Ryasmeen can you test this there and reopen if not fixed?