Parsoid changes </tr> to </tr in HTML tables with </tr> but no <tr>
Closed, ResolvedPublic

Description

When the wikitext contains an HTML table that includes </tr> tags but no corresponding <tr> tags, the table renders fine in viewing mode and in VE.
e.g.

H1H2H3

Renders as intended.

However, when any edit to the table is made, Parsoid (presumably) strips the < from all the </tr> tags.
Sandbox: https://en.wikipedia.org/w/index.php?title=User:Thryduulf/sandbox&diff=570513680&oldid=570513611
Reported dif: https://en.wikipedia.org/w/index.php?title=2005%E2%80%9306_Polish_Basketball_League&diff=570501618&oldid=569248952

This isn't something that Parsoid can expect to encounter frequently, but it shouldn't make a mess of it when it does.


Version: unspecified
Severity: minor

bzimport added a project: Parsoid.Via ConduitNov 22 2014, 1:57 AM
bzimport set Reference to bz53464.
Thryduulf created this task.Via LegacyAug 28 2013, 10:43 AM
ssastry added a comment.Via ConduitAug 28 2013, 3:50 PM

Unrelated thing we could also fix. the "|-" could instead be serialized as <tr>..</tr> based on HTML syntax rest of the table.

[subbu@earth tests] node parse --wt2wt --fetchConfig false < /tmp/tbl

Bad table syntax (expected rows <tr>...</tr>) near: <tr> <th>H1</th> <th>H2</th> <th>H3</th> </tr> |-<td>[[Fish]]</td> <td>[[Cake]]</td> <td>[[Scone]]</td> |-<td>[[Goldfish]]</td> <td>[[Golden syrup]]</td> <td>[[Cream]]</td>
GWicke added a comment.Via ConduitAug 28 2013, 3:51 PM

This sounds a lot like a slightly wrong DSR.

ssastry added a comment.Via ConduitAug 28 2013, 10:45 PM

There are two different pieces of this:

  1. With the missing HTML <tr> tag, Parsoid incorrectly assumes that the missing <tr> tag inserted by the tree builder comes from wikitext (which can be seen by the |- output in the non-selser serialization above). This confusion is the reason for the off-by-1 DSR error. I can fix that in the parser (and have a fix for it) with an additional check to determine whether a <tr> comes from wikitext.
  1. However, this also needs a fix in the serializer. Where the content of the <tr> is unchanged, selser accurated spits back the original wikitext. However, the serializer also needs a fix so that it attempts HTML-tag serialization instead of wikitext-tag serialization on the <tr>. That will fix the second issue.

As for 1. the problem is in migrateTrailingNLs. I fixed it, but I think we added this dom pass to deal with some issues in the WTS during refactoring to eliminate spurious diffs around trailing newlines and separators. I am tempted to simply rip out that pass and see what happens (which as a side effect, also fixes part 1. of this bug). For now, maybe I will push a fix for both, but I think we should consider getting rid of that pass altogether, if possible since it may not be necessary anymore.

ssastry added a comment.Via ConduitAug 28 2013, 11:05 PM

You know, I can fix this, but it will be kind of hacky ... I am going to push my changes into a local branch and revisit this after I look at removing the migrateTrailingNLs handler. Deleting code is better than adding code. :-) This is not a critical bug anyway.

GWicke added a comment.Via ConduitAug 28 2013, 11:30 PM

I believe we added migrateTrailingNLs at some point in the past to accomodate missing trailing WS handling in VE. We tried to keep all trailing whitespace outside of paragraphs etc. Afaik VE can handle that now, so there should not be a need for the migration any more.

gerritbot added a comment.Via ConduitAug 28 2013, 11:41 PM

Change 81600 had a related patch set uploaded by Subramanya Sastry:
(Bug 53464): Improved detection of missing opening HTML tags

https://gerrit.wikimedia.org/r/81600

ssastry added a comment.Via ConduitAug 28 2013, 11:42 PM

Never mind. I found a simple and easy non-hacky fix which might apply in other situations as well and fixes all issues reported in this bug. I'll look at getting rid of migrateTrailingNLs separately tomorrow/next week.

gerritbot added a comment.Via ConduitAug 29 2013, 12:35 AM

Change 81600 merged by jenkins-bot:
(Bug 53464): Improved detection of missing opening HTML tags

https://gerrit.wikimedia.org/r/81600

Aklapper added a comment.Via ConduitFeb 12 2014, 3:55 PM

Subbu / Gabriel: Patch was merged month ago - is there more work left here, or can you close this ticket as RESOLVED FIXED?

Add Comment

Column Prototype
This is a very early prototype of a persistent column. It is not expected to work yet, and leaving it open will activate other new features which will break things. Press "\" (backslash) on your keyboard to close it now.