Page MenuHomePhabricator

Table borders are invisible on some tables with Parsoid
Closed, ResolvedPublic

Description

For example, see Parsoid rendering and compare with legacy rendering. The Awards and Nominations table is missing borders in Parsoid's rendering. But, as it turns out, the border is still there with 1px width (if you inspect the cells in a browser - Firefox in my case). Secondly, if you reload the page, you see that the borders show up briefly and then they disappear (but the table doesn't resize which indicates that the borders just disappeared). I don't know what is going on and haven't investigated -- is this CSS or JS?

Found this when I saw this diff on a couple dagwiki pages and realized that these pages are translations of enwiki pages. So, same issue on the equivalent dagwiki page.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

The table has an extra empty <tbody> placed before the <caption>. Deleting that node in console causes the borders to reappear. You can also delete the tbody with the content in it to get the borders inside the thead to reappear. I think the thead is inserted by the table sorter Javascript (don't think that's particularly relevant to the bug but don't want to cause confusion with 'where does the parser insert a thead? - it steals the rows of column header ths from the tbody the parser does insert).

So for some reason something thinks there should be only one tbody, which is not a requirement of HTML. I'd guess the table sorter JS +- a browser bug. I tested in Firefox up to date [131?].

As a second check, I inserted a second tbody into the legacy parser input in what would be a spec-allowed location (before the tbody there already but after the caption and thead) and the borders promptly disappeared.

What I think is an aside: <caption> is required by the spec to be the first child of <table> when present, but this table has both the inserted <thead> and the extra <tbody> come before it. Moving those elements around doesn't fix the issue either, just the aforementioned deletions.

And indeed, the borders are still there as you say, it's just they go to the table background color for some reason in the above.

And this change fixed the issue (you can verify that the lower table is still broken, it has the same setup as the old version).

I guess the legacy parser is less sensitive to that <tr> wikitext. Parsoid is presumably spitting out an extra tbody for that <tr> while the old parser does not. Which causes the table sorter Javascript to be sad, I'd guess (without verifying any assumptions about it).

Either this should be handled in Parsoid or Parsoid should emit a lint for it. I suspect this particular wikitext error is fairly common.

Thanks for that useful analysis @Izno! Actually, not sure you need the "|-" before the caption -- so it can simply be deleted. See below Parsoid output (after stripping data-parsoid attributes from the output).

-----------------------------------------
{|
|-
|+ caption1
!h1
|-
|foo1
|}

{|
|+ caption2
!h2
|-
|foo2
|}
-----------------------------------------
<table>
<tbody><tr class="mw-empty-elt"></tr>
</tbody><caption>caption1</caption>
<tbody><tr><th>h1</th></tr>
<tr>
<td>foo1</td></tr>
</tbody></table>

<table>
<caption>caption2</caption>
<tbody><tr><th>h2</th></tr>
<tr>
<td>foo2</td></tr>
</tbody></table>
-----------------------------------------

I think the empty tablle row before the caption is the issue and can be completely stripped. If Parsoid removes empty tbody and the corresponding empty tr, it would cause dirty diffs on the first VE edit to the table. Not sure how editors might respond to that. If they won't respond favorably, we should emit lints for this.

Actually, I take that back. In the html2wt direction, Parsoid doesn't emit the empty "|-" in the first case as it is. So, deleting it from the DOM before serializing to HTML should be fine.

$  php bin/parse.php --html2wt < ~/x.html 
{|
|+caption1
!h1
|-
|foo1
|}

{|
|+caption2
!h2
|-
|foo2
|}

Actually, not sure you need the "|-" before the caption -- so it can simply be deleted.

And in fact you shouldn't have it, as I stated above - leading at minimum either to Parsoid-should-fix-this for wt->html (I'm a bit agnostic going the other way) or Parsoid-should-lint-this. Again, this feels like something that may be common out and about, so one of those may be preferable.

I moved the |- wikitext in this case because I like to avoid inferring the tr that should appear after the caption to hold the following ths. (The pattern {|\n|\n|\n|} is common, but it's only valid wikitext, not valid HTML, without the insertion of a tr.)

IDK if this would be an acceptable dirty diff. I tend to think it might be, but......... Does Parsoid dirty diff anything deliberately these days? I know we, specifically you and I, talked about at least one other case where I thought a dirty diff might be fine....

Actually, I take that back. In the html2wt direction, Parsoid doesn't emit the empty "|-" in the first case as it is. So, deleting it from the DOM before serializing to HTML should be fine.

Oops! I tripped myself up here by removing all the html2wt-necessary data-parsoid attributes. If those are left behind, Parsoid does wt2wt the snippets without diffs. So, my original analysis was correct.

I moved the |- wikitext in this case because I like to avoid inferring the tr that should appear after the caption to hold the following ths. (The pattern {|\n|\n|\n|} is common, but it's only valid wikitext, not valid HTML, without the insertion of a tr.)

Got it.

IDK if this would be an acceptable dirty diff. I tend to think it might be, but......... Does Parsoid dirty diff anything deliberately these days? I know we, specifically you and I, talked about at least one other case where I thought a dirty diff might be fine....

There are cases where we do for some invalid markup. For example, I see this in the tables.txt parser tests.

## We don't support roundtripping of these attributes in Parsoid.
## Selective serialization takes care of preventing dirty diffs.
## But, on edits, we dirty-diff the invalid attribute text.
!! test
Invalid text in table attributes should be discarded
!! wikitext
{| <span>boo</span> style='border:1px solid black'
|  <span>boo</span> style='color:blue'  |1
|<span>boo</span> style='color:blue'|2
|}
!! html/parsoid
<table style="border:1px solid black">
<tr>
<td style="color:blue">1</td>
<td style="color:blue">2</td>
</tr>
</table>
!! end

So, there is precedent for that. We'll ponder this decision (lint OR dirty-diff-on-table-edit) and handle this appropriately.

I think I am leaning towards deleting the empty tr before the caption and have a dirty diff effectively fix up the table wikitext.

Change #1185235 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/services/parsoid@master] WIP: Strip tbody with empty trs

https://gerrit.wikimedia.org/r/1185235

I think I am leaning towards deleting the empty tr before the caption and have a dirty diff effectively fix up the table wikitext.

If I add the mw-empty-elt class to the tbody itself, it seems to resolve the issue at
https://dag.wikipedia.org/wiki/Dolly_Parton_ni_deei_pini_sh%C9%9B%C5%8Ba_mini_piibu?useparsoid=1#Pina_mini_piibu

That would avoid the dirty diff

@Izno Does that seem like an acceptable solution?

Change #1185235 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Mark "empty" <tbody> elements with mw-empty-elt class

https://gerrit.wikimedia.org/r/1185235

Change #1188397 had a related patch set uploaded (by OSleger; author: OSleger):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.22.0-a22

https://gerrit.wikimedia.org/r/1188397

Change #1188397 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.22.0-a22

https://gerrit.wikimedia.org/r/1188397