Page MenuHomePhabricator

Parsoid should treat '|-' syntax as row separator, not row start
Closed, DeclinedPublic


The following wiktext

| x || y
| a || b

Results in three rows. Two rows with 2 content cells. And an unexpected third row with empty single cell. The old PHP Parser + Tidy handles this correctly. But when using Parsoid/VisualEditor this creates a broken third row with single cell.


Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

But they aren't separators because you attach attributes to them, e.g.

| x || y
|- style="background:red;"

needs to store that attribute somewhere in the DOM...

ssastry added a subscriber: ssastry.

This is Tidy stripping empty cells in the core parser whereas Parsoid doesn't.

In a top-level page, if Parsoid strips empty cells, it means we need to account for it during roundtripping wikitext. Selective serialization will handle this case in most scenarios, so should not be a big deal, but there is no real reason for Parsoid to go down this route. Clearly this is not a common use on top-level pages (vs. templates where we do strip empty elements for now), otherwise we would have seen this report many times over by now.

The plan is to eventually get rid of these Tidy hacks once T89331 gets deployed and we start making incremental changes to legacy Tidy behavior. So, why add this hack to Parsoid when a simple fix would be to edit the page?

Please reopen if there is a reason to consider this.

So, why add this hack to Parsoid when a simple fix would be to edit the page?

That's reasonable, except that it's not obvious to users. If this empty cell would show up on page views, it'd be removed pretty quickly. However as it is it looks like a bug in VisualEditor (or Parsoid). With that perspective, I'm not sure people would report it as often as you might think. But it looks like a bug either way since it works "fine" normally. Up until today I was fairly sure MediaWiki considered them separators. I imagine there might even be wikis out there that edit war over the "coding style" of tables and whether or not a trailing |- is desirable or not.

I can imagine people have seen this in VE and just shrugged their shoulders assuming it's a minor Parsoid bug that will be fixed eventually.

While technically it is not a separator and MediaWiki does output it internally, it never makes it to the output, so de facto it is a separator. And Parsoid is making a breaking change to that behaviour.

I'm totally in favour of standardising this. And while it's pretty random that MediaWiki doesn't strip it without Tidy (it could've just as easily been implemented as a separator all those years ago, don't attribute too much value to that detail, if Tidy didn't strip it we would've stripped it in MediaWiki years ago). I'm fine with picking the non-stripping behaviour as the standard. However it seems worth investigating a revision dump to see how common it is in article.

And to announce in Tech News so that any wikis or users currently in the habit of using this pattern by default are aware of this "change".

Why was this issue declined? It may be technically incorrect but it has been documented for a long time that |- can be either included or omitted at the beginning and end of tables. It doesn't show up in the rendered output (it doesn't even generate an empty <tr>), but it's visible in VE, and in a not very intuitive way.

(Since I learned that it doesn't make sense to add a trailing |- [which was a long time ago], I just remove those whenever I see them, but still.)

Sorry, I lost track of @Krinkle's response above.

We are going to be replacing Tidy sometime this year (if everything goes to plan) and then start removing some of the Tidy weirdnesses (like empty elt stripping) which we are retaining as a compatibility pass temporarily. That will be hte time to start announcing breaking changes like these.

If this is really a big problem waiting till that time, we can consider instituting a temporary workaround in Parsoid, but only if it is shown to be a problem on a non-trivial number of pages.

@nyuszika7h hope that clarifies this for you as well.