Wikitext representing an empty <tr> produces technically-invalid output
Open, Medium, Public

Description

A fairly common pattern for tables in wikitext seems to be:

{| class="wikitable"
|'''Properties'''
|'''Exact Value'''
|'''Measurement'''
|'''I Don’t Know/Am Not Sure'''
|----
|generation time (P3337)
|
|X
|
|----
|operating income (P3362)
|
|X
|
|----
|}

Specifically ending on:

|----
|}

Following this pattern results in a blank row being specified at the end of the table. When rendering wikitext to HTML, Parsoid turns this into an empty <tr></tr>, which is certainly a fair representation of what the wikitext says, but is also invalid HTML.

Currently, because of Tidy, that empty tr gets stripped. As we remove Tidy, we'll start outputting invalid markup as a result of this.

Thoughts on what the correct course of action here is? Obviously, we can just not output empty trs, but in cases like providing output to Visual Editor, that may result in some inadvertent changes to the underlying wikitext on a save. At the least, we would have to put some thought into which changes should cause a stripped tr to actually be removed, etc.
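For illustration only, here is a minimal Python sketch of what "just not outputting empty trs" could look like as a post-processing step on the rendered HTML. The function name `strip_empty_rows` is hypothetical, and this is not how Tidy or Parsoid actually does it; a regex is used purely to keep the sketch short, and it would misfire on attribute values that contain `>`.

```python
import re

# Matches a <tr> (with or without attributes) that contains only whitespace.
EMPTY_TR = re.compile(r"<tr\b[^>]*>\s*</tr>\s*", re.IGNORECASE)


def strip_empty_rows(html):
    """Remove <tr> elements that have no cells at all (hypothetical helper)."""
    return EMPTY_TR.sub("", html)


rendered = "<table><tr><td>X</td></tr><tr></tr></table>"
print(strip_empty_rows(rendered))
```

Note that a step like this throws away the information that the wikitext had a trailing row separator, which is exactly what makes the round-trip through an editor tricky.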

Event Timeline

This recently came up in the context of T152387 and T152659.

ssastry triaged this task as Medium priority. Apr 9 2017, 9:30 PM

I think we should (a) treat this as broken wikitext that needs to be fixed -- this can be flagged via Parsoid & Linter, and (b) strip empty <tr>s from tables to prevent invalid output.

In the case of (b), Parsoid's selective serializer will preserve the original wikitext, but on edits, I think it is reasonable to dirty-diff it because we want it fixed as per (a).
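As a rough idea of what flagging (a) might involve, here is a minimal line-based Python sketch that reports row separators with no cells after them. The name `find_empty_rows` is hypothetical, and this is not the actual Parsoid or Linter logic; it assumes one table construct per line and ignores nested tables, templates, and comments.

```python
import re

ROW_SEP = re.compile(r"^\s*\|-")     # |-, |----, optionally followed by attributes
TABLE_END = re.compile(r"^\s*\|\}")  # |}


def find_empty_rows(wikitext):
    """Return 1-based line numbers of row separators that start a row with no content."""
    empty = []
    pending = None  # line number of a separator that has no content yet
    for lineno, line in enumerate(wikitext.splitlines(), start=1):
        is_sep = bool(ROW_SEP.match(line))
        if is_sep or TABLE_END.match(line):
            if pending is not None:
                empty.append(pending)
            pending = lineno if is_sep else None
        elif pending is not None and line.strip():
            pending = None  # any non-blank line counts as row content here
    return empty


sample = "\n".join([
    '{| class="wikitable"',
    "|'''Properties'''",
    "|----",
    "|generation time (P3337)",
    "|X",
    "|----",
    "|}",
])
print(find_empty_rows(sample))  # [6]
```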

Actually, selser won't unless Parsoid introduces a marker in place of the stripped row, or some other placeholder information like that. Alternatively, we can dirty-diff fix the table on any edit to the table. However, if editors want to retain this syntactical feature for whatever reason, Parsoid will always have to introduce it.

Side note, the pattern at |---- is generally ambiguous also since it could be interpreted as <td><hr></td> rather than <tr>. In most cases that isn't the intention, I believe, and I would guess based on the above comments the parsers aren't interpreting it like so.

Actually, selser won't unless Parsoid introduces a marker in place of the stripped row, or some other placeholder information like that. Alternatively, we can dirty-diff fix the table on any edit to the table. However, if editors want to retain this syntactical feature for whatever reason, Parsoid will always have to introduce it.

A few use cases where this is introduced:

  1. A newbie adds it because they don't understand they can end on a table cell rather than a table row. I don't think we need to support this use case, period.
  2. An experienced editor maintains a page where data in a table is updated frequently (for any value of "frequently"). There are probably a couple of interactions here:
    1. The experienced user wants to make it more obvious to new users that a new row goes beneath the |-. I don't think this is a particularly effective way of doing this anyway.
    2. The experienced user wants to save themselves trouble when copy-pasting some complex inline CSS, without needing to hunt for the row in question above. I'm not particularly sympathetic to this reason, since they can always use a talk page or HTML comments to preserve desired row formatting (never mind the Get Rid of Inline CSS discussion elsewhere).

My feeling is that you could probably dirty diff this and get away with it.

Does the below syntax also produce an empty row?

|-
|-

I was about to work on this, but the HTML5 spec contradicts this statement from the description:

When rendering wikitext to HTML, Parsoid turns this into an empty <tr></tr>, which is certainly a fair representation of what the wikitext says, but is also invalid HTML.

See https://www.w3.org/TR/html5/tabular-data.html#the-tr-element - the content model says zero or more td, th, and script-supporting elements. That indicates that an empty <tr> row is not invalid HTML. Any reason not to decline this?

I'm of two minds. The biggest argument for it would be consistency with current output -- wikitext + Tidy removes empty <tr>s, and people write pages expecting that. But unless this gets written into wikitext + RemexHTML, Parsoid would become out of sync with that behavior whenever the switch happens.

It's not only pages directly: I know of hundreds, if not thousands, of templates in various wikis, e.g. for infoboxes, that look like this:

|-
{{#if: {{{map|}}} |
   ! Map: !! {{{map}}} 
}}
|-
{{#if: {{{climate|}}} |
   ! Climate diagram: !! {{{climate}}} 
}}
|-

Much fun.
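To make the effect concrete, here is a rough Python simulation of how the pattern above expands. Everything here is a simplification: `expand_if`, `expand_rows`, and `count_cell_less_separators` are hypothetical helpers, and `expand_if` only mimics the basic `{{#if:}}` behaviour of emitting the trimmed "then" text when the test is non-blank.

```python
def expand_if(test, then):
    """Rough stand-in for {{#if:}}: trimmed 'then' text for a non-blank test, else nothing."""
    return then.strip() if test.strip() else ""


def expand_rows(params):
    """Expand the two-row infobox pattern: a |- before, between, and after the #if calls."""
    lines = ["|-"]
    for key, label in (("map", "Map:"), ("climate", "Climate diagram:")):
        value = params.get(key, "")
        lines.append(expand_if(value, f"! {label} !! {value}"))
        lines.append("|-")
    return "\n".join(lines)


def count_cell_less_separators(expanded):
    """Count |- lines with no non-blank line before the next |- or the end of the text."""
    count = 0
    open_sep = False
    for line in expanded.splitlines():
        if line.startswith("|-"):
            if open_sep:
                count += 1
            open_sep = True
        elif line.strip():
            open_sep = False
    return count + (1 if open_sep else 0)


print(expand_rows({"map": "X.png"}))
print(count_cell_less_separators(expand_rows({})))  # 3
```

With no parameters set, every separator in the expansion is left without cells; with only `map` set, two of the three still are.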

With template expansion in mind, I personally shifted to

{{#if: {{{something|}}} | <nowiki />
   {{!}}-
   {{!}} Something: {{!}}{{!}} {{{something}}} 
}}{{#if: {{{another|}}} | <nowiki />
   {{!}}-
   {{!}} Another: {{!}}{{!}} {{{another}}} 
}}

Note the influence of line breaks on various parameter interaction and paragraph issues.

Any reason not to decline this?

I'm of two minds. The biggest argument for it would be consistency with current output -- wikitext + Tidy removes empty <tr>s, and people write pages expecting that. But unless this gets written into wikitext + RemexHTML, Parsoid would become out of sync with that behavior whenever the switch happens.

Remex still seems to be stripping the empty trs generated by the example text above?