Table parsing diffs: Parsoid adds implicit <td>s after a |- if explicit pipe is not present
Closed, Resolved · Public


This has been known for a long time, but it is now starting to cause some parsing and rendering diffs.

See this test in parserTests.txt, which comes with copious notes:

## Note that Parsoid output differs from PHP and PHP+tidy here.
## The lack of <tr> tags in the PHP output is arguably a bug in the
## PHP parser, which tidy then compounds by fostering the content
## entirely out of the table.  Parsoid recognizes the table context
## and generates <tr> and <td> wrappers as needed.  Hopefully nobody
## depends on PHP's treatment of broken table markup!
!! test
Implicit <td> after a |-
!! options
!! wikitext
!! html/php


!! html/php+tidy
!! html/parsoid
!! end

As it turns out, this does matter in production.

  2. compare vs The reduced test case is vs

In the general case, Parsoid's PEG tokenizer doesn't know whether the line that follows a "|-" will be <td> markup or not. The tokenizer assumes that it will be and adds a <td> if none is present. However, in some odd scenarios like the above, existing pages seem to rely on the fact that the absence of the <td> leads to the <table> being closed (and then dropped by Tidy, since the table is empty and Tidy likes to drop empty elements).
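The implicit-<td> behaviour described above can be illustrated with a toy line-based tokenizer. This is only a sketch under simplifying assumptions: `tokenize_table_lines` and its `implicit_td` flag are hypothetical names, and the code is not Parsoid's actual PEG grammar. It ignores captions, attributes, and nested tables.

```python
def tokenize_table_lines(lines, implicit_td=True):
    """Toy tokenizer for a wikitext table (hypothetical, not Parsoid's grammar).

    Tags are emitted as strings, text as ("text", content) tuples.
    """
    tokens = []
    after_row = False  # True when the previous non-blank line was a "|-" marker
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no markup in this sketch
        if stripped.startswith("{|"):
            tokens.append("<table>")
            after_row = False
        elif stripped.startswith("|}"):
            tokens.append("</table>")
            after_row = False
        elif stripped.startswith("|-"):
            tokens.append("<tr>")
            after_row = True
        elif stripped.startswith("|") or stripped.startswith("!"):
            # Explicit cell markup: "|" opens a <td>, "!" opens a <th>.
            tokens.append("<th>" if stripped.startswith("!") else "<td>")
            tokens.append(("text", stripped[1:].strip()))
            after_row = False
        else:
            # Plain content line. Right after "|-", the tokenizer may
            # assume a cell was intended and open one implicitly.
            if after_row and implicit_td:
                tokens.append("<td>")
            tokens.append(("text", stripped))
            after_row = False
    return tokens


lines = ["{|", "|-", "foo", "|}"]
with_implicit = tokenize_table_lines(lines, implicit_td=True)
without_implicit = tokenize_table_lines(lines, implicit_td=False)
```

With `implicit_td=True`, the text "foo" ends up wrapped in a `<td>`; with `implicit_td=False`, it sits directly inside the row with no cell around it, which is the situation that later cleanup stages have to deal with.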

Once we replace Tidy (T89331) with an HTML5 parser / serializer, all these pages will see empty-table ghosts coming back from the dead. Parsoid fixes for this will therefore depend on how we decide to handle this Tidy fixup scenario.
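To make the concern concrete, here is a rough stand-in for the cleanup that pages currently rely on. It assumes only what is stated above, namely that Tidy drops empty tables; `drop_empty_tables` is a hypothetical helper built on a regular expression and does not reproduce Tidy's real algorithm.

```python
import re

# Matches a <table> element whose body is empty or whitespace-only.
EMPTY_TABLE = re.compile(r"<table\b[^>]*>\s*</table>", re.IGNORECASE)


def drop_empty_tables(html):
    """Remove empty <table> elements (a crude approximation of the Tidy fixup)."""
    return EMPTY_TABLE.sub("", html)


php_output = '<p>foo</p><table class="x">\n</table>'
tidied = drop_empty_tables(php_output)
```

A serializer that skips this step would emit `php_output` unchanged, so the empty `<table>` would remain in the page.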

ssastry created this task. Aug 21 2015, 8:40 PM
ssastry added a subscriber: ssastry.
Restricted Application added a subscriber: Aklapper. Aug 21 2015, 8:40 PM

The bug reported in T132668: Table render bug on is caused by this wikitext, generated by the {{TNT}} template:

{| class="tpl-infobox ext-infobox ext-status-beta" cellspacing="0"
|+ '''[[Special:MyLanguage/Manual:Extensions|MediaWiki extensions manual]]'''
|- class="tpl-infobox-header ext-infobox-header"
! colspan="2" style="padding-top: 0.5em;" | [[File:Crystal Clear action run.png|link=:Special:MyLanguage/Template:Extension#Content|left|40px]] <span style="font-size: 130%;">Kartographer</span><br />
[[Extension status|Release status:]] beta[[Category:MIT licensed extensions]][[Category:beta status extensions]]
[[Category:MediaWiki extensions without a screenshot]]
| style="vertical-align: top" | [[Special:MyLanguage/Template:Extension#type|'''Implementation''']]
| [[Manual:Tag extensions|Tag]][[Category:Tag extensions]], [[Manual:ContentHandler|ContentHandler]][[Category:ContentHandler extensions]]

The category is allocated its own <td>. If we fix this implicit-<td> issue, the category link will get fostered out, which is fine since we handle that scenario.
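For readers unfamiliar with fostering: content that appears inside a table but outside any cell gets moved out in front of the table. A minimal sketch of that idea follows, using a flat token list in which tags are strings and text is a ("text", content) tuple. `foster_parent` and the category name are hypothetical, and the sketch ignores nested tables and everything else a real HTML5 tree builder does.

```python
def foster_parent(tokens):
    """Move text found inside a <table> but outside any cell to before the table."""
    out = []
    table_start = None  # index in `out` of the currently open <table>, if any
    in_cell = False
    for tok in tokens:
        if tok == "<table>":
            table_start = len(out)
            in_cell = False
            out.append(tok)
        elif tok == "</table>":
            table_start = None
            in_cell = False
            out.append(tok)
        elif tok in ("<td>", "<th>"):
            in_cell = True
            out.append(tok)
        elif tok == "<tr>":
            in_cell = False
            out.append(tok)
        elif isinstance(tok, tuple) and table_start is not None and not in_cell:
            # Stray text in table context: hoist it in front of the table.
            out.insert(table_start, tok)
            table_start += 1
        else:
            out.append(tok)
    return out


tokens = [
    "<table>", "<tr>",
    ("text", "[[Category:Example]]"),  # no cell wraps this link
    "<td>", ("text", "cell"),
    "</table>",
]
fostered = foster_parent(tokens)
```

In `fostered`, the category link precedes `<table>`, while the text that sits inside a `<td>` stays in place.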

Change 335492 had a related patch set uploaded (by Arlolra):
T109897: Remove implicit_table_data_tag rule

Arlolra claimed this task. Feb 2 2017, 10:21 PM

Change 335492 merged by jenkins-bot:
T109897: Remove implicit_table_data_tag rule

Arlolra closed this task as "Resolved". Feb 3 2017, 5:34 AM

Mentioned in SAL (#wikimedia-operations) [2017-02-07T18:24:41Z] <arlolra> Updated Parsoid to f0732260 (T109897)