This has been known for a long time, but this is beginning to be the cause of some parsing and rendering diffs.
See this test in parserTests.txt with copious notes.
## Note that Parsoid output differs from PHP and PHP+tidy here. ## The lack of <tr> tags in the PHP output is arguably a bug in the ## PHP parser, which tidy then compounds by fostering the content ## entirely out of the table. Parsoid recognizes the table context ## and generates <tr> and <td> wrappers as needed. Hopefully nobody ## depends on PHP's treatment of broken table markup! !! test Implicit <td> after a |- !! options parsoid=wt2html,html2html !! wikitext {| |- a |} !! html/php <table> a </table> !! html/php+tidy <p>a</p> !! html/parsoid <table> <tr><td>a</td></tr> </table> !! end
As it turns out, this does matter in production.
- https://www.mediawiki.org/wiki/Parsoid/Round-trip_testing/Diffs#Implicit_.3Ctd.3E_insertion
- compare https://en.wikipedia.org/w/index.php?title=Hartmut_Briesenick&oldid=677208396 vs http://parsoid-lb.eqiad.wikimedia.org/enwiki/Hartmut_Briesenick?oldid=677208396. The reduced test case is https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Tests/Tables/1 vs http://parsoid-lb.eqiad.wikimedia.org/mediawikiwiki/User%3ASSastry_%28WMF%29%2FTests%2FTables%2F1?oldid=1856312
In the general case, Parsoid's PEG tokenizer doesn't know whether the line that follows a "|-" is going to be a <td> markup or not. The tokenizer assumes it is going to and adds a <td> if none is present. However, in some odd scenarios like the above, existing pages seem to be relying on the fact that the absence of the <td> leads to the <table> being closed (and then dropped by Tidy since the table is empty -- Tidy likes to drop empty elements).
So, once we replace Tidy (T89331) with a HTML5 parser / serializer, all these pages will see empty-table-ghosts coming back from the dead. So, Parsoid fixes for this will be dependent on how we decide to handle this Tidy fixup scenario.