thead, tbody, tfoot for wikitable syntax
Open, MediumPublic

Description

Author: michael

Description:
Adding wikitable support for thead, tbody, and tfoot elements would be a harmless enhancement, allowing more sophisticated
formatting of tables (both in pages' wikitext, and using style attributes or style sheets).

Logically, each element would only need a start tag, and would be closed when the following element starts or the table ends (as
|- serves for table rows). Possible wikitable shortcuts:

thead:

|!    associated with table headers (but possibly confusing)
|^    analogous to GREP start of string
|<    analogous to XML/HTML tag opening
|[    opening bracket representing start

tbody:

|=    fatter version of table row
|[    enclosing bracket representing a block

tfoot:

|_    underscore=bottom line
|$    analogous to GREP end of string
|>    analogous to HTML/XML tag closing
|/    analogous to HTML/XML closing tag
|]    closing bracket representing ending
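
As a rough illustration of the implicit-close rule described above, the following Python sketch groups table lines into sections. The markers `|^`, `|=` and `|_` are just one pick from the candidate lists; they are hypothetical, not existing wikitext syntax, and `group_sections` is an illustrative name.

```python
# Hypothetical markers picked from the candidate lists above; none of them
# is real wikitext syntax.
SECTION_MARKERS = {"|^": "thead", "|=": "tbody", "|_": "tfoot"}


def group_sections(lines):
    """Group table lines into (section, lines) pairs.

    A section needs only a start marker: it is closed implicitly when the
    next marker appears or when the table ends with "|}".
    """
    groups = []
    for line in lines:
        token = line.strip()
        if token == "{|":
            continue
        if token == "|}":
            break
        if token in SECTION_MARKERS:
            # Starting a new section implicitly closes the previous one.
            groups.append((SECTION_MARKERS[token], []))
        else:
            if not groups:
                # Lines before any marker are kept in an unnamed group.
                groups.append(("ungrouped", []))
            groups[-1][1].append(token)
    return groups
```

Repeating the `|=` marker yields several tbody groups in one table, which is the row-grouping case this request is about.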

See also Bug 3156: T5156: Request not to filter <tbody> and </tbody> codes

Details

Reference
bz4740

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 9:02 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz4740.
bzimport added a subscriber: Unknown Object (MLST).

ayg wrote:

Why not use heuristics? Not ideal, but a great improvement over the current
situation. Proposal:

  1. Any sequence of rows falling at the end of a table and consisting entirely of
     header cells is a <tfoot>.
  2. Any other sequence of rows consisting entirely of header cells is a <thead>.
  3. Any other sequence of rows is a <tbody>.

Thus you would get, e.g.

{|
|+ Metasyntactic variables
! Computing
|-
| Foo
|-
| Bar
|-
! English names
|-
| Jack
|-
| Jill
|}

<table>
<caption>Metasyntactic variables</caption>
<thead><tr><th>Computing</th></tr></thead>
<tbody>
<tr><td>Foo</td></tr>
<tr><td>Bar</td></tr>
</tbody>
<thead><tr><th>English Names</th></tr></thead>
<tbody>
<tr><td>Jack</td></tr>
<tr><td>Jill</td></tr>
</tbody>
</table>

which I believe is correct. Any counterexamples?
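
A minimal Python sketch of the three rules above, assuming each row has already been parsed into a list of `(tag, text)` cells. The row model and the names `is_header_row` and `classify_rows` are assumptions for illustration, not parser code.

```python
from itertools import groupby


def is_header_row(row):
    """A row is header-only when it is non-empty and every cell is a 'th'."""
    return bool(row) and all(tag == "th" for tag, _ in row)


def classify_rows(rows):
    """Split rows into runs and label each run thead, tbody or tfoot."""
    runs = [(flag, list(group)) for flag, group in groupby(rows, key=is_header_row)]
    sections = []
    for index, (header_only, run) in enumerate(runs):
        if not header_only:
            name = "tbody"   # rule 3: any other sequence of rows
        elif index == len(runs) - 1:
            name = "tfoot"   # rule 1: header-only run at the end of the table
        else:
            name = "thead"   # rule 2: any other header-only run
        sections.append((name, run))
    return sections
```

Read literally, rule 1 also labels a table that consists only of header rows as a tfoot; the sketch keeps that behaviour rather than guessing at an exception.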

ezyang wrote:

The only trouble is if the heuristic turns out to be wrong. Unlikely, but
possible, and if you don't offer any way around it there will be problems.

ayg wrote:

Still better than the current setup, and it doesn't complicate wikimarkup (which
I think is why this isn't enabled).

fgregg wrote:

sorttable.js uses thead and tfoot to know what portions of a table to not sort.
Allowing the use of thead and tfoot would make that table sorting script much
easier to integrate with complicated tables.

paul wrote:

My original suggestion was to pass thead and /thead through. If you can't pass them, can
you at least not display them on the output page, in effect ignoring them?

paul wrote:

One way would be to translate them with <!-- before and --> after the <thead>,
<tbody> etc. and </> closures, or have some way to mark them as non-displayed,
so that while they are ignored for functionality, they don't show up on the
rendered page.
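
The comment-wrapping idea could look roughly like the Python sketch below. The function name and the regular expression are assumptions for illustration; nothing here reflects how the parser or Sanitizer actually treats these tags.

```python
import re

# Matches opening and closing thead/tbody/tfoot tags, with or without
# attributes. The word boundary keeps <th> and <td> from matching.
STRUCTURAL_TAG = re.compile(r"</?(?:thead|tbody|tfoot)\b[^>]*>", re.IGNORECASE)


def hide_structural_tags(html):
    """Wrap each structural tag in an HTML comment so it is not displayed."""
    return STRUCTURAL_TAG.sub(lambda match: f"<!--{match.group(0)}-->", html)
```

The tags stay in the page source but have no effect on the rendered table, which is the "ignored but not shown" outcome asked for here.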

bluehairedlawyer wrote:

I'm currently working on an implementation of this bug as per Aryeh's comments above. I have encountered considerable problems implementing his suggestion that a line of header cells at the end of a table be a tfoot. The problem is that when the program encounters:

! some header cells

it outputs:

<tr><th> some </th><th> header </th><th> cells

We would then only find out subsequently whether it was actually a footer. We could perform a simple search and replace, but that would be greatly complicated by the possibility of embedded tables within the footer cells. As far as I can see, implementing full heuristics would require an almost full rewrite. Or something like:

{|
|+ Metasyntactic variables
! Computing
|-
| Foo
|-
| Bar
|-
! English names
|-
| Jack
|-
| Jill
|=
| Footer
|}
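
One way to picture the lookahead problem described above: only the current run of header-only rows has to be held back until the next row (or the end of the table) shows whether it was a header or a footer. The Python sketch below assumes rows arrive as lists of `(tag, text)` cells; the names are illustrative, and nested tables inside cells are not modelled at all.

```python
def is_header_row(row):
    """A row is header-only when it is non-empty and every cell is a 'th'."""
    return bool(row) and all(tag == "th" for tag, _ in row)


def stream_sections(rows):
    """Yield (section, row) pairs while buffering only undecided header rows."""
    pending = []  # header-only rows whose section is not yet known
    for row in rows:
        if is_header_row(row):
            pending.append(row)
            continue
        # A body row follows, so the held rows were a header, not a footer.
        for held in pending:
            yield "thead", held
        pending.clear()
        yield "tbody", row
    # The table ended while header rows were still held: treat them as footer.
    for held in pending:
        yield "tfoot", held
```

Body rows are emitted as soon as they are read, so the buffer never grows beyond one run of header rows.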

bluehairedlawyer wrote:

a structural method to implement structural elements: tbody, thead and tfoot

Ignore my previous comments. I've now substantially rewritten the doTableStuff() function, by separating the wiki syntax reading part from the bit that outputs the html. doTableStuff() now collects information about the table into an array which a new function, printTableHtml(), converts into html.
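
The two-phase structure described above can be sketched as follows. This is a simplified Python illustration, not the PHP patch: `collect_table` and `print_table_html` are hypothetical stand-ins for the reading and output halves, and only a small subset of table syntax (caption, `|-`, `!`/`!!`, `|`/`||`) is handled.

```python
from html import escape


def collect_table(wikitext):
    """Phase 1: read simplified table wikitext into a plain dict."""
    table = {"caption": None, "rows": []}
    for raw in wikitext.splitlines():
        line = raw.strip()
        if line in ("", "{|", "|}"):
            continue
        if line.startswith("|+"):
            table["caption"] = line[2:].strip()
        elif line.startswith("|-"):
            table["rows"].append([])  # row separator starts a new row
        elif line.startswith("!") or line.startswith("|"):
            tag = "th" if line[0] == "!" else "td"
            separator = "!!" if tag == "th" else "||"
            if not table["rows"]:
                table["rows"].append([])  # cells before any "|-"
            for text in line[1:].split(separator):
                table["rows"][-1].append((tag, text.strip()))
    # Drop rows that never received a cell (e.g. a trailing "|-").
    table["rows"] = [row for row in table["rows"] if row]
    return table


def print_table_html(table):
    """Phase 2: turn the collected model into HTML."""
    parts = ["<table>"]
    if table["caption"] is not None:
        parts.append(f"<caption>{escape(table['caption'])}</caption>")
    for row in table["rows"]:
        cells = "".join(f"<{tag}>{escape(text)}</{tag}>" for tag, text in row)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table>")
    return "".join(parts)
```

Because the whole table is available before any HTML is written, the output phase is free to wrap rows in thead, tbody or tfoot; the cost is that the entire table is held in memory.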

Attached:

bluehairedlawyer wrote:

I forgot to mention the patch includes changes to wikibits.js, which didn't appear to support tbody, thead or tfoot elements after all. The changes make sortable tables work in Safari v3 and Firefox v3. It needs to be tested in IE6 and other browsers.

Btw...

{|
! header
|}

{|
! header
|-
| content
|-
! footer
|}

but...

{|
! header
|-
| content
|-
! header
|-
|}

I did this on purpose just in case people wanted to have headers at the bottom of their tables. It can be changed!

nicdumz wrote:

keywords : Patch, need-review

andy wrote:

I support the proposal to add these three elements; their availability, with class attributes, will greatly facilitate the use of microformats.

*** Bug 3156 has been marked as a duplicate of this bug. ***

a structural method to implement structural elements: tbody, thead and tfoot v2

Updated patch to apply cleanly to trunk.
Fails heaps of parser tests, fixing that now

attachment new.diff ignored as obsolete

a structural method to implement structural elements: tbody, thead and tfoot v2 v2

last patch contained unrelated changes

attachment new.diff ignored as obsolete

7912: a structural method to implement structural elements: tbody, thead and tfoot v

This one passes all parser tests (except those which get upset by the new <tbody>). The new HTML tags are whitelisted now as well.
This patch would enable us to migrate to a better tablesorter script, which would fix a lot of the open table sorting bugs.

Attached:

I think it would be nice to have a new syntax for tfoot and thead rather than (only) hack around the current one.

Parse with first row in thead:
{|
|+ Title
|-
! Head cell !! Head cell
|-
| Normal cell || Normal cell
|-
| Normal cell || Normal cell
|}

Parse without thead:
{|
|+ Title
|-
! Head cell
| Normal cell
|-
| Normal cell || Normal cell
|-
| Normal cell || Normal cell
|}

Parse rows with "|!-" moved to thead (only if in concurrent rows). Parse rows with "|>-" moved to tfoot (only if in concurrent rows).
{|
|+ Title
|!-
! Head cell !! Head cell
|!-
! Head cell !! Head cell
|-
| Normal cell || Normal cell
|-
! Head cell not in thead
| Normal cell
|-
| Normal cell || Normal cell
|>-
| Footer cell || Footer cell
|}

Nux: I'd say that it's better to do it on the existing syntax, since I can't see the use case of having a row that looks like a thead but structurally isn't.

Fixed in r85922

michael wrote:

So when the patch is implemented, what syntax would I use to divide a table into two or more row groups using tbody elements? This is not clear from the descriptions above.

andy wrote:

Can we get an update, please?

Nobody is currently working on this.

I think this proposal needs a clearer description of use cases, and why those use cases justify the complexity costs in:

  • the wikitext user interface
  • the VisualEditor user interface
  • Parsoid

As an example, how would this be sensibly presented in VE?

andy wrote:

Gabriel: The use case is outlined in Michael Zajac's initial post (timestamp: 2006-01-24 02:41:52); and in comments 4 & 11. Do you have questions about those?

It appeared from comment 15 that this was resolved four years ago; no reason for its reversion has been given here.

andy wrote:

Also, the heuristic suggested above won't work, as it's necessary to allow for more than one tbody per table.

(In reply to Andy Mabbett from comment #24)

Gabriel: The use case is outlined in Michael Zajac's initial post
(timestamp: 2006-01-24 02:41:52); and in comments 4 & 11. Do you have
questions about those?

What I see there is

  1. allows for more sophisticated formatting (comment 1)
  2. sorttables not sorting thead / tfoot (comment 4)
  3. facilitation of microformats (comment 11)

Are 1) and 2) actually still issues? To me it sounds like 2) would only be an issue with a footer, which is relatively rare. Otherwise, detecting a row with <th> elements should not be hard in a script.

1) is rather nebulous, given that you can just as well attach classes to trs.

What I am asking for is a clear use case: I want to do X, it's not possible because of Y, and it will be possible once thead / tbody / tfoot are supported. This is worth the costs because of Z.

A related use case: allows Parsoid to handle arbitrary table markup in WTS phase.

Although my proposal (for the record) would be *not* to add new pipes-and-punctuation markup for <thead> <tfoot> etc, but instead to just allow them to be generated by literal HTML embedded in wikitext, eg https://en.wikipedia.org/wiki/Help:Table#Other_table_syntax

Once your table is sufficiently complicated, it's probably best to use literal HTML, IMO. But we still need to permit thead/tfoot/colgroup etc in literal HTML within wikitext.

michael wrote:

(In reply to Gabriel Wicke from comment #26)

  1. allows for more sophisticated formatting (comment 1)

The main reason I requested this is the ability for an editor to create multiple row groups by adding multiple tbody elements in a table. This would allow grouping data in tables, making these groups accessible to assistive devices like screen readers, allowing visual formatting of the groups with CSS (other than redundant inline CSS), and allowing behaviours like collapsing groups.

The solution in comment 1 simply automates adding a whole-table tbody element, and does not satisfy the requirement (the HTML DOM implicitly includes a full-table tbody anyway, so this solution is redundant.)

Some use-case examples that would benefit from this:

michael wrote:

(In reply to C. Scott Ananian from comment #27)

Once your table is sufficiently complicated, it's probably best to use
literal HTML, IMO.

But grouping table rows is a very simple concept.

There is high demand. Editors are already attempting to do this in tens of thousands of tables using complex, inconsistent, inaccessible, inadequate, and inappropriate hacks (rows of table headers, horizontal rules, inline CSS, nested tables, etc.).

It should be possible to accomplish this with dead-simple wikitext, and visually format it consistently and automatically in standard style sheets.

@Michael Zajac: nothing related to table parsing in wikitext is simple, unfortunately. So I'm suggesting to concentrate on *making it possible*, and let the template authors and/or VE, etc, worry about making it "dead-simple".

But I'm open to suggestions. The HTML elements not currently supported in wikitext are thead, tbody, tfoot, colgroup, and col. If someone would like to open a new wikipage proposing concrete "dead-simple" wikitext syntax for these, I'd be happy to re-evaluate. (But please make your proposal on a wikipage, so that this bugzilla isn't bloated out with endless bikeshedding over tweaks to the syntax.)

Note that the original patch was reverted (as I understand the history of this bug) because the implementation constructed an entire in-memory model of the table during processing. Wikipedia tables can be *huge*. So any syntax proposal must be parseable without buffering, using as little table context information as possible. Similarly, you should be prepared to demonstrate (using greps over a Wikipedia dump, or similar) that the proposed syntax does not break any existing table markup.
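
A check of that kind could start from something like the Python sketch below: a single pass that reports lines inside `{| ... |}` tables which already begin with a candidate marker. The function name is hypothetical, and the scan is deliberately naive (it ignores templates, `{{!}}`, nowiki and comments), so it is a starting point for a dump survey rather than a proof of compatibility.

```python
def find_conflicts(wikitext, marker):
    """Return (line_number, line) pairs inside tables that start with marker."""
    conflicts = []
    depth = 0  # current table nesting depth
    for number, raw in enumerate(wikitext.splitlines(), start=1):
        line = raw.lstrip()
        if line.startswith("{|"):
            depth += 1
        elif line.startswith("|}"):
            depth = max(depth - 1, 0)
        elif depth > 0 and line.startswith(marker):
            # Existing markup that a new marker would reinterpret.
            conflicts.append((number, raw))
    return conflicts
```

For example, a cell written as `|=5 || x` would be reported as a conflict for a proposed `|=` marker, while the same text outside a table would not.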

michael wrote:

@C. Scott Ananian I do appreciate that the parsing and programming are likely very complex. And also that white-flagging the HTML is a good improvement and probably a step towards creating a wikitext syntax for these elements.

But wikitable syntax is fairly simple for editors to use, and I hope that these efforts can eventually add a simple way to mark the start of a new tbody (row group), and the other elements. I’m sorry that currently I can’t invest time in this, but thanks for the suggestions on how to proceed.

(In reply to C. Scott Ananian from comment #30)

If someone would like
to open a new wikipage proposing concrete "dead-simple" wikitext syntax for
these, I'd be happy to re-evaluate. (But please make your proposal on a
wikipage, so that this bugzilla isn't bloated out with endless bikeshedding
over tweaks to the syntax.)

I note the correct place for such a proposal would be https://www.mediawiki.org/wiki/Requests_for_comment

@Brad -- yes, I thought about mentioning that, but reconsidered; I thought it would probably be more useful to stage a draft in some user's talk space (or similar) first and let people hack on it for a while, before making things formal and hoisting the text into the RfC namespace. I didn't want to discourage contributors by forcing the RfC template and formatting on them right away.

But: if you're not afraid of extra process and formatting and are feeling confident in your proposal, then sure throw it directly into RfC space.

Some (fixable) template bugs appear to have been introduced as a result of the introduction of automatically generated <thead> and <tbody> tags. See https://en.wikipedia.org/wiki/Template_talk:Articles_by_Quality_and_Importance#rowspan_and_thead_bug


In that talk @TheDJ points out that jquery.tablesorter sets thead/tfoot automatically (based on th elements). So, as an alternative to the suggested manual wikitable syntax, what about setting thead/tfoot consistently, i.e. also for unsortable tables and without JavaScript?

Task T5156 was marked as a duplicate of this in 2010 (T5156#74974). However, that task was proposing to allow thead and tbody in wikitext, the same way we already allow <table>, <tr> and <td>, thus essentially opting out of our own wikitext syntax.

That proposal was merged into here, where the conversation has mainly revolved around what custom syntax to use. I propose to shelve that in favour of doing what Brion proposed there in 2009 already:

In T5156#74956, @brion wrote:

We could toss thead, tbody, and tfoot into the table whitelist in Sanitizer::removeHTMLtags... to do it right one would need to expend extra effort to ensure nesting is correct, though. (Or else leave it to Tidy...)
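
The "ensure nesting is correct" part could be checked along the lines of the Python sketch below, which uses the standard library `html.parser`. The parent table and the strictness (explicit close tags required) are assumptions for illustration; this is not how `Sanitizer::removeHTMLtags` or Tidy behave.

```python
from html.parser import HTMLParser

# Which parent each table-related element may appear in (assumed, strict).
ALLOWED_PARENTS = {
    "caption": {"table"},
    "thead": {"table"},
    "tbody": {"table"},
    "tfoot": {"table"},
    "tr": {"table", "thead", "tbody", "tfoot"},
    "th": {"tr"},
    "td": {"tr"},
}
TRACKED = set(ALLOWED_PARENTS) | {"table"}


class TableNestingChecker(HTMLParser):
    """Collect nesting errors for table-related tags; other tags are ignored."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in TRACKED:
            return
        parent = self.stack[-1] if self.stack else None
        if tag in ALLOWED_PARENTS and parent not in ALLOWED_PARENTS[tag]:
            self.errors.append(f"<{tag}> not allowed inside {parent}")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in TRACKED:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def nesting_errors(html):
    """Return a list of nesting problems; an empty list means none were found."""
    checker = TableNestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.stack]
```

A table nested inside a `td` passes, because `table` itself has no parent restriction in this sketch.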

And later by others at T26274: Allow Sanitizer to process tbody.

I think a custom syntax might have merit, but there's all sorts of compatibility and usability things to keep in mind there. Either way, we would need to allow it in the Sanitizer, and for use in complex templates we'd want the HTML-like syntax to have feature parity, so I propose to first cut through this and unlock the primitive. Then the conversation about custom syntax can continue at its own pace as an improvement, rather than as a blocker. (possibly on a separate task, or we can unmerge them and re-open T5156 and address that first).

Jdlrobson raised the priority of this task from Low to Medium. Oct 26 2022, 8:14 PM
Jdlrobson added a subscriber: Jdlrobson.

Hi @cscott, this is now causing some issues in desktop improvements, as certain articles use sticky positioning for th rows (more background in T289817#8225410). Vector would like to offset the thead instead. How can we make using thead in templates possible?

We cannot make syntax changes right now. We should probably have a conversation to see how we can move this forward. @JMcLeod_WMF, can you please handle this?

In T6740#90295, @cscott wrote:

A related use case: allows Parsoid to handle arbitrary table markup in WTS phase.

Although my proposal (for the record) would be *not* to add new pipes-and-punctuation markup for <thead> <tfoot> etc, but instead to just allow them to be generated by literal HTML embedded in wikitext, eg https://en.wikipedia.org/wiki/Help:Table#Other_table_syntax

Once your table is sufficiently complicated, it's probably best to use literal HTML, IMO. But we still need to permit thead/tfoot/colgroup etc in literal HTML within wikitext.

Re-upping my suggestion from 8 years ago.

And in an effort to scope this task, the desktop refresh work AFAICT *only* wants to be able to put the <TH> cells inside a <THEAD>. This could be a stop-gap hack, e.g. in the sanitizer (look for a table's first row containing only TH, wrap it in THEAD); we don't necessarily need to boil the ocean all at once.

Technically, I think they'd want what tablesorter does: find the first TR rows that only contain header cells and put those rows inside thead.
https://github.com/wikimedia/mediawiki/blob/master/resources/src/jquery.tablesorter/jquery.tablesorter.js#L277
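
A small Python sketch of the behaviour as described in the comment above (not a port of the linked jquery.tablesorter code): take the leading rows that contain only header cells and treat them as the thead. The row model and function names are assumptions for illustration.

```python
from itertools import takewhile


def is_header_row(row):
    """A row is header-only when it is non-empty and every cell is a 'th'."""
    return bool(row) and all(tag == "th" for tag, _ in row)


def wrap_leading_headers(rows):
    """Split rows into a thead (leading header-only rows) and the remaining tbody."""
    head = list(takewhile(is_header_row, rows))
    return {"thead": head, "tbody": rows[len(head):]}
```

Header-only rows that appear later in the table stay in the tbody, so this covers the sticky-header case without deciding anything about footers or multiple row groups.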

@cscott, @ssastry and I met to talk about this. We agree this is a missing feature in wikitext syntax and want to put this on the parser roadmap.

Fixing this in the legacy parser is likely to be risky, and we don't think there's a quick solution here right now.
For the sticky header bug, I'll create a new ticket referencing this one to explore short term workarounds.