
Allow use of semantic HTML5 elements in wikitext
Open, LowPublic

Assigned To
None
Authored By
bzimport
Jun 12 2010, 5:23 PM
Referenced Files
None

Description

Many of these tags are a natural complement or enhancement to the structure of Wikipedia's and Wiktionary's content. Levels of support:

  • Whitelist, to allow use in wikitext.
  • Add HTML5 elements to wikitext rendering.

References

Support status

Icon | Meaning
handled or overridden by MediaWiki core
handled or overridden by MediaWiki extension
enabled
enablable - semantic markup
enablable - tables enhancement
enablable - needs some work though
enablable - form control without interaction, for semantic markup
Tag | Notes
<a> | via discussion with @tstarling doable (in favour of enabling various relevant attributes rather than expanding the current [[..|..]] syntax); T35886: to support for microdata and rdfa, allow <a> tags so external links can have ref/rel attributes; see also RelMicroformat
<address> | old HTML spec, not a new feature (T2671: Whitelist non-problematic HTML tags: address, especially later discussion)
<area>, <map> | handled by ImageMap (<imagemap>)
<article>
<aside> | T104770: Add HTML5 <aside> to the parser whitelist
<col>, <colgroup> | old HTML spec, not a new feature; T2986: [tables] Please implement COL, COLGROUP; T322775: Allow <thead> <tbody> <tfoot> as literal HTML tags in Wikitext
<details>, <summary> | T31118: Add HTML 5 semantic elements 'details' and 'summary' to Sanitizer whitelist; see T31118#8015894
<fieldset> | old HTML spec, not a new feature; with <legend>
<figcaption> | Part of T118517: [RFC] Use <figure> for media
<figure> | Part of T118517: [RFC] Use <figure> for media
<footer>
<header>
<legend> | old HTML spec, not a new feature; with <fieldset>
<link> | Emitted by Parsoid
<main>
<meta> | Emitted by Parsoid
<meter> | T211259: Allow use of <meter> element in wikitext
<nav>
<progress> | can fall back [1, 2] to its content: <p>Progress: <progress id="p" max=100><span>0</span>%</progress></p>
<section> | handled by MediaWiki-extensions-LabeledSectionTransclusion; T32597: <section> tag name collides with HTML5 <section> tag; Parsoid emits <section> tags
<source> | T39042: Remove <source> syntax from SyntaxHighlight (GeSHi)
<style> | TemplateStyles. See also T52644: Support <style scoped> as HTML element in wiki source, T37704: Drop support in wikitext for inline styles
<tbody>, <tfoot>, <thead> | old HTML spec, not a new feature; T6740: thead, tbody, tfoot for wikitable syntax; T5156: Request not to filter <tbody> and </tbody> codes; T322775: Allow <thead> <tbody> <tfoot> as literal HTML tags in Wikitext
Icon | Meaning
invalid (aka not a part of <body>)
disabled for security reasons
disabled for security reasons (scripting)
disabled for security reasons (interactive form control)

Security implications

Tag | Alternatives | Other notes
<audio> | [[File:]] syntax | Emitted as part of T118517: [RFC] Use <figure> for media
<base>
<body>
<button> | MediaWiki-extensions-InputBox
<canvas>
<datalist>
<embed> | | T18316: Tags like <embed> are needed
<form>
<head>
<html>
<iframe> | | T18316: Tags like <embed> are needed
<img> | [[File:]] syntax, <gallery>
<input> | MediaWiki-extensions-InputBox
<keygen> | | deprecated now (see https://developer.mozilla.org/en-US/docs/Web/HTML/Element/keygen)
<label> | MediaWiki-extensions-InputBox
<noscript> | | T47731: Allow <noscript> tag
<object> | | T18316: Tags like <embed> are needed
<optgroup>
<option> | MediaWiki-extensions-InputBox
<output>
<param>
<picture> | | See discussion
<script>
<select>
<template> | | @cscott notes this is an option to use to represent "currently invisible DOM trees", for example captions on an image which is currently displayed inline
<textarea>
<title> | overridable by {{DISPLAYTITLE:}}
<track> | [[File:]] syntax + TimedMediaHandler
<video> | [[File:]] syntax | Emitted as part of T118517: [RFC] Use <figure> for media

Whitelisted for editor use:

<abbr>, <b>, <bdi>, <bdo>, <blockquote>, <br>, <caption>, <cite>, <code>, <data>, <dd>, <del>, <dfn>, <div>, <dl>, <dt>, <em>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <hr>, <i>, <ins>, <kbd>, <li>, <mark>, <ol>, <p>, <pre>, <q>, <rb>, <rp>, <rt>, <rtc>, <ruby>, <s>, <samp>, <small>, <span>, <strong>, <sub>, <sup>, <table>, <td>, <th>, <time>, <tr>, <u>, <ul>, <var>, <wbr>
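To make the allowlist idea concrete, here is a minimal Python sketch that flags any start tag in a fragment that is not on the list above. MediaWiki's real check lives in its PHP Sanitizer and is not reproduced here; the names TagAudit and rejected_tags are hypothetical, and the sketch only reports disallowed tags without modelling how MediaWiki then renders them.

```python
from html.parser import HTMLParser

# Copy of the editor allowlist as listed in this task description
# (illustrative only; not read from MediaWiki's Sanitizer).
ALLOWED_TAGS = frozenset({
    "abbr", "b", "bdi", "bdo", "blockquote", "br", "caption", "cite", "code",
    "data", "dd", "del", "dfn", "div", "dl", "dt", "em", "h1", "h2", "h3",
    "h4", "h5", "h6", "hr", "i", "ins", "kbd", "li", "mark", "ol", "p", "pre",
    "q", "rb", "rp", "rt", "rtc", "ruby", "s", "samp", "small", "span",
    "strong", "sub", "sup", "table", "td", "th", "time", "tr", "u", "ul",
    "var", "wbr",
})


class TagAudit(HTMLParser):
    """Records every start tag whose name is not on the allowlist."""

    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed
        self.rejected = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <MAIN> and <main> compare equal.
        # Self-closing tags such as <br/> also reach this method through the
        # default handle_startendtag implementation.
        if tag not in self.allowed:
            self.rejected.append(tag)


def rejected_tags(markup, allowed=ALLOWED_TAGS):
    """Return disallowed tag names in document order (hypothetical helper)."""
    audit = TagAudit(allowed)
    audit.feed(markup)
    audit.close()
    return audit.rejected


print(rejected_tags("<p>A <mark>key</mark> point</p><figure><figcaption>c</figcaption></figure>"))
```

With the list as it stands, <mark> passes while <figure> and <figcaption> are reported, which is the situation the rest of this thread argues about.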

Details

Reference
bz23932


Event Timeline


Is there anything holding this off now that T147199 is happening?

Re: T147199, you'll probably want to re-evaluate UA stats after Nov 17 to see what the true final fallout is. It *should* significantly reduce the population of certain ancient UAs (notably, IE7-8/XP), but there are a few ways such UAs can stick around as well, and we can't readily predict what the numbers will look like:

  1. There's a complex hack (too complex and off the beaten path for us to recommend it to users, anyways) where one can get better-than-3DES crypto with IE8/XP by manually applying some registry hack to convince MS update servers that it's the POSReady commercial variant of XP, which got a longer support lifetime and some crypto DLL updates. Users who go down this road might use IE8/XP longer than others (though I can't fathom why they'd make this choice, it's still horribly insecure in all other senses).
  2. Some IE7-8/XP users might sit behind TLS-intercepting proxies which upgrade their outbound crypto. E.g. there might be software on the host, or a network appliance, which accepts their crappy 3DES TLS connection, uses a fake root installed on the host to enable TLS proxying in general, and then does better crypto on the outbound side facing our servers.
  3. IE7 (and 8 presumably?) also exists for Windows Vista, and IE7/Vista is not shut out by our crypto changes, as it supports TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA via TLSv1.0. This isn't the best crypto option in the world, but it's good enough that it's not going away this year. However, it will probably drop off significantly from whatever its current popularity level is by mid-2018, and depending on the drop in popularity, we may or may not shut it out at the crypto level when the stat drops off sufficiently. The driver here is that by mid-2018 (in many cases, it might get done earlier) all sites that accept credit cards have to eliminate TLSv1.0 to keep up with PCI-DSS standards. PCI-DSS isn't a requirement for our wikis, but it's expected that the lack of compatibility with most of the rest of the commercial internet will drive straggling users away from these older UAs for us.

you'll probably want to re-evaluate UA stats after Nov 17 to see what the true final fallout is

If I'm reading https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-browser/browser-family-and-major-tabular-view correctly, then IE8 is at 0.5%. If we assume that 1% of those have disabled JavaScript, that makes it 0.005% of all users, or five in ten thousand. Is that good enough to start whitelisting a few HTML5 elements in wikitext?


Wait a minute, 0.005% isn't "five in ten thousand". It's five in a hundred thousand, isn't it?
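The correction can be checked with exact rational arithmetic, treating both percentages as the rough, assumed figures from the previous comment (0.5% IE8 share, 1% of those with JavaScript disabled):

```python
from fractions import Fraction

# Exact fractions avoid floating-point rounding in the comparison.
ie8_share = Fraction(5, 1000)    # 0.5% of all users, per the dashboard reading
no_js_share = Fraction(1, 100)   # assumed: 1% of those users have JS disabled
affected = ie8_share * no_js_share   # 0.005% of all users

per_ten_thousand = affected * 10_000        # half a user, not five
per_hundred_thousand = affected * 100_000   # five users

print(f"{per_ten_thousand} per 10,000; {per_hundred_thousand} per 100,000")
# prints: 1/2 per 10,000; 5 per 100,000
```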

Parsoid uses <section>, <figcaption>, and <figure> already in its output (and thus the main parser will too, as Parsoid is merged into core). <picture> could be considered as part of media layout, but we are using other better-supported mechanisms for responsive images at the moment. These tags should not be whitelisted in article content as they conflict with wikitext features.

A number of the remaining tags are related to overall page organization; they should be emitted by the theme, but should not be whitelisted in article content as such use would conflict with the <section> tags already emitted and with the document structure emitted by the theme: <footer>, <header>, <article>, <aside>, <main>, and <nav>.

I have no issue whitelisting the rest: <mark>, <progress>, and <time>. (Of these, <mark> seems the most obviously useful.)


What about <meter>, <details> and <summary>?

Krinkle renamed this task from Enable, whitelist, and incorporate semantic HTML5 elements to Allow use of semantic HTML5 elements in wikitext.Mar 19 2020, 8:52 PM

An element like main should not be enabled in wikitext, but the status table offers no option for that other than "disabled for security reasons".

I've pulled some stuff out that has clear editor use and which really doesn't need a separate way to Do It.

[snip]

As I said at T104770#6685087, there are valid uses for the wikitext user, in fact for many of those you put in the "never for wikitext users" buckets. It's not sufficient to say "you can't have this" based on your opinion that they're 'overall page organization' or necessary for 'media', when in fact they are not, based on the specification:

  • aside -> Sidebar, side box, quote box, etc. (And I'm pretty sure I could sell the HTML-cognizant folks onwiki that infoboxes fall under this)
  • article -> Main page, portals, talk page comments, Template:Excerpt, news reels a la Wikinews, Wikisource in general, etc.
  • figcaption and figure -> Infoboxes, Module:Gallery, Template:Video game reviews, other table uses in general where <caption> is insufficient for whatever reason, and the typical use of giving lists headings.
    • (Nb Module:Gallery shouldn't exist, but Wikimedia will always lag users in being able to provide options for display.)
  • header -> Main page, portals, particularly inside article; see MDN.
  • footer -> Same places as header
  • section -> Same places as header
  • nav -> Any navbox
  • picture - This one isn't just for responsiveness. Anyway, I have trouble envisioning use cases for this one on WMF wikis, keeping in mind how editors interact with images today ([[File:]]). The wikitext would need something like a |fallback_n parameter in [[File:]]? (The reason I'm on this task today is that there are 3rd parties that have a use case for this. I'm not one of them and am not sure I can describe the use case in question. Saw it on the Discord.)
  • source - Similar to picture.

It's fine if these never have a wikitext implementation; I can't imagine the casual wikitext editor will want most of these. The predominant use is for template/module editors to mark up template-generated content.

For sake of completeness:

  • main is not editor-feasible (and moreover the spec allows at most one visible main per HTML document)
  • mark and time were implemented
  • progress probably needs a task for implementation, since it's not under dispute.
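The single-main rule can be sketched as a check. The HTML Living Standard says a document must not have more than one main element without the hidden attribute, so an editor-placed <main> inside a skin that already emits one breaks the rule. Below is a minimal Python sketch using the standard library's html.parser; the function name is hypothetical, and the spec's other placement constraints on main are ignored.

```python
from html.parser import HTMLParser


class VisibleMainCounter(HTMLParser):
    """Counts <main> start tags that lack the boolean `hidden` attribute."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a bare `hidden` has value None.
        if tag == "main" and all(name != "hidden" for name, _ in attrs):
            self.count += 1


def has_at_most_one_visible_main(document):
    """Hypothetical validator for the one-visible-main constraint only."""
    counter = VisibleMainCounter()
    counter.feed(document)
    counter.close()
    return counter.count <= 1


# A skin-emitted <main> plus an editor-placed one fails the check.
print(has_at_most_one_visible_main("<main>skin</main><div><main>editor</main></div>"))
```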

Eyeballing the rest, I don't see anything else that needs comment from an editor perspective.

Izno updated the task description. (Show Details)

Update re <figure> and <figcaption>:

The HTML standard no longer allows an inline attribution inside <blockquote>.

Invalid:

<blockquote>
<p>Attribution for the quotation, if any, must be placed outside the blockquote element.</p>
<cite>HTML Living Standard</cite>
</blockquote>

<figure> and <figcaption> are the only way to semantically associate an inline attribution with a blockquote.

Valid:

<figure>
<blockquote>
<p>Attribution for the quotation, if any, must be placed outside the blockquote element.</p>
</blockquote>
<figcaption>
<cite>HTML Living Standard</cite>
</figcaption>
</figure>

As long as <figure> and <figcaption> remain blacklisted, inline attributions have to be separate, with no semantic association to the quote.

Valid, but no semantic association:

<blockquote>
<p>Attribution for the quotation, if any, must be placed outside the blockquote element.</p>
</blockquote>
<p>
<cite>HTML Living Standard</cite>
</p>

I added the 'enablable - semantic markup' icon / status for the <details> <summary> tags because I believe this query on Code Search doesn't show any concerns. The follow-up would be https://phabricator.wikimedia.org/T31118#6989769 suggesting the use of <details> <summary> in place of using mw-collapsible.

@cscott I don't think it's appropriate to put your opinion in the table above when I have contested it here.


Sorry, I don't think I saw your comment down here when I was editing up there. I was just trying to update the table to reflect current thinking of the parser team. You seem to have removed those updates without putting them anywhere else, which I don't think helps the discussion. It's fine to say that there are differences of opinion on certain elements, and indeed I was very careful to note which aspects were my opinion only. I still think it is worthwhile to have a comprehensive summary table which outlines the points of disagreement and current status.

For the record, the specific elements whose status @Izno changed in the task description were <article>, <aside>, <footer> and <header>. The description I wrote was @cscott says "part of the skin content not article contents", and I stand by that.

The header and footer of the page should be controlled by the skin. There may be some wikitext mechanism to add content to those areas, but I am dubious that mechanism will be literal HTML5 elements in wikitext. It is more likely to be some sort of parser function.

<aside> is an outlier, but here too it seems more natural to add it as part of the figure styling options than to expose it directly as an HTML5 tag.

<figure> and <figcaption> are the only way to semantically associate an inline attribution with a blockquote.

This may be so, but that in itself is no reason why the literal <figure> tag needs to be supported in wikitext. Wikitext is not HTML. Similarly, the wikitext <blockquote> doesn't have to literally emit <blockquote> and only <blockquote>. We can add <figure> wrappers to <blockquote> (if that's what's needed) without adding <figure> as wikitext.
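A minimal sketch of the wrapper approach described here, written as a hypothetical Python render_blockquote helper (not a Parsoid or core API): the parser emits the <figure>/<figcaption> structure only when an attribution is supplied, and a bare <blockquote> otherwise. Wrapping the attribution in <cite> mirrors the example earlier in the thread and is an assumption, as is treating both inputs as plain text to be escaped.

```python
from html import escape
from typing import Optional


def render_blockquote(quote: str, attribution: Optional[str] = None) -> str:
    """Hypothetical renderer: add a <figure> wrapper only for attributed quotes."""
    body = f"<blockquote><p>{escape(quote)}</p></blockquote>"
    if attribution is None:
        # Unattributed quotes keep the bare element, avoiding an empty <figure>.
        return body
    # Attribution sits outside the blockquote, as the HTML standard requires.
    caption = f"<figcaption><cite>{escape(attribution)}</cite></figcaption>"
    return f"<figure>{body}{caption}</figure>"


print(render_blockquote("Lorem ipsum", "HTML Living Standard"))
```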

In general this task is a bit odd, in that it presupposes that wikitext should be a strict superset of HTML5, aka any valid HTML5 is also valid wikitext. I don't believe that is correct, or even desirable.

Indeed:

My general opinion is that all HTML elements should be whitelisted barring existing conflicts (section particularly today) or security concerns. Especially the ones which have use cases.

I think anyone who has tried to deal with <li> and <td> tags in the current parser will understand better why "all HTML elements should be whitelisted" is not a great idea. Mixing HTML and wikitext semantics results in a mess.

@cscott I don't think it's appropriate to put your opinion in the table above when I have contested it here.

Sorry, I don't think I saw your comment down here when I was editing up there. I was just trying to update the table to reflect current thinking of the parser team. You seem to have removed those updates without putting them anywhere else, which I don't think helps the discussion.

I think it makes sense for the person who attempted to add them to the task to put forth the opinion in the form of a comment actually summarizing and supporting their opinion, which is why I pinged you when I removed them. That's pretty common in how wiki world works. You boldly made an addition, I reverted and started a discussion about your attempted change. (See WP:BRD.)

For the record, the specific elements whose status @Izno changed in the task description were <article>, <aside>, <footer> and <header>. The description I wrote was @cscott says "part of the skin content not article contents", and I stand by that.

Interstitial note to the point just below: the change itself was retained by Phabricator, which thankfully has non-zero history tracking. You added more than what you say you added. (I say this strictly as a reminder of the functionality and not with intent to passive-aggressively note that you added more than that [which is not itself a passive-aggressive comment... sigh].)

<aside> is an outlier, but here too it seems more natural to add it as part of the figure styling options than to expose it directly as an HTML5 tag.

This may be so, but that in itself is no reason why the literal <figure> tag needs to be supported in wikitext. Wikitext is not HTML. Similarly, the wikitext <blockquote> doesn't have to literally emit <blockquote> and only <blockquote>. We can add <figure> wrappers to <blockquote> (if that's what's needed) without adding <figure> as wikitext.

Please engage with the existing use cases I have personally already pointed to, and discuss the on-the-ground facts of editing that I've already provided you. If you need links to understand where, why, and how each of those cases is supported today and how it should be supported tomorrow (for values of tomorrow relevant to soon but not Sooner or Later), I'm happy to file tasks to further discuss them. But perhaps one example, drawn from what you've already commented here, will give you the gist of that discussion:

We can add <figure> wrappers to <blockquote> (if that's what's needed) without adding <figure> as wikitext.

This causes an impedance mismatch for users who know and care about both (see more below), but even if it didn't, I am not sure you fully understood the meaning behind what Matt said.

Matt's comment is that today:

<blockquote>ABC</blockquote>

with attribution (which previously the specification allowed inline to the blockquote a la <blockquote>Quote <cite>Title</cite></blockquote>) requires a structure like this:

<figure>
<blockquote>Lorem ipsum</blockquote>
<figcaption>A man from 500 years ago. "Title from long ago".</figcaption>
</figure>

Such a case would be supported today by a template like Template:Quote (and is almost verbatim that, just swap out the divs in use today).

But this doesn't help the non-attributed version, such as you might see in the wild, and so you would output wrong or extra HTML:

And lo, the man said:<ref>ABC</ref>
<blockquote>Lorem ipsum</blockquote>

Which means now we need a parameter to actually support the case Matt has noted. Let's assume for a minute that you have bandwidth to support adding a parameter to <blockquote>, one that doesn't look like the HTML (which has cite= I believe), say, ref="ref". Now you need to do the work to support wikitext going inside that parameter, something like so:

<blockquote ref="{{cite news|title=Title from long ago|author=A man from 500 years ago}}">Lorem ipsum</blockquote>

outputting the text in the highlighted HTML block above.

It should be obvious this is going nowhere good, fast. The suggested wikitext might be possible and even supported in a different system today and would "just" need copy-pasting. But in my suggested world we instead wrap it behind a template abstraction (that we already have) with access to <figure>, which already supports the notion of arbitrary wikitext in parameters, and life is happy. In a world where we have access to <figure> in wikitext.

The header and footer of the page should be controlled by the skin.

[Emphasis added] Undeniably. You have not engaged with what the specification allows for, but I guess more on that below.

In general this task is a bit odd, in that it presupposes that wikitext should be a strict superset of HTML5, aka any valid HTML5 is also valid wikitext. I don't believe that is correct, or even desirable.
Wikitext is not HTML.

We don't want it to be. You'll note that in my rejigger of this task a year and a half ago I put all the red things in a specific box, just so it was obvious that there are Clearly Some Things That Wikitext Should Never Have Access To as raw wikitext, and so it will never be a superset.

What we do want, as I have already said, is to allow our template editors (primarily but not exclusively; portal editors are the second major group I can trivially point to here, though they'll ultimately be supported by template editors) to do all the things one might want to do with safe HTML. The WMF has a poor track record of supporting even the prominent (!!!) use cases (time, money, allocation of people, etc. etc. etc.). We have an entire Lua module just to extend the <gallery> tag. We have no general infobox module that we can directly access in Lua or via a keyword. (The one attempt at such a module, the Capiunto effort c. 2014, failed to support English Wikipedia's use cases for Template:Infobox, much less get deployed, for what I think were unrelated reasons.) There is no navbox module that we can directly access, even though one might help solve T124168 and return navboxes to mobile. And so on, and so on, all of which might lead us to the promised land of wikitext 2.0. And given the issues raised by the community this year, the WMF may never again have sufficient funding to support modules of that sort. (That's a scary thought, yeah? :) (NB this is not me being supportive of the way in which those issues were raised.)

So then it falls on volunteers, either at the software or the wiki levels, to support the use cases we come up with. In contrast to the above, providing these HTML tags today allows us to take a load off you and is, fundamentally, easy. We (you, me, and everyone else interested in certain tags) may need to scrub some of the assumptions that may have been made about HTML elements in the lands of JS and CSS (and/or whatever Google's pipeline consumes, which I've seen referenced but which no one has ever specified in public that I know of), but based on what I've seen your designer (Volker) and one or two other skin-side people say on the point, you aren't and shouldn't be using bare tag selectors anyway. Regardless, you pay that cost once and we maintain it for you for the foreseeable future, until the WMF (or whoever maintains MediaWiki in whatever era) can actually support it in the software. Based on the speed at which the WMF moves, that won't be for literal decades. Even if the WMF weren't fundamentally slow, there are more important things for you (your organization, your working group, you specifically) to work on at any given point in time than "oh, is today the day we support <figure> as an implicit part of <blockquote>" or "is today the day we support <listfigure> as non-HTML support for lists needing captions (lists with a figure and figcaption surrounding them, so, uh, why not just use the tags of interest directly?)" or "is today the day we support navboxes and sidebars (<nav>) in the core software" or "is today the day we support the snippets of text that display on the main page at Did You Know and On This Day and In the News and even Today's Featured Article with <article>". All of those examples are in T25932#7070297, just to be clear: they aren't hypotheticals but real places where we could enrich what is provided by Wikipedia, today.

Wiki wiki means fast. We wanted these tags yesterday, and we can do the updates tomorrow to use them (plus the lagtime between now and however long it takes to get through +2ing with all the necessary review/adjustments of potential impacts plus the time until the following deployment). All it takes is putting them on the allowed list of tags. How long will it take you (general you, but you can take it as a personal challenge to estimate the programming work needed just to support en.wp/Module:Navbox in core/an extension if you want), being paid to do the work, to understand and make the changes you think you might want to make to support those tags in all the places they would feasibly be used? And then maintain that support, indefinitely?

I don't believe that is correct, or even desirable.

To comment on this a second time, you still haven't said why. You have said "it should be this way" but not supported it with any rationale whatsoever. No one can fruitfully engage with that.

I can only assume you're worried about complexity. I can't honestly think of another valid reason to object to those (see above use cases supported by the documentation available with the specification and MDN). I could make some ad hominems about other stuff I've seen specifically you suggest and support on what you believe to be good and fruitful additions to wikitext or the template editing or the Lua environments that would increase the complexity by more, not less, but I won't go there. It's bad faith, never mind a fallacy for this discussion.

I'll simply say that concern's functionally irrelevant. Templates exist. They are used by The People. They are a fundamental concept of abstraction for wikitext. Newbies learn what they do on their first day, even if they get dropped into VisualEditor and never see wikitext like {{quote|attribution=A man 500 years ago}} in their editing lifetime. So you're really only worrying about the people who make templates. But we all already learn how HTML works today because it's a matter of reality. We all already have structures that do the things you want us not to do. To me it is then solely a matter of marking them up as such, for the reasons why you should use semantic HTML (support for accessibility and that juicy SEO; while Wikipedia is dominant in the latter, not all the fleet's wikis are, never mind the 3rd party wikis, even ignoring that I'm pretty sure $searchEngine would love to see more semantic HTML too regardless).

Would I like some templates not to exist? Yeah, sure. Do I want to wait a decade for the WMF to make them not exist? No, I think not. (A decade may be hyperbolic, but consider that's how long it took from spinning up 3rd Phase MediaWiki to the time we got Lua; it was another 5 years before TemplateStyles was released, and there have been no strong feature changes particularly supporting templates since, so I'm probably not far off the mark for the subsumption of some pieces of things wiki editors have dreamed up into the core software, for not-on-wiki values of core. Maybe the closest thing is the work that SD0001 has done with Gadgets potentially creating some wikitext magic word to invoke a gadget on a specific page. That's a veritable revolution compared to the changes requested here.)

Concluding remarks (me abusing bold for a heading, somewhat ironically as we're talking about semantic HTML here)

At the end of the day, I also wish to stress one thing: I immensely value the work your group does. I appreciate it's difficult work (specifically, parsing the soup sanely). I don't want you to have to do more. This task, and particularly the tags at issue, is one way to help me lighten your burden at what I estimate will be functionally 0 cost at the end of the day. This task is one of the easiest and quickest win-wins I can think of in Wikimedia development today.

Empower your volunteers.

(Just for shits and giggles, that's 1800 words of my text and 300 words of quotations, and I don't like writing essays. :( )