
Incrementally remove support for HTML elements removed from or deprecated in HTML5
Open, Low, Public

Description

Author: smccandlish

Description:
Support for the HTML elements tt, s, strike and u should be removed. First, they should be deprecated in the documentation of the major MW wikis (Wikipedia, Wiktionary). Second, they should be replaced on the fly with styled spans by the MW engine before reaching the user agent. Third, they should eventually not be supported at all, after wider adoption of HTML5.

Previous discussion on the topic (with new material added at end), centralized here:

  • Bug #671 Comment #27 from SMcCandlish <smccandlish@gmail.com> 2008-09-19 21:34:22 UTC ---

We should also be aware that the tt, s, strike, and u elements are going to be removed from XHTML 2 and HTML 5. The time is probably NOW to start weaning people off of them, though of course they shouldn't simply be deleted from MediaWiki support just yet. It would be good if these were replaced on-the-fly with styled <span>s, though (meanwhile, <i>/'' and <b>/''' should be left alone, as HTML 5 redefines them more narrowly and they will continue to be used).

  • Bug #671 Comment #41 from SMcCandlish <smccandlish@gmail.com> 2010-07-22 01:29:44 UTC ---

[We] will ultimately need to ... get rid of support for the tt element entirely, which doesn't exist in HTML 5. Here's a good discussion of this issue (more broadly than wiki), and Googling about it turns up more:

http://lists.whatwg.org/htdig.cgi/help-whatwg.org/2009-April/000233.html

Salient quote:

Ian Hickson, Wednesday, 29 April 2009 6:44 AM:

On Tue, 28 Apr 2009, Jim Garrison wrote:

I am trying to figure out the best way to replace the tt element as I
migrate to HTML5.

Are you using tt to mark up computer code, variables, sample computer
output, user input, for emphasis, to give a span of text in an alternate
voice or mood, a span of text to be stylistically offset from the normal
prose without conveying any extra importance, or something else?

This question must be asked every time a tt element is replaced (manually or
via AWB or whatever), and they WILL need to ultimately be replaced over the next
couple of years.

  • Bug #671 Comment #43 here from Aryeh Gregor <Simetrical+wikibugs@gmail.com> 2010-07-22 17:24:10 UTC (In reply to comment #41) ---

[The tt element] exists in HTML5. It's just classified as obsolete
presentational markup, and is not valid

Noted; thanks. HTML5 keeps changing and I stopped trying to track it all quite some time ago. Being mentioned in the standard as *invalid* presentational markup is effectively the same thing as being "not in" the standard, however. And it doesn't change my point about the tt element: The last thing we want is for WP and other wikis' content, whether served by those wikis or repurposed elsewhere, to fail validation out of laziness and cruft. MW's tt still has to go, at least in the long run.

We cannot remove
support for [tt] from MediaWiki without a migration path
to convert all existing markup somehow. But this is a totally separate bug.

No argument from me on either observation. My point is that unless this *is* opened as a bug, it's highly unlikely that any such migration path will be devised (although it would be a near-trivial one anyway; a simple bot could convert these into a styled span, ignoring instances inside pre, nowiki and angle brackets coded as numeric or named character entity references). A new bug for this one should not be set to RESOLVED LATER or no one will do anything to start migrating away from the dead markup.

I would suggest that tt be removed from documentation as "supported", and noted as deprecated with all support for it eventually being removed. For several versions (maybe several years) it should be allowed in wikicode (i.e., in the editing window and in saved wikicode that editors see in the editing window), but transmogrified on the fly into a monospaced span before it is served to the user agent. After HTML5 is more fully accepted, tt should just disappear.
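The bot-style migration described above can be sketched roughly as follows. This is only an illustration, not an existing MediaWiki or AWB tool: the function names (`migrate_tt`, `convert_segment`) are hypothetical, the list of protected regions (pre and nowiki) is an assumption taken from the comment, and attribute-bearing tags such as `<tt class="...">` are deliberately left alone. Angle brackets written as character entity references (`&lt;tt&gt;`) are untouched simply because the patterns only match literal `<tt>`.

```python
import re

# Regions whose contents must be left untouched (assumed: pre and nowiki).
PROTECTED = re.compile(
    r"<(pre|nowiki)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
# Only bare <tt> and </tt> are handled; tags with attributes are not matched.
TT_OPEN = re.compile(r"<tt\s*>", re.IGNORECASE)
TT_CLOSE = re.compile(r"</tt\s*>", re.IGNORECASE)

SPAN_OPEN = '<span style="font-family: monospace;">'


def convert_segment(text):
    """Replace literal <tt>...</tt> tags in an unprotected stretch of wikitext."""
    text = TT_OPEN.sub(SPAN_OPEN, text)
    return TT_CLOSE.sub("</span>", text)


def migrate_tt(wikitext):
    """Convert <tt> to a styled span everywhere except inside pre/nowiki."""
    out = []
    pos = 0
    for match in PROTECTED.finditer(wikitext):
        # Convert the text before the protected region, then copy the
        # protected region through unchanged.
        out.append(convert_segment(wikitext[pos:match.start()]))
        out.append(match.group(0))
        pos = match.end()
    out.append(convert_segment(wikitext[pos:]))
    return "".join(out)
```

A real bot would also need a human-review step, since (as the quoted mailing-list reply points out) the right replacement depends on what each tt was being used for.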


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=671

Details

Reference
bz24529

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 11:05 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz24529.
bzimport added a subscriber: Unknown Object (MLST).
bzimport created this task. Jul 24 2010, 4:46 PM

ayg wrote:

I have no objection in principle to migrating away from using these somehow, so I agree that this bug should not be closed. However, there has to be some migration plan that does not force Wikipedia users to do unnecessarily large amounts of work, and someone has to code it up. These are obstacles that I suspect are prohibitive for the foreseeable future. It's conceivable that we could do some auto-translation of <tt> to <span style="font-family:monospace"> and so forth, but that still leaves the invalid markup in the page source, so it's less than ideal.
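For illustration, the auto-translation idea could look something like the sketch below. It is not how the MediaWiki parser or sanitizer works; `LEGACY_STYLES` and `render_legacy_tags` are hypothetical names, and the CSS declarations are assumptions chosen to mimic typical browser defaults. As noted above, this only changes what is served; the obsolete markup stays in the page source.

```python
import re

# Hypothetical mapping from legacy elements to inline CSS (assumed values).
LEGACY_STYLES = {
    "tt": "font-family: monospace;",
    "s": "text-decoration: line-through;",
    "strike": "text-decoration: line-through;",
    "u": "text-decoration: underline;",
}

# Matches a bare opening or closing tag for any mapped element.
# Tags carrying attributes are intentionally not matched.
LEGACY_TAG = re.compile(
    r"<(/?)(" + "|".join(LEGACY_STYLES) + r")\s*>", re.IGNORECASE
)


def _replace(match):
    if match.group(1):  # a closing tag such as </tt>
        return "</span>"
    style = LEGACY_STYLES[match.group(2).lower()]
    return '<span style="{}">'.format(style)


def render_legacy_tags(html):
    """Rewrite legacy presentational tags as styled spans at output time."""
    return LEGACY_TAG.sub(_replace, html)
```

Elements that are not in the mapping, such as `<small>`, pass through unchanged.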

Note that it's not just elements here, but attributes too. For example, cellspacing="" and cellpadding="" are obsolete and invalid in HTML5 as well. The full list is here:

http://www.whatwg.org/specs/web-apps/current-work/multipage/obsolete.html#non-conforming-features

We don't allow most of those anyway, but there are an awful lot we do currently permit. Some of them (particularly table attributes) cannot be easily, reliably, and automatically converted to use CSS.

smccandlish wrote:

The tt element and other simple cases can be fixed in the wikicode with AWB and other scripting tools and bots, after being fixed in the engine so that it reaches the user agent not as a tt element but as a span with a monospaced font. On installations other than Wikipedia, they'll need to write their own (or adapt WP's) tools, or fix it manually.

I'm not sure that allowing the tt element in the wikicode is a huge deal. We also allow br elements without a closing / in them, we allow the p element without a closing /p tag, etc., and it all gets fixed on the fly before it hits the browser. I think that tt should be removed from the editor-facing documentation, so that new instances of it are not added to the wikicode, willy-nilly, forever. The help pages on editing should direct users to a {{monospace}} template or something (which would use the span and font-family).

For HTML5-verboten attributes... Yeah, that'll be big fun. I don't have any particular ideas with regard to table stuff especially.

ayg wrote:

(In reply to comment #2)

The tt element and other simple cases can be fixed in the wikicode with AWB and
other scripting tools and bots, after being fixed in engine to not actually
reach the user agent as a tt element, but a span with monospaced font. On
installations other than Wikipedia, they'll need to write their own (or adapt
WP's) tools, or fix it manually.

If people step up who are willing and able to fix all the breakage, I don't mind disabling support in the software. But not before all existing uses are removed, and people commit to fixing any stragglers.

I'm not sure that allowing the tt element in the wikicode is a huge deal. We
also allow br elements without a closing / in them

Which is allowed in HTML5.

we allow the p element without a closing /p tag

Which is allowed in HTML5.

I think that tt should be removed from the editor-facing
documentation, so that new instances of it are not added to the wikicode,
willy-nilly, forever. The help pages on editing should direct users to a
{{monospace}} template or something (which would use the span and font-family).

I agree, but of course, this isn't the right place to ask. It's a wiki, change it. :) (Or get consensus, whatever.)

For HTML5-verboten attributes... Yeah, that'll be big fun. I don't have any
particular ideas with regard to table stuff especially.

That's much harder, yeah. Of course, it wouldn't be a big deal if people would just not use presentational tables, but good luck with that one . . .

theevilipaddress wrote:

Why bother people with removing them from the wiki text? They're often much easier to write for the casual editor. Just compare

<s>#A stricken out vote. ~~~~</s>

to

<span style="text-decoration: line-through;">#A stricken out vote. ~~~~</span>

Or <center> is a very easy way to center text, <u> is helpful to underline text, and the like. We shouldn't force users to learn HTML and CSS for their regular editing. These elements should remain supported in the wiki text and the software should convert them to proper HTML 5.

The real concern here should be to move the deprecated HTML elements and attributes out of the software-generated text. I've recently requested to replace some <font> attributes in some messages at Translatewiki, and Siebrand then did it, but I'm pretty sure there's more such stuff. I can look for the MediaWiki messages with old CSS, if you want me to. There may be some in core, and probably much more in the extensions.

smccandlish wrote:

These concerns are not mutually exclusive, and really are part and parcel of the same thing. The reasons to stop supporting obsolete <tt>-style stuff in wikitext are numerous.

First, and most obviously, wikimarkup is not HTML. We allow some basic [X]HTML for experienced, geeky users, but there's no guaranteeing we'd do that forever and ever, and supporting BAD markup of this sort is just pointless.

Second, it encourages sloppy coding everywhere. Depending on the stats source, Wikipedia is the most popular website in the world or one of the top three (I tend to think that Facebook and GMail have it beat), so what we do actually has influence. We are lending extra "life after death" to dead code.

Third, code from Wikimedia projects like WP and Wiktionary can be re-used anywhere by anyone, and we have no control over how that is done. Just pasting stuff is surely pretty common, so bad code from WP is getting out "into the wild". Not a huge concern, but we should at least only support valid markup if we're going to allow HTML at all.

Fourth, editor convenience is better served by templates. No one really expects users to have to enter stuff like <span style="text-decoration: line-through;">...</span>, when something like {{strike|...}} would do this for them. And <s> a.k.a. <strike> is a bad example anyway; it's pure presentation with no semantic meaning, thus its obsolescence. The <del>...</del> markup is still valid (I'll eventually ensure that {{del|...}} works at en.wp, too).

Fifth, we shouldn't force users to learn INCORRECT HTML and CSS for their regular editing, which is presently precisely the case. Implement the correct stuff, remove the bad, and give non-HTMLish users templates.

The latest HTML spec obsoletes these elements that are allowed by sanitizer.php:
<big>
<center>
<font>
<rb>
<strike>
<tt>

These elements are supported by HTML5:
<s>
<u>

smccandlish wrote:

The <small> element is missing from the list of obsoletes posted by Gadget850 in Comment #6, 2013-05-12 21:54:27 UTC.

(In reply to comment #7)

The <small> element is missing from the list of obsoletes posted by Gadget850
in Comment #6, 2013-05-12 21:54:27 UTC.

That's because <small> is still valid, was never removed, and hence is not obsolete:
http://www.whatwg.org/html/text-level-semantics.html#the-small-element

<small> keeps coming up in discussions as being obsolete; does anyone know why? Perhaps a draft spec? The only change is that it now has a semantic definition.

michael wrote:

Small was obsoleted as a presentation element, and has since been reprieved.

The Parsoid project is sponsoring a GSoC student to write "linttrap", a wikitext linter which can (hopefully) aid in the semi-automatic conversion of deprecated markup.

We may need to pick up the pace on this: mobile browsers are starting to drop these elements, as can be seen in a simulated screenshot of what I see on my BlackBerry phone: [[:commons:File:Bad elements.png]]. I've been going through offering replacement signatures to those with the tags that have been removed and started cleaning up interface messages, templates, and help/project pages on enwp, but there are nearly 100K pages overall with these codes that render parts of pages invisible. I'll keep plugging away at it, but any help I can get would be greatly appreciated.

@Technical 13 -- it looks like this is actually a font issue with your phone.

(In reply to C. Scott Ananian from comment #13)

@Technical 13 -- it looks like this is actually a font issue with your phone.

Then it is just a coincidence that it affects all of the deprecated elements including <font>, <acronym>, <center>, <big>, <strike>, and <tt> (that I can see) and nothing else?

Your posted image only demonstrates problems with <font> and <tt>. The full list of deprecated elements is at http://www.whatwg.org/specs/web-apps/current-work/multipage/obsolete.html#non-conforming-features -- your phone has problems rendering all of these?

(In reply to C. Scott Ananian from comment #15)

Your posted image only demonstrates problems with <font> and <tt>. The full
list of deprecated elements is at
http://www.whatwg.org/specs/web-apps/current-work/multipage/obsolete.html#non-conforming-features
-- your phone has problems rendering all of these?

The ones I listed in [#c14], yes.

(In reply to Technical 13 from comment #12)

That's really odd. When a browser drops support for a tag, it should treat it like any unknown tag and display the contents of the tag (not the tag markup itself) rather than hide them!

I'm also inclined to think it's a font problem.

(In reply to Jesús Martínez Novo (Ciencia Al Poder) from comment #17)

(In reply to Technical 13 from comment #12)
That's really odd. When a browser drops support for a tag, it should treat it
like any unknown tag and display the contents of the tag (not the tag markup
itself) rather than hide them!
I'm also inclined to think it's a font problem.

When I view the source of the page, the elements aren't even there.

<rb> is no longer obsolete.
http://www.w3.org/TR/2014/CR-html5-20140429/text-level-semantics.html#the-rb-element

(In reply to Gadget850 from comment #6)

The latest HTML spec obsoletes these elements that are allowed by
sanitizer.php:
<big>
<center>
<font>
<rb>
<strike>
<tt>
These elements are supported by HTML5:
<s>
<u>

<acronym> is not whitelisted, so the markup will always show.

(In reply to Technical 13 from comment #14)

(In reply to C. Scott Ananian from comment #13)

@Technical 13 -- it looks like this is actually a font issue with your phone.

Then it is just a coincidence that it affects all of the deprecated elements
including <font>, <acronym>, <center>, <big>, <strike>, and <tt> (that I can
see) and nothing else?

I customized my CSS so that the five tags in question show with a red background and I have a custom link in the sidebar for HTML validation. This allows me to see the issues and resolve them. I have been doing this for a few weeks now and I see some real problems with this proposal.

Let's consider <center>.

On enwiki we have the {{center}} template which wraps the content in <div class="center" style="width:auto; margin-left:auto; margin-right:auto;">...</div>. Replacing <center> with {{center}} in one swoop would fix the issue, but it won't be clean.

There are a myriad of uses of <center> to center table cell content. Using the template will result in some overly complicated rendered HTML with all those divs. In this instance, <center> should be replaced with |style="text-align: center;" while considering any current cell style.

I have found <center> used to center galleries: <center><gallery>...</gallery></center>
Here the replacement should be <gallery class="center">...</gallery>.

And <center> is used with a number of templates such as <center>{{location map}}</center>
which should be changed to {{location map|float=center}}.

And it is used with full width tables where it should just be removed.
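The case-by-case nature of the <center> cleanup can be sketched as a small set of rewrite rules, with everything else left for manual review. This is only an illustration: `rewrite_center` is a hypothetical helper, it handles just the gallery and {{location map}} cases from the examples above, and it assumes a bare `<gallery>` tag and a template call without nested templates. Table cells and full-width tables are not rewritten; they are simply counted as remaining.

```python
import re
from typing import Tuple

# <center><gallery>...</gallery></center>, bare <gallery> tag only.
GALLERY = re.compile(
    r"<center>\s*<gallery>((?:(?!</gallery>).)*)</gallery>\s*</center>",
    re.IGNORECASE | re.DOTALL,
)
# <center>{{location map|...}}</center>, no nested templates inside.
LOCATION_MAP = re.compile(
    r"<center>\s*\{\{\s*(location map\s*(?:\|[^{}]*)?)\}\}\s*</center>",
    re.IGNORECASE,
)


def rewrite_center(wikitext: str) -> Tuple[str, int]:
    """Apply the two safe rewrites; return new text and count of leftovers."""
    text = GALLERY.sub(r'<gallery class="center">\1</gallery>', wikitext)
    text = LOCATION_MAP.sub(r"{{\1|float=center}}", text)
    # Anything still wrapped in <center> needs a human decision
    # (table cells, full-width tables, other templates).
    remaining = len(re.findall(r"<center\b", text, re.IGNORECASE))
    return text, remaining
```

Reporting the leftover count rather than forcing a {{center}} wrapper keeps the tool from producing the overly complicated div-heavy output described above.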

GOIII added a subscriber: GOIII. Nov 27 2014, 7:14 AM
Restricted Application added a subscriber: Aklapper. Sep 23 2015, 11:37 PM
Danny_B added a subscriber: Danny_B. Apr 2 2016, 2:24 PM
Perhelion changed the status of subtask T40487: Remove button for <big> from toolbar from Open to Stalled. Mar 30 2017, 9:11 AM