Deprecate nonstandard behavior of self-closed HTML tags in wikitext.
Open, High, Public

Description

The HTML5 standard says that the XML-ish self-closed tag syntax <TAGNAME/> (note the trailing slash) is ignored: a tag is "self-closed" if and only if its name is on the list of "void tags". The only valid void HTML tags are area, base, br, col, embed, hr, img, input, keygen, link, meta, param, source, track, wbr.

As such, <b/> and <span/> are treated exactly the same as <b> and <span> by an HTML5 parser. But the situation is a bit more complicated in MediaWiki.

  • Without Tidy turned on, the Sanitizer mostly enforced this constraint but rewrote <b/> as &lt;b/>. That isn't strictly according to the HTML5 spec (which would rewrite it as <b>) but does get the point across that this is invalid HTML syntax.
  • When Tidy is enabled, Tidy replaces <b/> with nothing; that is, it removes the invalid tag from the output. This has led to its (ab)use as a way to protect leading/trailing whitespace and punctuation in templates. However, there are alternative ways to do this, including <nowiki/> and &#32;, which don't violate the HTML5 parsing rules.
  • However, when we replace Tidy with an HTML5 parser (see T89331), MediaWiki will start enforcing the HTML5 standard and parse <b/> and <span/> as start tags, which can break rendering on pages that (deliberately or accidentally) rely on Tidy removing these tags.
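
To make the three cases above concrete, here is a minimal Python sketch of how a bare self-closed non-void tag such as <b/> comes out under each of them. It is a simulation only: the function name, the mode names and the regular expression are hypothetical and are not taken from the Sanitizer, Tidy or any HTML5 parser. It only models the bare-tag case described above, and extension tags such as <ref /> are not modelled at all.

```python
import re

# Void elements as listed in the task description.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "keygen", "link", "meta", "param", "source", "track", "wbr",
}

# Hypothetical pattern for <name ... />; it does not cope with '>' inside
# attribute values, comments, or tags assembled by templates.
SELF_CLOSED = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*?)\s*/>")


def rewrite_self_closed(text: str, mode: str) -> str:
    """Rewrite non-void self-closed tags as each mode is described above."""

    def repl(match: re.Match) -> str:
        name, attrs = match.group(1), match.group(2)
        if name.lower() in VOID_TAGS:
            return match.group(0)               # void tags are left alone
        if mode == "sanitizer":                 # no Tidy: escape the leading '<'
            return "&lt;" + match.group(0)[1:]
        if mode == "tidy":                      # Tidy: the tag disappears
            return ""
        if mode == "html5":                     # HTML5: trailing slash is ignored
            return f"<{name}{attrs}>"
        raise ValueError(f"unknown mode: {mode}")

    return SELF_CLOSED.sub(repl, text)
```

For the input "a<b/>b" the three modes give "a&lt;b/>b", "ab" and "a<b>b" respectively, while "x<br />y" is returned unchanged in every mode.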

In order to facilitate a smooth migration away from Tidy, we are deprecating the use of non-void self-closed HTML tags (so, to repeat, area, base, br, col, embed, hr, img, input, keygen, link, meta, param, source, track, wbr can be written in self-closed tag form and need not be changed). Additionally, we have started tagging pages using this invalid form with the [[Category:Pages using invalid self-closed HTML tags]] tracking category. Once pages which use this construct are cleaned up, we'll change both the "tidy" and the "no tidy" case to be consistent with the HTML5 parsing standard; that is, <b/> will be transformed into <b>.

Additionally, registered extension tags aren't subject to this consideration since they aren't HTML5 tags. So, for example, <ref /> and <references /> can continue to be used.

Dvorapa added a subscriber: Dvorapa. Edited May 16 2016, 8:52 PM

What about <nowiki />? On cswiki it is used as a parser workaround, e.g. for breaking up the signature markup in templates that insert it (~~~~ -> ~~<nowiki />~~), or when a list needs to be passed in a template parameter. Is this going to be fixed too? (There are more workarounds; I ask just to be sure.)

<nowiki /> is a wikitext construct and is handled correctly. This ticket is only about raw HTML tags. So, there are no concerns with the use of <nowiki /> or registered extension tags.

There's ongoing discussion about fixing these at Enwiki, over at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Tech_News:_2016-20 if you have any advice for the editors there, on items they should be checking/searching for. I saw (and didn't understand) a "<b/> usage in templates" discussion somewhere recently, and it might be relevant to mention that issue? HTH.

How about HTML tags overridden by MediaWiki or an extension? Like <pre>, <section>, <source>...

Only searching template namespace doesn't show how often the wrong self-closing tags are actually used. According to this search dewiki has more than 500 uses of <span/> in content namespace. Also note that the above search didn't look for uses of <span/> and <div/> without any spaces, which might increase the actual numbers.

TheDJ added a subscriber: TheDJ. May 17 2016, 10:08 AM

@Schnark I think the search also overcounts in Template space. I noticed it doesn't handle <noinclude> statements intertwined with the tags, which is a common occurrence in template space. A quick tour through the list last week showed me that such noinclude hacks accounted for about half of the hits in template space.

He7d3r added a subscriber: He7d3r. May 17 2016, 12:27 PM
TheDJ edited the task description. May 17 2016, 12:52 PM

HTML tags overridden by MediaWiki or an extension (like <pre>, <section>, <source>) won't be affected either, since extension tags are handled specially and replaced with extension output earlier in the pipeline.

ssastry moved this task from Backlog to In Progress on the Parsoid board. May 23 2016, 3:10 PM
ssastry triaged this task as "High" priority. May 27 2016, 11:19 PM
Arbnos added a subscriber: Arbnos. Jun 26 2016, 6:34 PM

Change 286928 merged by jenkins-bot:
Add tracking category when editors use the deprecated self-closed tag hack.

https://gerrit.wikimedia.org/r/286928

I have often used <span id="foo" /> to create anchors. What would be the new correct way of doing this?

<span id="foo"></span> should work as well.
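
One way to apply this fix in bulk is sketched below in Python. The function name, the regular expression and the list of extension tags are assumptions made for illustration (a real wiki registers its own extension tags), and the sketch does not understand templates, comments or <nowiki> sections, so its output would still need review.

```python
import re

# Void elements as listed in the task description; these may stay self-closed.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "keygen", "link", "meta", "param", "source", "track", "wbr",
}

# Assumed, incomplete sample of extension tags mentioned in this thread.
EXTENSION_TAGS = {"ref", "references", "nowiki", "gallery", "syntaxhighlight", "pre"}

# Hypothetical pattern for <name ... />; '>' inside attribute values is not handled.
SELF_CLOSED = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*?)\s*/>")


def expand_self_closed(wikitext: str) -> str:
    """Turn <span id="foo" /> into <span id="foo"></span> for non-void HTML tags."""

    def repl(match: re.Match) -> str:
        name, attrs = match.group(1), match.group(2)
        lowered = name.lower()
        if lowered in VOID_TAGS or lowered in EXTENSION_TAGS:
            return match.group(0)           # valid as written, leave untouched
        return f"<{name}{attrs}></{name}>"  # explicit start and end tag

    return SELF_CLOSED.sub(repl, wikitext)
```

With this sketch, '<span id="foo" />' becomes '<span id="foo"></span>', while '<br />' and '<ref name="x" />' pass through unchanged.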

Well firstly, it's a bit annoying to have to do that. But secondly, I understand from the discussion above that Tidy will strip empty tags like this, even though they do have a purpose.

Is <references /> affected too? I mean it is not HTML, but a self-closed tag too.

Tidy won't strip tags that have attributes. Compare <span id="foo"></span> with <span></span>: the former is left in place and the latter is stripped.

All this is in service of replacing Tidy (T89331); at that point, we will stop supporting the self-closed form for non-void HTML tags.

We still don't have a firm timeline in place for replacing Tidy, but we are beginning to have tools to support this transition so pages can be fixed before Tidy goes away. For example, https://gerrit.wikimedia.org/r/#/c/286928/ provides a tracking category for pages that use these deprecated tags so they can be fixed.

No, <references /> is not affected. See T134423#2299131

Wikitiki89 added a comment. Edited Jul 14 2016, 8:20 PM

Got it (although it does say above that <span class="foo"></span> would be stripped). Are there any void tags that can be used instead of <span id="foo"></span> to create anchors?

That Tidy won't strip tags with attributes is only partially true, actually. Cf. T134423#2266304

Another reason why Tidy needs to go. :-) .. but, it seems to behave as expected with the id attribute.

<tangent>But, in the long run, we are planning to get rid of this empty-element-stripping behavior altogether. When we replace Tidy, in order to minimize disruption, we are going to support empty-element stripping for a subset of elements (based on the frequency with which they show up in the Wikimedia corpus) and will then phase out that behavior.</tangent>

No, there are no void tags that can be used instead of <span id="foo"></span> to create anchors.

I have a question to clarify:

Does that mean that basically all <hr /> and <br /> in wikitext, which until now bots or gadgets have been normalizing from the <br>, <br/> and </br> forms, will have to be converted to <hr> and <br>?

No. HTML5 accepts the self-closing form and treats such a tag as a start tag.

This is fine for <hr> and <br>: the spec says they are void elements, so treating them as start tags has the intended effect.

Can someone please give a clear answer on this? It is completely illogical to propagate these (void) XHTML-style tags (which were only created to look like self-closing XHTML tags). So currently bots/scripts convert <hr> and <br> to XHTML!

So the simple question is: are <hr /> and <br /> deprecated or not?

<hr /> and <br /> are not deprecated.

Perhelion added a comment. Edited Jul 15 2016, 3:45 PM

OK, can you say why? Can you see that this is somewhat illogical or incomprehensible (especially when we think about the future)? (Anyway, <br /> is mentioned in the task description.)

So people who convert HTML(5) to XHTML (<br> to <br/>) are right to do so anyway?

ssastry edited the task description. Jul 15 2016, 4:49 PM

The tracking category contains pages that do not use deprecated self-closing tags (but that do contain at least some of <br />, <hr />, <ref />, <references />). Not sure whether to report it here or open a new task for that.

ssastry added a comment. Edited Jul 15 2016, 5:11 PM

I don't fully understand your question, but I updated the task description, which will hopefully clear up any confusion. Internally, the HTML5 parser converts all self-closing forms to a start tag. So, <br/> is converted to <br> and <b/> is converted to <b>. As the updated description explains, this is a problem for non-void tags. Tidy removes a <b/> tag, but an HTML5 parser will change it to a <b>. So, once Tidy is replaced with an HTML5-compliant cleanup tool (T89331), the rendering of pages that use <b/> might change, and you may see a lot of text in bold.

So yes, it is okay to use <br />. It is also okay to convert <br> to <br />, but it is not necessary.
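
The rendering change described here can be illustrated with a heavily simplified model in Python. This is not the HTML5 tree-construction algorithm, only a hypothetical sketch (the function name and the regular expression are made up for illustration) that tracks which elements are open for each run of text while ignoring the trailing slash on non-void tags.

```python
import re

# Void elements as listed in the task description.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "keygen", "link", "meta", "param", "source", "track", "wbr",
}

# Either a tag (optional leading '/', name, optional trailing '/') or a text run.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*?(/?)>|([^<]+)")


def text_with_open_tags(html: str):
    """Return (text, open_tags) pairs showing which elements enclose each text run."""
    open_tags = []
    result = []
    for match in TOKEN.finditer(html):
        closing, name, _trailing_slash, text = match.groups()
        if text is not None:
            result.append((text, tuple(open_tags)))
        elif closing:
            name = name.lower()
            if name in open_tags:
                # Close everything up to and including the matching element.
                while open_tags.pop() != name:
                    pass
        elif name.lower() not in VOID_TAGS:
            # The trailing slash is ignored: a non-void tag simply opens an element.
            open_tags.append(name.lower())
    return result
```

In this model, "a<b/>c" yields [("a", ()), ("c", ("b",))], so the text after <b/> is inside an open <b> element, whereas "a<br/>c" yields [("a", ()), ("c", ())] because <br> is void.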

Please open a new bug for the miscategorized pages. I am going to resolve this bug shortly. It could be that the page uses a template that generates a deprecated self-closing tag, but if you have a sample page, please open a new bug with a link. Thanks.

Solved in the meantime. The issue is that the tracking category isn't populated completely, so it was (and actually still is) not listing any of the templates used on the given page.

Perhelion added a comment. Edited Jul 18 2016, 12:03 PM

A new HTML5 check is needed for the custom user signature settings, because most of the pages I fixed were affected through user signatures. Is this a new bug report? T140606

NicoV added a subscriber: NicoV. Jul 19 2016, 11:18 AM

In addition to the categorization in [[Category:Pages using invalid self-closed HTML tags]], would it be possible to also have an error message when previewing in edit mode, like what is done for [[Category:Pages using duplicate arguments in template calls]] ? It would help a lot to find where the problem is, and if it's coming from a template used inside the page, rather than from the text in the page.

@NicoV Besides the fact that I don't think it's practically doable to show whether the issue is on the current page or in a transcluded template (mind that the transclusion can be several levels deep too), you already see the tracking category when you hit Preview.

OTOH, if the error notice were there to emphasize that there is such an issue (although without mentioning the source, as I pointed out above), it would be helpful if the message listed the improperly self-closed tags, as they are sometimes quite hard to find due to weird, wild constructions such as <tag{{#if:{{{content|}}}|>{{{content}}}/tag|/}}>, so one would have a narrower scope of what to look for...

NicoV added a comment. Jul 19 2016, 3:14 PM

@Danny_B I was talking about [[Category:Pages using duplicate arguments in template calls]] because that's exactly what is already done for that tracking category: it tells you whether the issue is on the current page or in a transcluded template (and it includes the levels when there are several levels...). It also tells you which argument is duplicated. So it is doable... I don't know how it was done, but it was done and it is very helpful.

So doing it here would also be very helpful if it were done in the same way: current page or transcluded template (with levels), plus information about the problem itself (the tag name, for example, would be very helpful to narrow the search).

I agree that you do see the tracking category when you hit Preview, but for some articles that doesn't help much in finding where the problem is when the issue is not trivial. For example, I'm currently trying to fix the articles for frwiki, and I've encountered a few pages where I was unable to find the cause of the categorization. For example:
https://fr.wikipedia.org/wiki/Insurrection_de_Boko_Haram
https://fr.wikipedia.org/wiki/Hautes_Tatras
https://fr.wikipedia.org/wiki/Discussion_Portail:Aur%C3%A8s

Hautes Tatras fixed by https://fr.wikipedia.org/w/index.php?title=Mod%C3%A8le:Panorama_annot%C3%A9_Hautes_Tatras&diff=prev&oldid=127982911

It's easy - just open all linked templates and look for improper selfclosing tags...

jrbs added a subscriber: jrbs. Jul 25 2016, 10:08 PM
Elitre added a subscriber: Elitre. Aug 2 2016, 9:54 AM
Arbnos removed a subscriber: Arbnos. Aug 3 2016, 10:34 AM
JJMC89 added a subscriber: JJMC89. Sep 22 2016, 10:07 PM
Jonesey95 added a subscriber: Jonesey95. Edited Oct 8 2016, 5:37 PM

Is this supposed to check for self-closed instances of every tag listed on lines 376-381 of https://gerrit.wikimedia.org/r/#/c/286928/11/includes/Sanitizer.php ?

If so, I think it may be missing at least one. See this page for an example of a "pre" tag that is self-closed, but to which the error category has not been applied:

https://en.wikipedia.org/w/index.php?title=User:Jonesey95/sandbox2&oldid=743233850

[edited to add:] Can someone please point us to a complete list of tags that can be listed on the category page? Thanks.

Regarding the self-closed "pre" tag: see T134423#2301066.

The only valid self-closed HTML tags are: area, base, br, col, embed, hr, img, input, keygen, link, meta, param, source, track, wbr. In addition, <pre> is treated as an extension tag in MediaWiki and is also exempt. So, all other HTML tags besides these should be fixed if they use the invalid self-closed form. To be very clear and to repeat what I said elsewhere, extension tags (like ref, references, gallery, syntaxhighlight, nowiki, etc.) aren't affected.
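
As a rough aid for finding the tags that still need fixing, a Python sketch of such a check follows. The names are hypothetical, the extension tag set only contains the examples named in this thread, and tags assembled by templates or parser functions will not be seen by a plain regular expression.

```python
import re
from typing import List

# Tags that may legitimately be written in self-closed form.
VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "keygen", "link", "meta", "param", "source", "track", "wbr",
})

# Extension tags named in this thread (assumed incomplete); <pre> is included
# because it is treated as an extension tag in MediaWiki.
EXTENSION_TAGS = frozenset({
    "ref", "references", "gallery", "syntaxhighlight", "nowiki", "pre",
})

# Hypothetical pattern for <name ... />; '>' inside attribute values is not handled.
SELF_CLOSED = re.compile(r"<([A-Za-z][A-Za-z0-9]*)[^<>]*?/>")


def find_invalid_self_closed(wikitext: str) -> List[str]:
    """List the names of self-closed tags that are neither void nor extension tags."""
    names = []
    for match in SELF_CLOSED.finditer(wikitext):
        name = match.group(1).lower()
        if name not in VOID_TAGS and name not in EXTENSION_TAGS:
            names.append(name)
    return names
```

For example, the input '<br /><span id="a"/> <ref name="r" /><div/><pre/>' gives ["span", "div"].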

@tstarling would you please do a new run of the P3012 report?

Wiki div span
arwiki1316
cawiki56
cebwiki00
commonswiki20
dewiki12
enwiki1122
enwikinews83
enwikisource01
enwiktionary00
eswiki43
fawiki1246
fiwiki31
frwiki10
frwikisource10
frwiktionary10
huwiki07
idwiki533
incubatorwiki1522
itwiki70
jawiki58
kowiki321
metawiki422
mgwiktionary04
nlwiki00
nowiki11
plwiki240
ptwiki58
rowiki15
ruwiki01
ruwiktionary33
shwiki330
srwiki09
svwiki10
trwiki21
ukwiki821
viwiki415
warwiki17
wikidatawiki11
zhwiki510
zhwiktionary12

Glad to say that, as of this moment, no pages on MediaWiki.org use the invalid syntax for closing HTML tags any longer ( https://www.mediawiki.org/wiki/Category:Pages_using_invalid_self-closed_HTML_tags ).

Not sure if this is actually relevant information (I didn't see it listed above), but I figured it could be useful. If it isn't, feel free to ignore this comment :)

ssastry moved this task from Backlog to In Progress on the MediaWiki-Parser board. Jan 4 2017, 7:30 PM
Liuxinyu970226 added a subscriber: liangent. Edited Jan 12 2017, 5:34 AM

For zhwiki, per the discussion under Tech News: 2016-20 (@liangent), most of the remaining ones are jQuery('<div/>') (and jQuery('<span/>')?) and don't need "such fixing"...

Note, though, that for creating a single element with jQuery the MW coding conventions prefer jQuery( '<div>' ) without the trailing slash, so if you want to follow them in on-wiki gadgets/user scripts, you could change these occurrences, too.

Fito added a subscriber: Fito. Feb 13 2017, 4:15 AM