Stop adding xml:lang attributes to HTML5 pages
Closed, Resolved, Public

Description

Author: michael

Description:
Wikipedia and Wiktionary pages now have the HTML5 doctype <!doctype html>, and a root <html> element with only a lang attribute. HTML5 doesn’t require the xml:lang attribute. According to the spec,

“The attribute in no namespace with no prefix and with the literal local name "xml:lang" has no effect on language processing.”

But if you enter, e.g., <span lang="fr">fou</span> into a wiki page, the wikitext parser will add a redundant and vestigial xml:lang attribute.

The parser should stop adding the xml:lang attribute in pages that are HTML5 and not XML.
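A minimal sketch of the requested behavior, written in Python purely for illustration (MediaWiki itself is PHP, and this is not its code). The hypothetical `start_tag` helper drops `xml:lang` when emitting HTML5 and `lang` is already present. Promoting a lone `xml:lang` to `lang` is an added assumption, so that language information is never lost.

```python
from html import escape


def start_tag(name, attrs, html5=True):
    """Serialize a start tag from (attribute, value) pairs.

    Hypothetical helper. With html5=True (HTML syntax, served as text/html)
    a vestigial xml:lang is dropped when lang is present; a lone xml:lang is
    promoted to lang (an assumption). With html5=False attributes pass through.
    """
    has_lang = any(attr.lower() == "lang" for attr, _ in attrs)
    parts = [name]
    for attr, value in attrs:
        if html5 and attr.lower() == "xml:lang":
            if has_lang:
                continue  # no effect on language processing in text/html
            attr = "lang"  # keep the language when lang itself is missing
        parts.append(f'{attr}="{escape(value, quote=True)}"')
    return "<" + " ".join(parts) + ">"


print(start_tag("span", [("lang", "fr"), ("xml:lang", "fr")]))
# <span lang="fr">
print(start_tag("span", [("lang", "fr"), ("xml:lang", "fr")], html5=False))
# <span lang="fr" xml:lang="fr">
```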


Version: 1.21.x
Severity: trivial

bzimport added a project: MediaWiki-Parser. Via Conduit, Nov 22 2014, 1:30 AM
bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz44609.
bzimport created this task. Via Legacy, Feb 3 2013, 12:46 AM
bzimport added a comment. Via Conduit, Feb 3 2013, 12:53 AM

michael wrote:

[I wish I could edit my bug, or at least preview. Have you guys heard of this “wiki” thing? Here’s a better-formatted version of my bug report.]

Wikipedia and Wiktionary pages now have the HTML5 doctype <!doctype html>, and a root <html> tag with only a lang attribute. HTML5 doesn’t require the xml:lang attribute. According to the spec, “The attribute in no namespace with no prefix and with the literal local name "xml:lang" has no effect on language processing.”

Source: http://www.w3.org/TR/2011/WD-html5-20110525/elements.html#the-lang-and-xml:lang-attributes

But if you enter, e.g., <span lang="fr">fou</span> into a wiki page, the Wikitext parser will add a redundant and vestigial xml:lang attribute.

The parser should stop adding the xml:lang attribute in pages that are HTML5 and not XML.

DCDuring added a comment. Via Conduit, Feb 3 2013, 1:21 AM

I agree. Let's keep things nice and tidy.

PleaseStand added a comment. Via Conduit, Feb 3 2013, 1:53 AM

This is caused by the "output-xhtml" option in includes/tidy.conf. Unfortunately, disabling it seems to break things such as the conversion from <hr> to <hr />, so many pages would no longer be well-formed XML as configured by $wgWellFormedXml.

Note that adding the extra attribute is legal according to HTML5 section 3.2.3.3:

"Authors must not use the lang attribute in the XML namespace on HTML elements in HTML documents. To ease migration to and from XHTML, authors may specify an attribute in no namespace with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents, but such attributes must only be specified if a lang attribute in no namespace is also specified, and both attributes must have the same value when compared in an ASCII case-insensitive manner."
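The conformance rule quoted above can be expressed as a small check. This is an illustrative Python sketch, not any validator's actual code; `xml_lang_conforms` is a hypothetical name, and an element's attributes are assumed to arrive as a plain name-to-value dict.

```python
import string

# Map A-Z to a-z only. str.lower() would also fold non-ASCII letters,
# which an ASCII case-insensitive comparison must not do.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s: str) -> str:
    return s.translate(_ASCII_LOWER)


def xml_lang_conforms(attrs: dict[str, str]) -> bool:
    """Check one element against the HTML5 xml:lang rule.

    xml:lang (no namespace, no prefix) may appear only if lang is also
    present, and both values must match ASCII case-insensitively.
    """
    if "xml:lang" not in attrs:
        return True  # nothing to check
    if "lang" not in attrs:
        return False  # xml:lang may only accompany lang
    return ascii_lower(attrs["lang"]) == ascii_lower(attrs["xml:lang"])


print(xml_lang_conforms({"lang": "fr", "xml:lang": "FR"}))  # True
print(xml_lang_conforms({"xml:lang": "fr"}))                # False
```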

bzimport added a comment. Via Conduit, Feb 3 2013, 8:14 PM

ran.arigur wrote:

(In reply to comment #3)

Unfortunately, [...] many pages would no longer be well-formed XML [...]

Why is that unfortunate? The pages are HTML, not XHTML (we're serving them as text/html, not as e.g. application/xhtml+xml), so there's no reason they *should* be well-formed XML. (See HTML5 section 1.6, or section 8.) The spec says that in the HTML syntax, the use of '/' on void elements (br, hr, img, etc.) is optional and has no effect. (See HTML5 section 8.1.2.1, clause 6.)

(That's as far as the standard is concerned. Obviously we also care about browser support, but personally I find it impossible to believe that any real-world browser would stumble over '<hr>' in an HTML document.)
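To illustrate the point about void elements: both forms are valid in the HTML syntax, but only the self-closing form is well-formed XML. In this hypothetical Python helper, the `well_formed_xml` flag is merely an analogue of `$wgWellFormedXml`, not MediaWiki's implementation, and the element set is the void-element list from the current WHATWG HTML standard.

```python
# Void elements as listed in the WHATWG HTML standard.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


def void_tag(name: str, well_formed_xml: bool) -> str:
    """Serialize a void element (hypothetical helper)."""
    if name not in VOID_ELEMENTS:
        raise ValueError(f"{name!r} is not a void element")
    # In the HTML syntax a trailing '/' on a void element is allowed but has
    # no effect; XML needs the explicit self-closing form.
    return f"<{name} />" if well_formed_xml else f"<{name}>"


print(void_tag("hr", well_formed_xml=True))   # <hr />
print(void_tag("hr", well_formed_xml=False))  # <hr>
```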

Aklapper added a comment. Via Conduit, Feb 5 2013, 11:03 AM

(In reply to comment #1)

[I wish I could edit my bug, or at least preview. Have you guys heard of this “wiki” thing? Here’s a better-formatted version of my bug report.]

Off-topic: The "you guys" that you want to talk with can be reached here:
https://bugzilla.mozilla.org/show_bug.cgi?id=40896

TheDJ closed this task as "Resolved". Via Web, Apr 13 2015, 8:08 AM
TheDJ claimed this task.
TheDJ added a subscriber: TheDJ.

Not sure when this got fixed, but our pages no longer emit xml:lang in HTML5 mode.

He7d3r edited the task description. Via Web, Apr 13 2015, 2:20 PM
He7d3r set Security to None.
