Stop adding xml:lang attributes to HTML5 pages
Open, Public

Description

Author: michael

Description:
Wikipedia and Wiktionary pages now have the HTML5 doctype <!doctype html>, and a root <html> element with only a lang attribute. HTML5 doesn’t require the xml:lang attribute. According to the spec, “The attribute in no namespace with no prefix and with the literal local name "xml:lang" has no effect on language processing.” [http://www.w3.org/TR/2011/WD-html5-20110525/elements.html#the-lang-and-xml:lang-attributes]

But if you enter, e.g., <span lang="fr">fou</span> into a wiki page, the Wikitext parser will add a redundant and vestigial xml:lang attribute.

The parser should stop adding the xml:lang attribute in pages that are HTML5 and not XML.


Version: 1.21.x
Severity: trivial

bzimport added a project: MediaWiki-Parser. Via Conduit, Nov 22 2014, 1:30 AM
bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz44609.
bzimport created this task. Via Legacy, Feb 3 2013, 12:46 AM
bzimport added a comment. Via Conduit, Feb 3 2013, 12:53 AM

michael wrote:

[I wish I could edit my bug, or at least preview. Have you guys heard of this “wiki” thing. Here’s a better-formatted version of my bug report.]

Wikipedia and Wiktionary pages now have the HTML5 doctype <!doctype html>, and a root <html> tag with only a lang attribute. HTML5 doesn’t require the xml:lang attribute. According to the spec, “The attribute in no namespace with no prefix and with the literal local name "xml:lang" has no effect on language processing.”

Source: http://www.w3.org/TR/2011/WD-html5-20110525/elements.html#the-lang-and-xml:lang-attributes

But if you enter, e.g., <span lang="fr">fou</span> into a wiki page, the Wikitext parser will add a redundant and vestigial xml:lang attribute.

The parser should stop adding the xml:lang attribute in pages that are HTML5 and not XML.

bzimport added a comment. Via Conduit, Feb 3 2013, 1:21 AM

dcduring wrote:

I agree. Let's keep things nice and tidy.

PleaseStand added a comment. Via Conduit, Feb 3 2013, 1:53 AM

This is caused by the "output-xhtml" option in includes/tidy.conf. Unfortunately, disabling it seems to break things such as the conversion from <hr> to <hr />, so many pages would no longer be well-formed XML as configured by $wgWellFormedXml.
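The well-formedness concern can be illustrated with a minimal Python sketch. This is not MediaWiki or tidy code: the helper name is_well_formed_xml and the sample fragments are hypothetical, and the check uses only the standard library's xml.etree.ElementTree parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if the fragment parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# HTML-style void element: to an XML parser, <hr> is an element that is never closed.
html_style = "<div><p>above</p><hr><p>below</p></div>"
# XHTML-style form, i.e. the <hr> to <hr /> conversion mentioned above.
xhtml_style = "<div><p>above</p><hr /><p>below</p></div>"

print(is_well_formed_xml(html_style))
print(is_well_formed_xml(xhtml_style))
```

Only the second fragment is accepted by the XML parser, which is why dropping the conversion matters when pages are expected to be well-formed XML.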

Note that adding the extra attribute is legal according to HTML5 section 3.2.3.3:

"Authors must not use the lang attribute in the XML namespace on HTML elements in HTML documents. To ease migration to and from XHTML, authors may specify an attribute in no namespace with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents, but such attributes must only be specified if a lang attribute in no namespace is also specified, and both attributes must have the same value when compared in an ASCII case-insensitive manner."
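The quoted rule can be expressed as a small check. The following Python sketch is only an illustration under stated assumptions: the class XmlLangChecker and the functions ascii_lower and check are hypothetical names, and the standard library's html.parser is used as a convenient tokenizer, not as a conforming HTML5 parser.

```python
from html.parser import HTMLParser


def ascii_lower(value: str) -> str:
    """Lowercase only A-Z, for an ASCII case-insensitive comparison."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in value)


class XmlLangChecker(HTMLParser):
    """Collects elements whose xml:lang use breaks the quoted rule (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names; valueless attributes map to None.
        attr_map = dict(attrs)
        if "xml:lang" not in attr_map:
            return
        if "lang" not in attr_map:
            self.violations.append((tag, "xml:lang without lang"))
            return
        lang = attr_map["lang"] or ""
        xml_lang = attr_map["xml:lang"] or ""
        if ascii_lower(lang) != ascii_lower(xml_lang):
            self.violations.append((tag, "xml:lang differs from lang"))


def check(html: str):
    """Return a list of (tag, reason) pairs for rule violations."""
    parser = XmlLangChecker()
    parser.feed(html)
    parser.close()
    return parser.violations


print(check('<span lang="fr" xml:lang="FR">fou</span>'))
print(check('<span xml:lang="fr">fou</span>'))
```

Under this reading, markup that carries both attributes with matching values passes, so the extra attribute is redundant but not invalid.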

bzimport added a comment. Via Conduit, Feb 3 2013, 8:14 PM

ran.arigur wrote:

(In reply to comment #3)

Unfortunately, [...] many pages would no longer be well-formed XML [...]

Why is that unfortunate? The pages are HTML, not XHTML (we're serving them as text/html, not as e.g. application/xhtml+xml), so there's no reason they *should* be well-formed XML. (See HTML5 section 1.6, or section 8.) The spec says that in the HTML syntax, the use of '/' on void elements (br, hr, img, etc.) is optional and has no effect. (See HTML5 section 8.1.2.1, clause 6.)

(That's as far as the standard is concerned. Obviously we also care about browser support, but personally I find it impossible to believe that any real-world browser would stumble over '<hr>' in an HTML document.)

Aklapper added a comment. Via Conduit, Feb 5 2013, 11:03 AM

(In reply to comment #1)

[I wish I could edit my bug, or at least preview. Have you guys heard of this
“wiki” thing. Here’s a better-formatted version of my bug report.]

Offtopic: The "you guys" that you want to talk with can be reached here:
https://bugzilla.mozilla.org/show_bug.cgi?id=40896
