
Fix usage of tidy to work cleanly with html5
Closed, Declined | Public


Author: dasch

When a headline begins with a number, that text is used as-is as the id of the headline element. Because XHTML does not allow the id of an element to start with a number, this leads to an XHTML error/warning. I think this behaviour should be changed, for example by generally prepending something to the id, or only to the ids of headings that begin with a number.

BTW: What happens if two headings have the same name? This would also lead to an XHTML error.

Version: 1.20.x
Severity: normal



Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 11:49 PM)
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz31871.
bzimport added a subscriber: Unknown Object (MLST).

IDs with a leading digit are valid in HTML5. Archaic XHTML rules don't really matter here, as MediaWiki now uses HTML5.
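To make the difference concrete, here is a minimal sketch of the two rule sets. The function names are made up for illustration and this is not what Tidy or MediaWiki actually runs. The HTML 4 pattern follows the HTML 4.01 definition of ID tokens (a letter, then letters, digits, hyphens, underscores, colons, and periods). XML name rules, which apply to real XHTML, differ slightly but likewise forbid a leading digit. The HTML5 rule only requires a non-empty value without whitespace.

```python
import re

# HTML 4.01 ID token: a letter, then letters, digits, "-", "_", ":" or ".".
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9_:.\-]*")

# Characters that HTML5 treats as ASCII whitespace.
ASCII_WHITESPACE = " \t\n\f\r"


def valid_html4_id(value):
    """Return True if value is acceptable as an id under HTML 4.01 rules."""
    return HTML4_ID.fullmatch(value) is not None


def valid_html5_id(value):
    """Return True if value is non-empty and contains no ASCII whitespace."""
    return bool(value) and not any(ch in ASCII_WHITESPACE for ch in value)


if __name__ == "__main__":
    for candidate in ("12._Oktober_2011", "Oktober_2011"):
        print(candidate, valid_html4_id(candidate), valid_html5_id(candidate))
```

Under these assumptions, an id such as 12._Oktober_2011 fails only the older check, which is why a validator configured for HTML 4 or XHTML complains about it while an HTML5 one would not.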

For the record, MediaWiki never actually used XHTML. It used an XHTML doctype, but XHTML served as text/html is not XHTML and is not parsed as XHTML. In that situation, what a validator says is irrelevant.

When two headings have the same name the later id="" has an underscore and incrementing numeral added to it.
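As a rough illustration of that behaviour, here is a sketch in Python rather than MediaWiki's actual parser code. The function name, the choice to start the counter at 2, and the replacement of spaces with underscores are all assumptions made for the example.

```python
def heading_ids(headings):
    """Assign an id to each heading; repeated names get "_<n>" appended."""
    seen = {}  # base id -> number of times it has occurred so far
    ids = []
    for text in headings:
        # Assumption: spaces in the heading text become underscores.
        base = text.strip().replace(" ", "_")
        count = seen.get(base, 0) + 1
        seen[base] = count
        # The first occurrence keeps the plain id; later ones get a suffix.
        ids.append(base if count == 1 else f"{base}_{count}")
    return ids


if __name__ == "__main__":
    print(heading_ids(["News", "12. Oktober 2011", "News", "News"]))
```

The sketch ignores the corner case where a heading's own text already looks like a suffixed id (for example a heading literally named "News_2"); a real implementation would need to check the generated id against the set of ids already in use.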

dasch wrote:

Well, if MediaWiki uses HTML5, then Tidy should know this, and I should not have this in my page:

line 77 column 38 - Warning: <span> attribute "id" has invalid value "12._Oktober_2011"
line 80 column 38 - Warning: <span> attribute "id" has invalid value "11._Oktober_2011"
line 83 column 39 - Warning: <span> attribute "id" has invalid value "10._Oktober_2011"
line 86 column 39 - Warning: <span> attribute "id" has invalid value "08._Oktober_2011"
line 89 column 39 - Warning: <span> attribute "id" has invalid value "06._Oktober_2011"
line 92 column 39 - Warning: <span> attribute "id" has invalid value "03._Oktober_2011"

I guess that's related to the output-xhtml=yes setting in tidy.conf. I expect that was done so that Tidy would emit well-formed XML, because we still haven't stopped outputting our markup in well-formed XML form.

We'll need to see if changing the tidy settings causes any unwanted bugs or side effects.

The tidy settings that html5-rack-tidy uses look interesting.

We may have a use for some of those settings. Maybe if necessary we could use separate tidy.conf and tidy5.conf files.

The W3C seems to have recently taken over Tidy and is working on an HTML5-compatible version of it.

*** Bug 39525 has been marked as a duplicate of this bug. ***
ssastry added a subscriber: ssastry.

This is now covered by T89331: Replace HTML4 Tidy in MW parser with an equivalent HTML5 based tool, which is in progress. It is expected to be resolved by July 2018.