
Fix usage of tidy to work cleanly with html5
Closed, Declined · Public

Description

Author: dasch

Description:
When a headline begins with a number, the headline text is used as-is as the id of the headline element. Because XHTML does not allow the id of an element to start with a number, this leads to an XHTML error/warning. I think this behaviour should be changed, for example by generally prepending something to the id, or only to the ids of headings that start with a number.

BTW: What happens if two headings have the same name? This would also lead to an XHTML error.


Version: 1.20.x
Severity: normal

Details

Reference
bz31871

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:49 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz31871.
bzimport added a subscriber: Unknown Object (MLST).

Ids with a leading number are valid in html5. Archaic XHTML rules don't really matter here as MediaWiki now uses html5.
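To make the difference concrete, here is a minimal Python sketch of the two rules. It is not MediaWiki or Tidy code. The function names are made up for illustration, and the older rule is approximated by the HTML 4 definition of an ID token (a leading ASCII letter, then letters, digits, "-", "_", ":" or "."); real XHTML uses the slightly broader XML Name production.

```python
import re

# Approximation of the HTML 4 ID rule: a leading ASCII letter,
# then any number of letters, digits, "-", "_", ":" or ".".
# (Assumption: XHTML's XML Name rule is broader; this is only a sketch.)
_HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")

# ASCII whitespace characters as listed by HTML5.
_ASCII_WHITESPACE = " \t\n\f\r"


def is_valid_html4_id(value: str) -> bool:
    """Return True if value satisfies the HTML 4 style ID rule."""
    return _HTML4_ID.fullmatch(value) is not None


def is_valid_html5_id(value: str) -> bool:
    """Return True if value is non-empty and has no ASCII whitespace."""
    return len(value) > 0 and not any(ch in _ASCII_WHITESPACE for ch in value)
```

Under these assumptions an id such as "12._Oktober_2011" fails the older check only because of its leading digit, and passes the HTML5 check.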

For the record, MediaWiki never actually used XHTML. It used an XHTML doctype, but XHTML served as text/html is not XHTML and is not parsed as XHTML. In such a situation, what a validator says is irrelevant.

When two headings have the same name, the later id="" has an underscore and an incrementing numeral appended to it.
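The de-duplication described above could be sketched roughly as follows. This is an illustration, not the parser's actual code: the function name is hypothetical, and starting the numeral at 2 for the first repeat is an assumption. It also ignores the corner case where a heading is literally named like an already generated id.

```python
from typing import Dict, Iterable, List


def dedupe_heading_ids(ids: Iterable[str]) -> List[str]:
    """Append "_<n>" to every repeated id, leaving the first use untouched.

    Assumption: the counter for a repeated id starts at 2.
    """
    seen: Dict[str, int] = {}   # id -> number of times it has appeared so far
    result: List[str] = []
    for base in ids:
        count = seen.get(base, 0) + 1
        seen[base] = count
        # The first occurrence keeps the plain id; later ones get a suffix.
        result.append(base if count == 1 else f"{base}_{count}")
    return result
```

For example, three headings called "News" with one "Other" in between would come out as "News", "News_2", "Other", "News_3".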

dasch wrote:

Well, if MediaWiki uses HTML5, then Tidy should know this and I should not get these warnings for my page:

line 77 column 38 - Warning: <span> attribute "id" has invalid value "12._Oktober_2011"
line 80 column 38 - Warning: <span> attribute "id" has invalid value "11._Oktober_2011"
line 83 column 39 - Warning: <span> attribute "id" has invalid value "10._Oktober_2011"
line 86 column 39 - Warning: <span> attribute "id" has invalid value "08._Oktober_2011"
line 89 column 39 - Warning: <span> attribute "id" has invalid value "06._Oktober_2011"
line 92 column 39 - Warning: <span> attribute "id" has invalid value "03._Oktober_2011"

I'd guess that's related to the output-xhtml=yes setting in tidy.conf. I expect that was set so that Tidy would output valid XML, because we still haven't stopped outputting well-formed XML markup.

We'll need to see if changing the tidy settings causes any unwanted bugs or side effects.

The tidy settings that html5-rack-tidy uses look interesting:
https://github.com/customink/html5-rack-tidy/blob/master/lib/rack/tidy/cleaner.rb

We may have a use for some of those settings. Maybe if necessary we could use separate tidy.conf and tidy5.conf files.

TheDJ added a comment. Aug 1 2012, 11:05 AM

The W3C seems to have recently taken over Tidy and is working on an HTML5-compatible version of it.

https://github.com/w3c/tidy-html5/

*** Bug 39525 has been marked as a duplicate of this bug. ***
ssastry moved this task from Backlog to In Progress on the MediaWiki-Parser board. Dec 17 2015, 5:57 PM
Restricted Application added a subscriber: Aklapper. Dec 17 2015, 5:57 PM
Danny_B removed a subscriber: wikibugs-l-list.
ssastry closed this task as Declined. Oct 23 2017, 8:34 PM
ssastry added a subscriber: ssastry.

This is now covered by T89331: Replace HTML4 Tidy in MW parser with an equivalent HTML5 based tool and is in progress. Expected to be resolved by July 2018.