Optionally allow non-HTML4-compatible ids

This adds a config option, $wgEnforceHtmlIds, true by default. If this
is set to false, all characters that are allowed in XML ids are let
through in header ids and manually-specified ids. In particular, this
should include all alphabetic and numeric characters.
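
For reference, opting a wiki into the relaxed behavior would presumably be a one-line override in the site configuration. Only the variable name and its default come from this commit; placing the override in LocalSettings.php is an assumption based on where site settings normally live:

```php
<?php
// Assumed location: LocalSettings.php (not stated in this commit).
// The default is true, which keeps ids HTML4-compatible.
// Setting it to false lets every character valid in an XML id through.
$wgEnforceHtmlIds = false;
```

Leaving the variable unset keeps the existing HTML4-compatible ids, so nothing changes for wikis that do not opt in.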

Some remaining issues to work out:

  • This will cause backward-compatibility issues for some types of links
    and references: links from non-MediaWiki sources, links from MediaWiki
    sources running a different version, external links, and references from
    stylesheets/scripts. These could be partially alleviated by having a
    second <a name="" id=""> for headers where the two versions differ, but
    it would remain an issue for manually-specified ids.

  • Any invalid characters are now effectively stripped (replaced with
    underscores). This might cause problems if some writing systems are
    invalid in ids for some reason: we'll want to double-check the list of
    prohibited characters carefully.

  • Some user agents might not support these links. IE5 appears to, and
    so do recent versions of Opera and Firefox, but I didn't do extensive
    testing.

  • Not tested extensively; there are probably some bugs.

I think this would be good to enable on testwiki for the moment to see
how it goes.

No parser test regressions. No change to RELEASE-NOTES; we can add that
when the new behavior is enabled by default (or, ideally, when the option
is removed entirely).


simetrical, Dec 30 2008, 12:22 AM
rSVN45170: Improve ugly interface for Sanitizer::escapeId()