
Sanitizer::escapeId generates ambiguous IDs
Open, Low, Public

Description

In traditional mode ($wgExperimentalHtmlIds = false), Sanitizer::escapeId generates ambiguous values:
Sanitizer::escapeId( '!', [ 'noninitial' ] ) and Sanitizer::escapeId( '.21', [ 'noninitial' ] ) both evaluate to .21.
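
The collision can be reproduced outside MediaWiki with a small stand-in. This is only a sketch: legacyEscapeIdSketch is a hypothetical name, and it assumes that traditional mode boils down to replacing spaces with underscores, URL-encoding the result, and turning every % into a dot. The real method handles more cases that are not modeled here.

```php
<?php
// Hypothetical, simplified stand-in for traditional-mode Sanitizer::escapeId
// called with the 'noninitial' option. Assumption: space => '_', urlencode(),
// then '%' => '.'. Other special cases of the real method are left out.
function legacyEscapeIdSketch(string $id): string
{
    $encoded = urlencode(strtr($id, ' ', '_'));
    return str_replace('%', '.', $encoded);
}

// '!' is percent-encoded to '%21', which becomes '.21'.
echo legacyEscapeIdSketch('!'), "\n";   // .21
// '.21' contains only characters that urlencode() leaves alone.
echo legacyEscapeIdSketch('.21'), "\n"; // .21
```

Two different inputs map to the same output, so the ID alone does not say which heading it came from.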

A heuristic decoder would therefore produce wrong results. For example, the heading

172.31.255.255

is encoded as id="172.31.255.255", which a heuristic decoder would turn into 1721%5%5.
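
To make the failure concrete, here is a minimal sketch of such a heuristic decoder. The function heuristicDecodeSketch is hypothetical and not part of MediaWiki; it assumes the decoder treats every dot followed by two hex digits as an escaped byte.

```php
<?php
// Hypothetical heuristic decoder (not a MediaWiki function): every '.'
// followed by two hex digits is assumed to be an escaped byte.
function heuristicDecodeSketch(string $id): string
{
    return preg_replace_callback(
        '/\.([0-9A-Fa-f]{2})/',
        function (array $m): string {
            return chr(hexdec($m[1]));
        },
        $id
    );
}

// '.31' is read as '1' (0x31) and each '.25' as '%' (0x25).
echo heuristicDecodeSketch('172.31.255.255'), "\n"; // 1721%5%5
```

The dots in the IP address were literal, but the decoder cannot know that, so it consumes them together with the two digits that follow.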

It would be possible to generate unambiguous values by encoding . as .2E. Sanitizer::escapeId( '.21', [ 'noninitial' ] ) would then evaluate to .2E21.

Such a change would be easy to implement by adding

'.' => '%2E',

after line https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/Sanitizer.php;d0a0838cb76b4cf20977c4aba5fe06877d8deb58$1205

However, such a change is not backward compatible: existing anchor links to headings can break.
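
The trade-off can be sketched with two simplified stand-ins. All function names below are hypothetical, and both encoders assume the same simplified pipeline (space to underscore, URL-encode, % to dot); the proposed variant additionally escapes literal dots before the final replacement. This is not the patch itself, only an illustration of its effect.

```php
<?php
// Hypothetical stand-in for the current traditional-mode behaviour.
function legacyEscapeIdSketch(string $id): string
{
    return str_replace('%', '.', urlencode(strtr($id, ' ', '_')));
}

// Hypothetical stand-in for the proposed behaviour: literal dots are
// escaped first, so every '.' in the output starts an escape sequence.
function proposedEscapeIdSketch(string $id): string
{
    $encoded = urlencode(strtr($id, ' ', '_'));
    $encoded = str_replace('.', '%2E', $encoded);
    return str_replace('%', '.', $encoded);
}

// Hypothetical decoder: '.' plus two hex digits is one escaped byte.
function decodeSketch(string $id): string
{
    return preg_replace_callback(
        '/\.([0-9A-Fa-f]{2})/',
        function (array $m): string {
            return chr(hexdec($m[1]));
        },
        $id
    );
}

echo proposedEscapeIdSketch('!'), "\n";   // .21
echo proposedEscapeIdSketch('.21'), "\n"; // .2E21

// The dot escapes now decode back to the original heading text.
echo decodeSketch(proposedEscapeIdSketch('172.31.255.255')), "\n"; // 172.31.255.255

// The ID of an existing heading changes, so old anchor links stop matching.
echo legacyEscapeIdSketch('172.31.255.255'), "\n";   // 172.31.255.255
echo proposedEscapeIdSketch('172.31.255.255'), "\n"; // 172.2E31.2E255.2E255
```

The last two lines show the compatibility cost: a link to #172.31.255.255 created under the current scheme would no longer point at the heading.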

Event Timeline

Fomafix raised the priority of this task to Low.
Fomafix updated the task description.
Fomafix added a project: MediaWiki-General.
Fomafix added a subscriber: Fomafix.

Yes, it is ambiguous from the perspective of trying to reverse an ID back into the heading title. However, that is not a supported use case for the ID. The only thing that matters is that the logic is deterministic in one direction and that the produced ID (in the case of OutputPage) is unique.

In what case do you need to reverse it? That sounds like something that is in need of a better approach.

The benefit of the current function is that it is idempotent:

Sanitizer::escapeId( Sanitizer::escapeId( x ) ) === Sanitizer::escapeId( x )
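
A quick way to check this property is to apply a stand-in encoder twice. The sketch below uses hypothetical helper names and the same simplifying assumption as the task description (space to underscore, URL-encode, % to dot). For contrast, it also applies a dot-escaping variant twice; that variant is an illustration of encoding . as .2E, not actual MediaWiki code.

```php
<?php
// Hypothetical stand-in for the current traditional-mode behaviour.
function legacyEscapeIdSketch(string $id): string
{
    return str_replace('%', '.', urlencode(strtr($id, ' ', '_')));
}

// Hypothetical variant that also escapes literal dots as '.2E'.
function proposedEscapeIdSketch(string $id): string
{
    $encoded = str_replace('.', '%2E', urlencode(strtr($id, ' ', '_')));
    return str_replace('%', '.', $encoded);
}

// The legacy sketch only emits characters that urlencode() leaves
// untouched, so a second pass changes nothing.
foreach (['!', '.21', '172.31.255.255', 'Foo bar', 'a:b'] as $sample) {
    $once = legacyEscapeIdSketch($sample);
    $twice = legacyEscapeIdSketch($once);
    echo $sample, ' => ', $once, ($once === $twice ? ' (stable)' : ' (changed)'), "\n";
}

// The dot-escaping variant re-escapes its own dots on every pass.
echo proposedEscapeIdSketch('.21'), "\n";                         // .2E21
echo proposedEscapeIdSketch(proposedEscapeIdSketch('.21')), "\n"; // .2E2E21
```

Under these assumptions, unambiguous decoding and idempotence pull in opposite directions: an encoder whose output can be decoded reliably has to escape its own escape character, which means a second pass alters the result.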