
Anchor encoding is not one-to-one
Closed, Declined (Public)

Description

Author: ayg

Description:
Due to the way we do anchor encoding, there's no way to reliably reverse it. For instance, ".3F" is left unchanged as ".3F", but "?" also becomes ".3F" (urlencode() turns it into "%3F", and the "%" is then replaced with "."). Likewise, both "_" and " " become "_". It would be nice if anchor encoding were made reversible, to avoid unintended collisions and to permit anchor decoding.

Currently we do, roughly:

$id = str_replace( ' ', '_', $id );
$id = Sanitizer::decodeCharReferences( $id );
$id = urlencode( $id );
$id = str_replace( '%3A', ':', $id );
$id = str_replace( '%', '.', $id );
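For illustration, here is a minimal PHP sketch of the current scheme that reproduces the collisions described above. Assumptions: currentAnchorEncode() is a hypothetical name, and the MediaWiki-only Sanitizer::decodeCharReferences() call is left out, since none of these inputs contain character references.

```php
<?php
// Hypothetical stand-alone version of the current encoding.
// Assumption: Sanitizer::decodeCharReferences() is omitted (MediaWiki-only,
// and a no-op for the inputs below).
function currentAnchorEncode( string $id ): string {
    $id = str_replace( ' ', '_', $id );   // spaces become underscores first
    $id = urlencode( $id );               // leaves A-Z a-z 0-9 - _ . unescaped
    $id = str_replace( '%3A', ':', $id ); // keep colons readable
    return str_replace( '%', '.', $id );  // '%XX' becomes '.XX'
}

// Pairs of distinct inputs that end up with the same anchor.
$pairs = [ [ '?', '.3F' ], [ '_', ' ' ] ];
foreach ( $pairs as [ $a, $b ] ) {
    printf( "%s -> %s, %s -> %s\n",
        var_export( $a, true ), currentAnchorEncode( $a ),
        var_export( $b, true ), currentAnchorEncode( $b ) );
}
```

Both pairs print the same anchor on each side, which is exactly why a decoder cannot tell them apart.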

This should instead be:

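$id = Sanitizer::decodeCharReferences( $id );
$id = urlencode( $id );               // leaves A-Z a-z 0-9 - _ . alone; a space becomes '+'
$id = str_replace( '_', '%5F', $id ); // escape literal underscores
$id = str_replace( '.', '%2E', $id ); // escape literal dots
$id = str_replace( '+', '_', $id );   // spaces ('+' from urlencode(), not '%20') become '_'
$id = str_replace( '%3A', ':', $id );
$id = str_replace( '%', '.', $id );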
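A minimal sketch of the proposed encoder as a stand-alone PHP function, assuming the hypothetical name proposedAnchorEncode() and again omitting Sanitizer::decodeCharReferences(). Note that it matches spaces as '+', because PHP's urlencode() encodes a space as '+' rather than '%20'.

```php
<?php
// Hypothetical stand-alone version of the proposed reversible encoding.
// Assumption: Sanitizer::decodeCharReferences() is omitted.
function proposedAnchorEncode( string $id ): string {
    $id = urlencode( $id );               // all but A-Z a-z 0-9 - _ . become %XX; ' ' becomes '+'
    $id = str_replace( '_', '%5F', $id ); // free '_' to stand for a space
    $id = str_replace( '.', '%2E', $id ); // free '.' to stand for '%'
    $id = str_replace( '+', '_', $id );   // a space is now written as '_'
    $id = str_replace( '%3A', ':', $id ); // keep colons readable
    return str_replace( '%', '.', $id );  // '%XX' becomes '.XX'
}

foreach ( [ '?', '.3F', '_', ' ', 'a:b c' ] as $input ) {
    printf( "%s -> %s\n", var_export( $input, true ), proposedAnchorEncode( $input ) );
}
```

Here "?" and ".3F" encode to ".3F" and ".2E3F", and "_" and " " encode to ".5F" and "_", so the earlier collisions disappear at the cost of slightly noisier anchors.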

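That could then be reversed reliably (up to character references: decodeCharReferences() maps "&amp;" and "&" to the same string, so that distinction is not recovered) with: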

$id = str_replace( '.', '%', $id );
$id = str_replace( '_', ' ', $id );
$id = urldecode( $id );
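As a sketch, the decoder can be checked against anchors that the proposed encoder would produce for "?", ".3F", "_", " " and "a:b c". The function name proposedAnchorDecode() is hypothetical; the input anchors are hard-coded here so the example does not depend on an encoder.

```php
<?php
// Hypothetical stand-alone version of the proposed decoding.
function proposedAnchorDecode( string $id ): string {
    $id = str_replace( '.', '%', $id ); // '.XX' back to '%XX'
    $id = str_replace( '_', ' ', $id ); // '_' only ever stands for a space
    return urldecode( $id );            // '%5F' -> '_', '%2E' -> '.', '%3F' -> '?', ...
}

// Anchors as produced by the proposed encoding scheme.
foreach ( [ '.3F', '.2E3F', '.5F', '_', 'a:b_c' ] as $anchor ) {
    printf( "%s -> %s\n", $anchor, var_export( proposedAnchorDecode( $anchor ), true ) );
}
```

Each anchor decodes back to a distinct original string, which is the one-to-one property the report asks for.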


Version: unspecified
Severity: enhancement
URL: http://www.mediawiki.org/wiki/User:Simetrical/13016

Details

Reference
bz13016

Event Timeline

bzimport raised the priority of this task from to Lowest. (Nov 21 2014, 10:06 PM)
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz13016.
bzimport added a subscriber: Unknown Object (MLST).
bzimport created this task. (Feb 14 2008, 2:12 AM)

ayg wrote:

The new encoding scheme we use is deliberately not one-to-one, so that anchors look nicer: invalid characters (mostly punctuation) are converted into underscores. WONTFIX.