
JS mw.Title does not strip Unicode bidi control characters from input, unlike PHP MediaWikiTitleCodec
Closed, Resolved · Public

Description

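JS mw.Title does not strip Unicode bidi control characters from input, unlike PHP MediaWikiTitleCodec. I'm also not convinced that it handles whitespace characters correctly: the PHP code replaces an explicit list of Unicode space characters, while the JS code relies on \s.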

MediaWikiTitleCodec::splitTitleString:

		# Strip Unicode bidi override characters.
		# Sometimes they slip into cut-n-pasted page titles, where the
		# override chars get included in list displays.
		$dbkey = preg_replace( '/\xE2\x80[\x8E\x8F\xAA-\xAE]/S', '', $dbkey );

		# Clean up whitespace
		# Note: use of the /u option on preg_replace here will cause
		# input with invalid UTF-8 sequences to be nullified out in PHP 5.2.x,
		# conveniently disabling them.
		$dbkey = preg_replace(
			'/[ _\xA0\x{1680}\x{180E}\x{2000}-\x{200A}\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}]+/u',
			'_',
			$dbkey
		);
		$dbkey = trim( $dbkey, '_' );

mediawiki.Title#parse:

		title = title
			// Normalise whitespace to underscores and remove duplicates
			.replace( /[ _\s]+/g, '_' )
			// Trim underscores
			.replace( rUnderscoreTrim, '' );
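
For comparison, here is a minimal sketch of what a JS normalisation mirroring the PHP steps above might look like. It is not the code from the eventual patch: the function name `normalizeTitleText` and the regex variable names are hypothetical, and the definition of `rUnderscoreTrim` is an assumption, since the quoted snippet does not show it. The character ranges are transcribed from the two PHP patterns (the byte sequences `\xE2\x80[\x8E\x8F\xAA-\xAE]` correspond to U+200E, U+200F and U+202A to U+202E).

```javascript
// Hypothetical helper for illustration; not the actual mw.Title implementation.

// Bidi control characters removed by the PHP regex:
// U+200E, U+200F and U+202A through U+202E.
var rBidi = /[\u200E\u200F\u202A-\u202E]/g;

// The same explicit whitespace list as the PHP regex, plus space and underscore.
var rWhitespace = /[ _\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]+/g;

// Assumed definition: leading and trailing underscores.
var rUnderscoreTrim = /^_+|_+$/g;

function normalizeTitleText( title ) {
	return title
		// Strip bidi controls first, so whitespace on either side of one
		// collapses into a single underscore in the next step
		.replace( rBidi, '' )
		// Normalise whitespace to underscores and remove duplicates
		.replace( rWhitespace, '_' )
		// Trim underscores
		.replace( rUnderscoreTrim, '' );
}
```

With this sketch, `normalizeTitleText( '\u200EFoo bar\u200F' )` yields `'Foo_bar'`, whereas the current `/[ _\s]+/g` replacement leaves both bidi marks in the string.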

Event Timeline

Restricted Application added a subscriber: Aklapper. · Aug 23 2016, 11:53 PM

Change 306493 had a related patch set uploaded (by Bartosz Dziewoński):
mw.Title: Correct handling of Unicode whitespace and bidi control characters

https://gerrit.wikimedia.org/r/306493

matmarex claimed this task. · Aug 24 2016, 8:51 PM
matmarex triaged this task as Normal priority.

Change 306493 merged by jenkins-bot:
mw.Title: Correct handling of Unicode whitespace and bidi control characters

https://gerrit.wikimedia.org/r/306493

matmarex closed this task as Resolved. · Aug 31 2016, 12:12 AM
matmarex removed a project: Patch-For-Review.