
Dead code in MediaWikiTitleCodec::getTitleInvalidRegex() for checking XML/HTML character references
Closed, Resolved (Public)

Description

Two parts of the invalid title regex are effectively unreachable:

				'|&#[0-9]+;' .
				'|&#x[0-9A-Fa-f]+;' .

This is because the title is split at the fragment separator (#) before illegal title characters are checked. Both alternatives require a literal # right after the &, so any character reference ends up split between the dbkey and the fragment, and the dbkey that gets checked can never match them.
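A minimal sketch of the ordering problem, in JavaScript. It is not the MediaWiki implementation: `splitFragment` and `dbkeyMatchesCharRef` are hypothetical names, the split is assumed to cut at the first #, and the regex contains only the two alternatives quoted above.

```javascript
// Only the two alternatives quoted in this task; the real regex has more.
const charRefRegex = /&#[0-9]+;|&#x[0-9A-Fa-f]+;/;

// Hypothetical, simplified stand-in for the fragment split:
// assumes the title is cut at the first '#'.
function splitFragment(text) {
  const idx = text.indexOf('#');
  if (idx === -1) {
    return { dbkey: text, fragment: null };
  }
  return { dbkey: text.slice(0, idx), fragment: text.slice(idx + 1) };
}

// Runs the character-reference check on the dbkey only,
// i.e. after the fragment has already been removed.
function dbkeyMatchesCharRef(text) {
  return charRefRegex.test(splitFragment(text).dbkey);
}
```

Under these assumptions, `charRefRegex.test('&#65;')` is true on the raw string, but `dbkeyMatchesCharRef('Foo&#65;bar')` is false: the dbkey is `Foo&` and the fragment is `65;bar`, so the dbkey never contains the `&#` that both alternatives need.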

I believe this has been dead code ever since it was introduced in a3a2744d033c41a0456d495f6a0fb5e8165224bf.

Co-discovered with @Erutuon as part of our mwtitle-in-Rust project.

Event Timeline

Change 746652 had a related patch set uploaded (by Legoktm; author: Legoktm):

[mediawiki/core@master] Remove unreachable parts of getTitleInvalidRegex() in PHP and JS

https://gerrit.wikimedia.org/r/746652

Change 746652 merged by jenkins-bot:

[mediawiki/core@master] Remove unreachable parts of getTitleInvalidRegex() in PHP and JS

https://gerrit.wikimedia.org/r/746652

Krinkle assigned this task to Legoktm.