
Dead code in MediaWikiTitleCodec::getTitleInvalidRegex() for checking XML/HTML character references
Closed, Resolved · Public

Description

Two alternatives in the invalid title regex are effectively unreachable:

				'|&#[0-9]+;' .
				'|&#x[0-9A-Fa-f]+;' .

This is because we split off the fragment (at the #) before checking for illegal title characters. Every numeric character reference contains a #, so it is always split across the dbkey and the fragment, and the dbkey portion can never match either of these patterns.
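A minimal Python sketch of the ordering problem, under a simplified model: only the two character-reference alternatives quoted above are checked, and `split_fragment` / `has_illegal_char_ref` are hypothetical helpers, not MediaWiki functions. The pattern itself matches a numeric character reference in the unsplit string, but once the fragment is split off first, the dbkey never contains a #, so the check can never fire.

```python
import re

# Simplified model (assumption): only the two character-reference
# alternatives quoted in this task, not MediaWiki's full invalid-title regex.
CHAR_REF_RX = re.compile(r"&#[0-9]+;|&#x[0-9A-Fa-f]+;")


def split_fragment(text: str) -> tuple[str, str]:
    """Hypothetical helper: split at the first '#' into (dbkey, fragment)."""
    dbkey, _sep, fragment = text.partition("#")
    return dbkey, fragment


def has_illegal_char_ref(text: str) -> bool:
    """Split first, then validate only the dbkey, mirroring the order described above."""
    dbkey, _fragment = split_fragment(text)
    return CHAR_REF_RX.search(dbkey) is not None


for title in ["Foo&#65;Bar", "Foo&#x41;Bar"]:
    dbkey, fragment = split_fragment(title)
    # The unsplit string matches, but the dbkey alone ("Foo&") never does.
    print(f"{title!r}: unsplit match={CHAR_REF_RX.search(title) is not None}, "
          f"dbkey={dbkey!r}, fragment={fragment!r}, "
          f"dbkey match={has_illegal_char_ref(title)}")
```

Because the # is consumed by the split, the `&#...;` alternatives only ever see something like `Foo&`, which neither pattern can match.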

I believe this has been dead code ever since it was introduced in a3a2744d033c41a0456d495f6a0fb5e8165224bf.

Co-discovered with @Erutuon as part of our mwtitle-in-Rust project.

Event Timeline

Change 746652 had a related patch set uploaded (by Legoktm; author: Legoktm):

[mediawiki/core@master] Remove unreachable parts of getTitleInvalidRegex() in PHP and JS

https://gerrit.wikimedia.org/r/746652

Change 746652 merged by jenkins-bot:

[mediawiki/core@master] Remove unreachable parts of getTitleInvalidRegex() in PHP and JS

https://gerrit.wikimedia.org/r/746652

Krinkle assigned this task to Legoktm.

Change 902416 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] Title.php: remove unreachable parts of title invalid regexp

https://gerrit.wikimedia.org/r/902416

Change 902416 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Title.php: remove unreachable parts of title invalid regexp

https://gerrit.wikimedia.org/r/902416

Change 903292 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.18.0-a3

https://gerrit.wikimedia.org/r/903292

Change 903292 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.18.0-a3

https://gerrit.wikimedia.org/r/903292