
RFC/PMID/ISBN regexes need \b restrictions before and after
Closed, Resolved, Public

Description

The PHP parser currently recognizes RFC/PMID/ISBN links even if they are buried in unrelated text, for example:

fooRFC 1234bar

We should add \b restrictions to the regexp to ensure that magic links stand apart from other text.
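To illustrate the effect of the proposed \b anchors, here is a minimal sketch in Python. The patterns are simplified stand-ins and are not the actual regex in the PHP parser. The names LOOSE, STRICT and find_magic_links are hypothetical, and ISBN handling is left out for brevity.

```python
import re

# Hypothetical, simplified patterns; the real MediaWiki regex is more involved.
LOOSE = re.compile(r"(?:RFC|PMID)\s+\d+")        # behaviour described in this report
STRICT = re.compile(r"\b(?:RFC|PMID)\s+\d+\b")   # with \b before and after

def find_magic_links(pattern, text):
    """Return every substring of text that the given pattern would treat as a magic link."""
    return pattern.findall(text)

buried = "fooRFC 1234bar"
standalone = "See RFC 1234."

print(find_magic_links(LOOSE, buried))       # matches the link buried in other text
print(find_magic_links(STRICT, buried))      # no match once \b is required
print(find_magic_links(STRICT, standalone))  # a link that stands apart still matches
```

With the \b anchors in place, a keyword glued to a preceding word character or a number glued to a following one no longer matches, while links delimited by whitespace or punctuation are unaffected.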

See also bug 28950, which asks for the whitespace restrictions in magic links to be loosened somewhat.


Version: 1.24rc
Severity: normal

Details

Reference
bz65278

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:25 AM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz65278.
bzimport added a subscriber: Unknown Object (MLST).

Source code file name (and path) welcome in case a contributor would like to give fixing the regexes a shot. Marking as easy.

Unfortunately parser changes are never quite so "easy" -- even though the source code change is small, before deployment we need to grep through all our existing wikis to be sure that no one is using RFC links that will be broken. Perhaps some language wiki uses prefixes for possessives and quite likes the current behavior. We won't know until we look.
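As a rough sketch of what such a check could look like, the Python below flags text that is linked under the loose matching but would stop being linked once word boundaries are required. It is an assumption-laden illustration, not the tooling actually used: the LOOSE pattern is a simplified stand-in for the parser's regex, affected_links is a hypothetical helper, and ISBN is omitted.

```python
import re

# Hypothetical, simplified stand-in for the current (unanchored) magic link regex.
LOOSE = re.compile(r"(?:RFC|PMID)\s+\d+")

def affected_links(text):
    """Return loose matches that touch a word character on either side.

    These are the links that would break if \\b were required
    before and after the match.
    """
    hits = []
    for m in LOOSE.finditer(text):
        # Treat start/end of text as a non-word character.
        before = text[m.start() - 1] if m.start() > 0 else " "
        after = text[m.end()] if m.end() < len(text) else " "
        if re.match(r"\w", before) or re.match(r"\w", after):
            hits.append(m.group(0))
    return hits

sample = "fooRFC 1234bar and RFC 2616."
print(affected_links(sample))  # only the buried link is reported
```

Running something like this over page text from each wiki would show whether any existing usage depends on the current behavior.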

But sure, patch welcome! Just don't be surprised if it's not immediately committed. (Oh, and be sure to include parser tests with your patch.)

Source code file name is includes/parser/Parser.php -- here's the code in question:

http://git.wikimedia.org/blob/mediawiki%2Fcore.git/2e50b896f1a55667ced32502caa9681c36df7587/includes%2Fparser%2FParser.php#L1387

Gotcha. Thanks for elaborating!

Hm. The bot seems to be asleep. Here's a patch for this: https://gerrit.wikimedia.org/r/133650

cscott claimed this task.

Patch merged, closing.

Change 178711 had a related patch set uploaded (by Cscott):
WIP: Magic link fixes.

https://gerrit.wikimedia.org/r/178711

Patch-For-Review