
Parser.php doMagicLinks() mishandles abbr tag
Closed, Resolved, Public


Author: mediawiki

Now that the <abbr> tag is whitelisted, the function doMagicLinks() in Parser.php confuses <a> with <abbr>.

Version: unspecified
Severity: normal



Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:58 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz22905.
bzimport added a subscriber: Unknown Object (MLST).

mediawiki wrote:

regular expression modification


ayg wrote:

  1. Do you have a test case that demonstrates the problem? I.e., what's some markup that parses incorrectly because of this bug?
  2. Your change doesn't seem quite right -- whitespace other than a simple space would be valid HTML here (although I haven't looked closely enough to see if it would actually be possible at this stage in the parsing). I would suggest (<a[^a-z0-9].*?</a>).

mediawiki wrote:

  1. The wiki markup below gets parsed incorrectly. You can also check [[User:GuillaumeBeaudoin]] for more examples.

<abbr>(fr)</abbr> ISBN 2753300917 [ La méthode Google]

The <abbr> tag is used extensively on the French Wikipedia, and the issue was first found on [[fr:Wikipedia]] by [[fr:User:Manu1400]].

  2. You're right, a tab or any whitespace other than a simple space would not be matched by my regular expression. We could use \s for any whitespace (option A), or something like what you've proposed (options B and C).

Option A - <a[\s>].*?</a>
Option B - <a[^a-zA-Z0-9].*?</a>
Option C - <a[^[:alnum:]].*?</a>

Although I'm not sure how capital letters would be handled.
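
To compare the candidates, here is a rough sketch that checks only the opening part of each pattern against a few tag openers. It uses Python's re module as a stand-in for the PCRE engine behind Parser.php, takes Option A as described in the prose (with \s), and leaves out Option C because Python's re does not support POSIX classes such as [:alnum:]. The sample strings, including <a-x>, are made up for illustration.

```python
import re

# Opening part of each candidate; the trailing ".*?</a>" is omitted on purpose.
candidates = {
    "A": r"<a[\s>]",         # whitespace or ">" after "<a"
    "B": r"<a[^a-zA-Z0-9]",  # anything except an ASCII letter or digit after "<a"
}

# Hypothetical tag openers: plain anchor, space, tab, abbr, and an odd tag name.
samples = ["<a>", "<a href=''>", "<a\thref=''>", "<abbr>", "<a-x>"]

results = {
    name: [bool(re.match(pattern, sample)) for sample in samples]
    for name, pattern in candidates.items()
}

for name, row in results.items():
    print(name, row)
# A [True, True, True, False, False]
# B [True, True, True, False, True]
```

Both candidates reject <abbr> and accept a tab after <a. Option B is looser, since any non-alphanumeric character after <a is treated as the start of an anchor.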

ayg wrote:

Committed a modified version in r64113. I went with (<a[ \t\r\n>].*?</a>) in the end, matching the HTML5 spec as far as I'm reading it: Thanks for the patch!
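
For illustration, a small sketch of what the stricter pattern changes on the test case above. The pre-fix alternative is assumed to be <a.*?</a> (it is not quoted in this report), the external link is assumed to have already been rendered as an <a> element by the time magic links are processed, and the href is a placeholder. Python's re is used here in place of PCRE.

```python
import re

# Assumed pre-fix pattern (not quoted in this report): anything starting with "<a".
OLD = re.compile(r"<a.*?</a>")
# Pattern committed in r64113, as quoted above.
NEW = re.compile(r"<a[ \t\r\n>].*?</a>")

# Hypothetical half-parsed text; the href value is a placeholder.
html = (
    '<abbr>(fr)</abbr> ISBN 2753300917 '
    '<a href="http://example.org/">La méthode Google</a>'
)

old_match = OLD.search(html).group(0)
new_match = NEW.search(html).group(0)

# The old pattern starts at "<abbr>" and runs to the first "</a>",
# so the ISBN is swallowed as if it were link text.
print("ISBN" in old_match)  # True
# The new pattern skips "<abbr>" and matches only the real anchor.
print(new_match)
```

With the stricter pattern the ISBN stays outside the skipped span, so the later alternatives in doMagicLinks() would get a chance to see it.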

mediawiki wrote:

Thank you, Aryeh. Merci!

smccandlish wrote:

Since this is fixed, removing Bug #617 as a "blocks" dependency.

smccandlish wrote:

Woops, typo. Corrected: Since this is fixed, removing Bug #671 as a "blocks" dependency.