
Closing square brackets (including ext. link syntax) break image parsing
Closed, Resolved (Public)

Description

Author: puglisi

Description:
An image whose description contains an external link (in the form [url desc])
is not recognized as valid image syntax by the parser. See:

http://commons.wikimedia.org/w/index.php?title=Grenada&oldid=24868

The page was edited several times in the last few weeks, so I assume the syntax
was working before 1.4 beta.


Version: 1.4.x
Severity: normal

Details

Reference
bz1033

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 8:05 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz1033.
bzimport added a subscriber: Unknown Object (MLST).

gentgeen wrote:

The bug also seems to make other images in the same article not render properly.
In the old version of the article below, the image with the external link is in
the "History" section of the article, while an image in the "Arts and
architecture" section also fails to render properly despite not having an
external link.

http://en.wikipedia.org/w/index.php?title=San_Jose%2C_California&oldid=8737066

John Pozniak [[w:en:User:Gentgeen]]

D.U.Thibault wrote:

Images were working before 1.4 beta. ALL of the "Flag of" pages are now
horrendously broken. See for example [[Flag of Afghanistan]]; the first flag
image is supposed to be
<nowiki>[[Image:Afghanistan_flag_large.png|thumb|250px|[[Image:FIAV_56.png]]
Flag Ratio: 1:2]]</nowiki>, that is to say a 250px wide thumb of the Afghanistan
flag, with a legend prefixed with the small image FIAV_56. That result is there,
but is prepended and appended with extraneous bits of the legend and code.

rowan.collins wrote:

I've moved the "image in an image caption" issue (comment 2) to a separate bug
(bug 1217), because it has a sufficiently different effect that I suspect it of
being a different part of the code (i.e. it renders, but badly, whereas ext.
links don't render at all).

I've just realised that this bug is actually triggered by any "]" in the image
caption (see simplified test case at http://test.wikipedia.org/wiki/Bug_1033),
which points back to Parser::replaceInternalLinks(), and either the odd code for
parsing links within captions, or the original regex for what a link might look
like:
$e1 = "/^([{$tc}]+)(?:\\|([^]]+))?]](.*)\$/sD";
In the check for links-in-captions, I made an unsupported assumption that
anything matching $e1_img and not $e1 represents a line that ends without its
"]]". This isn't true, because "[[Foo|...]...]]" also fails $e1 (I think), but
there's no inner link to close. So if you invent and then close an inner link,
you can fool that code: "[[Image:AllisonCrowe.jpg|frame|foo ] bar]] [[]] ]]"
works as you'd expect, and so does "[[Image:AllisonCrowe.jpg|frame|foo
[http://example.com example] bar]] [[]] ]]". Yeuch!
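
To make the failure concrete, here is a minimal sketch of how the quoted $e1 pattern behaves on the text that follows a "[[". It is a Python port, not the PHP code itself: TC is a simplified, hypothetical stand-in for MediaWiki's $tc (the real set of legal title characters is larger), \Z stands in for PCRE's D modifier, and try_e1 is a helper name invented for this illustration.

```python
import re

# Hypothetical, simplified stand-in for $tc (legal title characters).
TC = r"A-Za-z0-9 _:./#%-"

# Python port of the quoted $e1: title, optional "|text" with no "]" in it,
# then "]]" and the trailing remainder. re.DOTALL mirrors the s modifier.
E1 = re.compile(r"^([" + TC + r"]+)(?:\|([^]]+))?]](.*)\Z", re.DOTALL)


def try_e1(chunk):
    """Match the text after "[[" and return (target, text, trail), or None."""
    m = E1.match(chunk)
    return m.groups() if m else None


# An ordinary piped link matches.
print(try_e1("Foo|bar]] trailing"))

# A lone "]" in the caption stops [^]]+ early, and the next characters
# are not "]]", so the whole match fails.
print(try_e1("Image:AllisonCrowe.jpg|frame|foo ] bar]] trailing"))

# An external link in the caption fails for the same reason.
print(try_e1("Image:AllisonCrowe.jpg|frame|foo [http://example.com example] bar]] trailing"))
```

Under these assumptions the first call returns a match and the other two return None, which is consistent with "[[Foo|...]...]]" failing $e1 even though there is no inner link to close.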

Comment 1: > The bug also seems to make other images in the same article not
render properly.
I appear to have predicted that side-effect in bug 637 comment 11 - a "broken"
image syntax has knock-on effects for the whole rest of the page. I later
realised that we can avoid "eating" *any* additional text - if we don't need it
for the caption, it can stay in the array for the next iteration, but I never
got round to implementing that; I, or somebody (who has any idea what I'm on
about), should do so.

I'm CCing Wil Mahan on this, because he and I hashed out the current
replaceInternalLinks() function in response to bug 637, so he might have some
thoughts.

rowan.collins wrote:

It seems JeLuF has spotted a duplicate report of this issue (bug 1317) and fixed
it by simply allowing any link to have a "]" after the "|". While this obviously
only makes sense for image captions, it seems a reasonable enough solution, so
I'll mark this as fixed.
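
The exact change is not quoted in this report, so the following is only a guess at what "allowing a ']' after the '|'" could look like: a Python sketch in which the link text is matched lazily up to the first "]]" instead of being limited to non-"]" characters. TC is again a simplified, hypothetical stand-in for $tc, and E1_RELAXED and try_relaxed are names invented for this illustration.

```python
import re

# Hypothetical, simplified stand-in for $tc (legal title characters).
TC = r"A-Za-z0-9 _:./#%-"

# Assumed relaxed pattern: the text after "|" is matched lazily, so a single
# "]" is allowed and the match ends at the first "]]".
E1_RELAXED = re.compile(r"^([" + TC + r"]+)(?:\|(.+?))?]](.*)\Z", re.DOTALL)


def try_relaxed(chunk):
    """Match the text after "[[" and return (target, text, trail), or None."""
    m = E1_RELAXED.match(chunk)
    return m.groups() if m else None


# Image captions with a lone "]" or an external link now match.
print(try_relaxed("Image:AllisonCrowe.jpg|frame|foo ] bar]] trailing"))
print(try_relaxed("Image:AllisonCrowe.jpg|frame|foo [http://example.com example] bar]] trailing"))

# Side effect: ordinary links accept a "]" in their text as well.
print(try_relaxed("Foo|a ] b]] trailing"))
```

The last call illustrates the trade-off noted above: the relaxation is not specific to images, so any piped link may carry a "]" in its text.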