Image caption is interpreted as a parameter (e.g. when it ends with "px")
Closed, ResolvedPublic

Description

Try writing “[[Image:Example.png|thumb|1000px|Some caption ending with px]]”.

You will receive a default-sized thumbnail without a caption, because the title ending with “px” is interpreted as a width parameter, even though it is invalid (the text in place of the expected number is interpreted as 0, overriding the previous valid 1000px parameter, and rendering at default width), and even though the width has already been specified.

The same is true for captions starting with “upright”, or (for djvu files only) with “page”, because those are the magic words containing “$1” placeholders. Note that while this might not be a big problem in English (who ends a title with “px” or starts a title with a lowercase letter?), it can be a bigger issue for some other languages with different magic words (it happened to me on w:cs: today when trying to write a normal image caption).

The problem is that MagicWord::matchVariableStartToEnd eats anything (.*? – see MagicWord::initRegex), no matter only a number is expected, and the parser does no checks on the extracted value, either. The simplest solution could be to test $value after the “isset( $paramMap[$magicName] )” condition against some basic constraints (at least a simple “&& !preg_match( '/[^0-9.]/', $value )” might do; what parameters are possible there, anyway?).


Version: unspecified
Severity: minor
URL: http://test.wikipedia.org/wiki/Interpreted_image_caption

bzimport set Reference to bz13436.
Mormegil created this task.Via LegacyMar 19 2008, 6:47 PM
bzimport added a comment.Via ConduitMar 19 2008, 9:40 PM

soxred93 wrote:

Simple patch

Patch, simple addition

attachment imagediff.txt ignored as obsolete

Mormegil added a comment.Via ConduitMar 19 2008, 10:28 PM

(In reply to comment #1)

Patch, simple addition

Well, that is _almost_ exactly what I wrote, isn’t it? And your patch omits the decimal dot I wrote, which is used with upright (upright=1.2). (Anyway, I am not completely sure this is the best solution, I considered this a bit hacky.)

+ adding patch keyword

bzimport added a comment.Via ConduitMar 20 2008, 12:35 AM

soxred93 wrote:

Better patch

Addition of a dot to the regex. Now works with more options

Attached: imagediff.txt

bzimport added a comment.Via ConduitMar 25 2008, 12:36 AM

ayg wrote:

This will break dubious but currently-tolerated constructs like "1000 px" (unless I missed a trim() somewhere, which is probable). More pedantically, it will also break constructs like "upright=1.35E2" and "+750px". That's probably okay: this is going to have to be a heuristic and therefore messy, but it's worth pointing that out. Overall, though, it might be simpler to just use is_numeric instead of rolling your own regex. Did you test the patch?

bzimport added a comment.Via ConduitMar 25 2008, 12:39 AM

soxred93 wrote:

Yes. Fully tested.

bzimport added a comment.Via ConduitMar 25 2008, 12:46 AM

ayg wrote:

Committed a version using is_numeric in r32391.

tstarling added a comment.Via ConduitMar 25 2008, 5:21 AM

The patch was incorrect, see my fix in r32394.

Gilles added a project: Multimedia.Via WebDec 4 2014, 10:49 AM
Gilles moved this task to Closed on the Multimedia workboard.
Ricordisamoa added a subscriber: Ricordisamoa.Via WebJan 7 2015, 11:40 AM

Add Comment