Page MenuHomePhabricator

Image caption is interpreted as a parameter (e.g. when it ends with "px")
Closed, ResolvedPublic

Description

Try writing “[[Image:Example.png|thumb|1000px|Some caption ending with px]]”.

You will receive a default-sized thumbnail without a caption, because the title ending with “px” is interpreted as a width parameter, even though it is invalid (the text in place of the expected number is interpreted as 0, overriding the previous valid 1000px parameter, and rendering at default width), and even though the width has already been specified.

The same is true for captions starting with “upright”, or (for djvu files only) with “page”, because those are the magic words containing “$1” placeholders. Note that while this might not be a big problem in English (who ends a title with “px” or starts a title with a lowercase letter?), it can be a bigger issue for some other languages with different magic words (it happened to me on w:cs: today when trying to write a normal image caption).

The problem is that MagicWord::matchVariableStartToEnd eats anything (.*? – see MagicWord::initRegex), no matter only a number is expected, and the parser does no checks on the extracted value, either. The simplest solution could be to test $value after the “isset( $paramMap[$magicName] )” condition against some basic constraints (at least a simple “&& !preg_match( '/[^0-9.]/', $value )” might do; what parameters are possible there, anyway?).


Version: unspecified
Severity: minor
URL: http://test.wikipedia.org/wiki/Interpreted_image_caption

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 10:06 PM
bzimport set Reference to bz13436.

soxred93 wrote:

Simple patch

Patch, simple addition

attachment imagediff.txt ignored as obsolete

(In reply to comment #1)

Patch, simple addition

Well, that is _almost_ exactly what I wrote, isn’t it? And your patch omits the decimal dot I wrote, which is used with upright (upright=1.2). (Anyway, I am not completely sure this is the best solution, I considered this a bit hacky.)

+ adding patch keyword

soxred93 wrote:

Better patch

Addition of a dot to the regex. Now works with more options

Attached:

ayg wrote:

This will break dubious but currently-tolerated constructs like "1000 px" (unless I missed a trim() somewhere, which is probable). More pedantically, it will also break constructs like "upright=1.35E2" and "+750px". That's probably okay: this is going to have to be a heuristic and therefore messy, but it's worth pointing that out. Overall, though, it might be simpler to just use is_numeric instead of rolling your own regex. Did you test the patch?

soxred93 wrote:

Yes. Fully tested.

ayg wrote:

Committed a version using is_numeric in r32391.

The patch was incorrect, see my fix in r32394.

Change #1064409 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Add `double-px-category` tracking category for deprecated image size syntax

https://gerrit.wikimedia.org/r/1064409

Change #1064409 merged by jenkins-bot:

[mediawiki/core@master] Add `double-px-category` tracking category for deprecated image size syntax

https://gerrit.wikimedia.org/r/1064409