Image captions containing "page " anywhere are parsed as page option
Closed, Resolved, Public


Any image whose caption contains the words "page" and "commentary" in that order and in all lower case, whether adjacent or not and regardless of anything else in the caption, is not rendered in VE and is not editable in VE.

See for just about every possible combination but as a summary:
page commentary: not rendered
A 2013 page with commentary: not rendered
Page commentary: rendered
commentary page: rendered

VE apparently treats these images as if they do not have captions, and adding one simply appends it to the one already there, e.g. adding the caption "Maryland" gave:
[[File:MDMap-doton-Bowie.PNG|thumb|180px|page 2013 commentary]] → [[File:MDMap-doton-Bowie.PNG|thumb|180px|page 2013 commentary|Maryland]]

Note that although I did my testing with images in a table, the table has no effect on the bug, as can be seen in the reported example from the live wiki: [[Josephus on Jesus#Testimonium Flavianum]]

Version: unspecified
Severity: normal
See Also:

bzimport set Reference to bz53312.
Thryduulf created this task. Via Legacy, Aug 25 2013, 10:13 AM
Thryduulf added a comment. Via Conduit, Aug 25 2013, 11:32 AM

Further testing shows that the key words are "page " (including the trailing space) and "comment" (not commentary). I think that any image caption that matches the following regex will exhibit this bug:

/.*page .*comment.*/
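
As a quick check, the regex above can be run against the four captions from the original report. This is only a minimal Python sketch: the helper name `matches_bug_pattern` is made up for illustration, and the pattern is the one hypothesised in this comment, not a rule taken from the parser source.

```python
import re

# Pattern hypothesised in this comment; not taken from the parser source.
BUG_PATTERN = re.compile(r".*page .*comment.*")


def matches_bug_pattern(caption: str) -> bool:
    """Return True if the caption matches the hypothesised pattern."""
    return BUG_PATTERN.search(caption) is not None


# The four captions listed in the summary of the original report.
captions = [
    "page commentary",
    "A 2013 page with commentary",
    "Page commentary",
    "commentary page",
]

for caption in captions:
    status = "not rendered" if matches_bug_pattern(caption) else "rendered"
    print(f"{caption}: {status}")
```

The pattern is case-sensitive and needs the space after "page", which would explain why "Page commentary" and "commentary page" are unaffected.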

Thryduulf added a comment. Via Conduit, Aug 25 2013, 3:06 PM

Yet more testing confirms that this is confined to image captions.

If you edit or enter an image caption in VE containing the words "page comment", you can see the caption during that edit.

Saving and re-entering VE shows odd behaviour: the caption of the image added in VE is not visible, as I predicted. However, the edited caption of an image that was already present remains visible. This is confirmed on a subsequent round trip in and out of VE, but I can't test on systems or browsers other than Firefox 23 on Linux.


The image of Gloucester Docks at the head of the section was the one with the edited caption. The image at the end of the preceding section (the prospect of Derby) was added in VE.

Thryduulf added a comment. Via Conduit, Aug 25 2013, 11:34 PM

Cryptic C62 at en.wp reports that it's just "page " or "page=" that is required:
"Further further testing shows that the error is caused by "page " (including the space) followed by some text, or "page=" followed by any text or nothing."

GWicke added a comment. Via Conduit, Aug 26 2013, 5:42 PM

The PHP parser recognizes the page option only for PDF files that actually exist.

It does however accept the option 'page=2013 commentary'.

So to me it seems that we need to

  1. only match 'page=' at the start of the potential option
  2. only do so for PDF files that exist
  3. but continue to accept mixed numerical / text page values.
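
The three rules above could be sketched roughly as follows. This is a hypothetical Python illustration, not the PHP parser's or Parsoid's actual code: the function name `parse_page_option`, the `file_exists` flag, and the file name `Example.pdf` are all assumptions made for the example, and the PDF check is a simple extension test.

```python
import re
from typing import Optional

# Rule 1: 'page=' must sit at the very start of the potential option.
# Rule 3: the value may mix digits and text, so capture everything after '='.
PAGE_OPTION = re.compile(r"^page=(.*)$", re.DOTALL)


def parse_page_option(option: str, file_name: str, file_exists: bool) -> Optional[str]:
    """Return the page value if `option` counts as a page option, else None.

    `file_exists` is a stand-in for a real file lookup (an assumption here).
    """
    # Rule 2: only consider the option for PDF files that actually exist.
    if not (file_exists and file_name.lower().endswith(".pdf")):
        return None
    match = PAGE_OPTION.match(option)
    return match.group(1) if match else None


# A caption on a PNG is never treated as a page option.
print(parse_page_option("page 2013 commentary", "MDMap-doton-Bowie.PNG", True))
# On an existing PDF, 'page=' at the start is accepted with a mixed value.
print(parse_page_option("page=2013 commentary", "Example.pdf", True))
# 'page=' in the middle of the text is left alone as caption text.
print(parse_page_option("A page=3 note", "Example.pdf", True))
```

Under these assumptions, anything that returns None would fall through and be treated as ordinary caption text.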

cscott added a comment. Via Conduit, Nov 8 2013, 10:46 PM

Testing reveals that terminating an image caption with 'alt=', 'thumb=', or 'thumbnail=' also trips up the parser. That seems to be related to the img_attribute production in the peg grammar. Don't know why 'page ' triggers the bug yet, still looking...

cscott added a comment. Via Conduit, Nov 8 2013, 10:48 PM

Ah, the img_page option contains two aliases, "page=$1" and "page $1". I wonder why we are parsing options twice (one in PEG, once in the magic words localization code)...
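
To illustrate why the "page $1" alias matters, here is a small Python sketch that turns the two aliases into regexes and compares anchored with unanchored matching. It is an assumption-laden illustration, not the PEG grammar or the magic word code: the names `alias_to_regex` and `match_img_page` and the `anchored` flag are invented for the example.

```python
import re
from typing import Optional

# The two aliases quoted above; "$1" stands for the option value.
IMG_PAGE_ALIASES = ["page=$1", "page $1"]


def alias_to_regex(alias: str, anchored: bool):
    """Build a regex from an alias, optionally anchored to the whole segment."""
    prefix, _, suffix = alias.partition("$1")
    body = re.escape(prefix) + "(.*)" + re.escape(suffix)
    pattern = "^" + body + "$" if anchored else body
    return re.compile(pattern, re.DOTALL)


def match_img_page(segment: str, anchored: bool) -> Optional[str]:
    """Return the captured value if any alias matches the segment, else None."""
    for alias in IMG_PAGE_ALIASES:
        m = alias_to_regex(alias, anchored).search(segment)
        if m:
            return m.group(1)
    return None


for caption in ["page 2013 commentary", "A 2013 page with commentary",
                "Page commentary", "commentary page"]:
    print(caption, "|", match_img_page(caption, anchored=False),
          "|", match_img_page(caption, anchored=True))
```

In this sketch, unanchored matching flags exactly the captions reported as broken, while anchoring the alias at the start of the segment still catches "page 2013 commentary" because of the "page $1" alias.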

gerritbot added a comment. Via Conduit, Nov 8 2013, 11:00 PM

Change 94446 had a related patch set uploaded by Cscott:
Fix parsing of image captions containing embedded image options.

gerritbot added a comment. Via Conduit, Nov 12 2013, 4:48 PM

Change 94446 merged by jenkins-bot:
Fix parsing of image captions containing embedded image options.

Jdforrester-WMF added a comment. Via Conduit, Feb 27 2014, 8:00 PM

*** Bug 54642 has been marked as a duplicate of this bug. ***
