Image captions containing "page " anywhere are parsed as page option
Closed, Resolved (Public)
Authored By

	Thryduulf
	Aug 25 2013, 10:13 AM

Description

Any image whose caption contains the words "page" and "commentary", in that order and in all lower case, is not rendered in VisualEditor (VE) and is not editable in VE. It does not matter whether the two words are adjacent or what else the caption contains.

See https://en.wikipedia.org/w/index.php?title=User:Thryduulf/sandbox&oldid=570107739#Third_section for just about every possible combination. In summary:
page commentary: not rendered
A 2013 page with commentary: not rendered
Page commentary: rendered
commentary page: rendered

VE apparently treats these images as if they have no caption, so adding a caption simply appends it after the existing caption text. For example, adding the caption "Maryland" produced:
[[File:MDMap-doton-Bowie.PNG|thumb|180px|page 2013 commentary]] → [[File:MDMap-doton-Bowie.PNG|thumb|180px|page 2013 commentary|Maryland]]
https://en.wikipedia.org/w/index.php?title=User%3AThryduulf%2Fsandbox&diff=570108303&oldid=570107739
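The diff is consistent with a classifier that mistakes the caption for an option. The sketch below is a minimal illustration, not Parsoid's code: it assumes that the last pipe-separated parameter not recognized as an option becomes the caption, and the option set, regex, and function names are all hypothetical. The deliberately buggy check finds "page " or "page=" anywhere in a parameter.

```python
import re
from typing import Optional

# Hypothetical, heavily reduced option set; real MediaWiki recognizes many more.
SIMPLE_OPTIONS = {"thumb", "thumbnail", "frame", "frameless"}
SIZE_RE = re.compile(r"\d+px")
# Deliberately buggy check: finds "page " or "page=" anywhere in the parameter.
BUGGY_PAGE_RE = re.compile(r"page[ =]")


def buggy_is_option(param: str) -> bool:
    """Return True if the buggy classifier treats `param` as an image option."""
    return (param in SIMPLE_OPTIONS
            or SIZE_RE.fullmatch(param) is not None
            or BUGGY_PAGE_RE.search(param) is not None)


def caption_of(link: str) -> Optional[str]:
    """Return the caption of a [[File:...|...]] link, or None if none is found."""
    params = link[2:-2].split("|")[1:]  # strip the brackets, drop the File: target
    caption = None
    for param in params:
        if not buggy_is_option(param):
            caption = param  # assumption: the last unrecognized parameter wins
    return caption


def add_caption(link: str, text: str) -> str:
    """Append a new caption parameter if no caption was detected."""
    if caption_of(link) is None:
        return link[:-2] + "|" + text + "]]"
    return link


link = "[[File:MDMap-doton-Bowie.PNG|thumb|180px|page 2013 commentary]]"
print(caption_of(link))  # None
print(add_caption(link, "Maryland"))
# [[File:MDMap-doton-Bowie.PNG|thumb|180px|page 2013 commentary|Maryland]]
```

Because "Page commentary" does not contain a lower-case "page ", the same classifier leaves it as a caption, matching the "rendered" result in the summary.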

Note that although I tested with images in a table, the table has no effect, as can be seen in the example reported on the live wiki: [[Josephus on Jesus#Testimonium Flavianum]]

Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=54642

Details

Reference: bz53312

Event Timeline

• bzimport raised the priority of this task to High. Nov 22 2014, 1:47 AM

• bzimport added a project: Parsoid-Token-Stream-Transforms.

• bzimport set Reference to bz53312.

Thryduulf created this task. Aug 25 2013, 10:13 AM

Further testing shows that the key words are "page " (including the trailing space) and "comment" (not "commentary"). I think that any image caption matching the following regex will exhibit this bug:

/.*page .*comment.*/

https://en.wikipedia.org/w/index.php?title=User:Thryduulf/sandbox&oldid=570113406#Third_section
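As a quick sanity check, the regex can be run against the four summary cases from the description. This Python sketch uses a made-up `predict` helper; it shows that the pattern agrees with each observation, and that it is case-sensitive and requires the space after "page", which explains why "Page commentary" and "commentary page" render.

```python
import re

# The reporter's regex from the comment above, written as a Python pattern.
REPORTED_RE = re.compile(r".*page .*comment.*")


def predict(caption: str) -> str:
    """Predict whether VE renders an image with this caption (hypothetical helper)."""
    return "not rendered" if REPORTED_RE.search(caption) else "rendered"


# Summary cases from the description, with the observed behaviour.
OBSERVED = {
    "page commentary": "not rendered",
    "A 2013 page with commentary": "not rendered",
    "Page commentary": "rendered",
    "commentary page": "rendered",
}

for caption, observed in OBSERVED.items():
    print(f"{caption!r}: predicted {predict(caption)}, observed {observed}")
```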

Yet more testing confirms that this is confined to image captions.

If you edit or enter an image caption containing the words "page comment" in VE, the caption remains visible during that edit.

Saving and re-entering VE exhibits odd behaviour: as I predicted, the caption of the image that was added in VE is not visible. However, the edited caption of the image that was already present remains visible. A subsequent round trip in and out of VE confirms this, but I can't test on systems or browsers other than Firefox 23 on Linux.

See https://en.wikipedia.org/w/index.php?title=User:Thryduulf/sandbox&oldid=570134743#Section_with_a_picture_of_Gloucester

The image of Gloucester Docks at the head of the section was the one with the edited caption. The image at the end of the preceding section (the prospect of Derby) was added in VE.

Cryptic C62 at en.wp reports that it's just "page " or "page=" that is required:
"Further further testing shows that the error is caused by "page " (including the space) followed by some text, or "page=" followed by any text or nothing."

The PHP parser recognizes the page option (see https://www.mediawiki.org/wiki/Help:Images#Syntax) only for PDF files that actually exist:

https://www.mediawiki.org/wiki/User:GWicke/TestPageOption

It does, however, accept the option 'page=2013 commentary'.

So it seems to me that we need to:

• only match 'page=' at the start of the potential option,
• only do so for PDF files that exist,
• but continue to accept mixed numerical/text page values.
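The three points can be folded into a single check. This is a minimal sketch under stated assumptions: `parse_page_option` is a hypothetical name, and `file_exists` and `is_pdf` stand in for whatever file-metadata lookup the parser would actually need. Following the list, it only handles the 'page=' form.

```python
from typing import Optional


def parse_page_option(param: str, file_exists: bool, is_pdf: bool) -> Optional[str]:
    """Return the page value if `param` is a page option, else None.

    A None result means the parameter should be treated as ordinary caption text.
    """
    # Only PDF files that actually exist can take a page option.
    if not (file_exists and is_pdf):
        return None
    # Only match 'page=' at the start of the parameter, never in the middle.
    prefix = "page="
    if not param.startswith(prefix):
        return None
    # Keep mixed numerical/text values such as "2013 commentary" unchanged.
    return param[len(prefix):]
```

For example, 'page=2013 commentary' on an existing PDF yields "2013 commentary", while the same text on a PNG, or 'A 2013 page=commentary' on a PDF, is left as caption text.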

Testing reveals that ending an image caption with 'alt=', 'thumb=', or 'thumbnail=' also trips up the parser. That seems to be related to the img_attribute production in the PEG grammar. I don't yet know why 'page ' triggers the bug; still looking...

Ah, the img_page option has two aliases, "page=$1" and "page $1". I wonder why we are parsing options twice (once in the PEG grammar, once in the magic-word localization code)...
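Matching both aliases only as whole-parameter patterns would stop "page " from being found in the middle of a caption. The sketch below assumes each alias contains a single `$1` placeholder, as the two quoted here do; `alias_to_regex` and `match_img_page` are hypothetical helpers, not how Parsoid or the localization code actually work.

```python
import re
from typing import Optional


def alias_to_regex(alias: str):
    """Turn an alias such as "page=$1" into a compiled regex with $1 as a group."""
    before, _, after = alias.partition("$1")
    return re.compile(re.escape(before) + "(.*)" + re.escape(after), re.DOTALL)


IMG_PAGE_ALIASES = ["page=$1", "page $1"]
IMG_PAGE_PATTERNS = [alias_to_regex(a) for a in IMG_PAGE_ALIASES]


def match_img_page(param: str) -> Optional[str]:
    """Return the $1 value if the whole parameter fits one of the aliases."""
    for pattern in IMG_PAGE_PATTERNS:
        # fullmatch anchors the alias at both ends, unlike an unanchored search.
        m = pattern.fullmatch(param)
        if m:
            return m.group(1)
    return None
```

Anchoring alone still accepts "page commentary" as a page option (with value "commentary"), so it would need to be combined with the PDF-only restriction proposed earlier in this thread.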

Change 94446 had a related patch set uploaded by Cscott:
Fix parsing of image captions containing embedded image options.

https://gerrit.wikimedia.org/r/94446

Change 94446 merged by jenkins-bot:
Fix parsing of image captions containing embedded image options.

https://gerrit.wikimedia.org/r/94446

Bug 54642 has been marked as a duplicate of this bug.
