Parsoid and PHP parser parse <gallery caption="…"> differently
Closed, Resolved · Public

Description

Parsoid and PHP parser parse <gallery caption="…"> differently:

  • PHP parser only allows internal links (file and category syntax also work, but this is probably accidental). No other syntax is allowed; in particular, you can't even use italics/bold or templates.
  • Parsoid allows all wikitext syntax (although you can't really use block syntax because newlines are turned into spaces).

Note that this is only for the gallery caption, not for captions of individual images in the gallery, which allow all wikitext syntax in both parsers.
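
To make the "internal links only" behavior concrete, here is a minimal Python sketch of a links-only caption renderer. It is a toy illustration, not MediaWiki's actual `replaceInternalLinks`: the function name `render_links_only`, the simplified `WIKILINK` pattern, and the `/wiki/` URL prefix are all assumptions made for the example.

```python
import html
import re

# Hypothetical, simplified pattern for [[target]] or [[target|label]].
# Real wikilink parsing handles many more cases (namespaces, trails, etc.).
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def render_links_only(caption: str) -> str:
    """Turn wikilinks into anchors; emit everything else as escaped text."""
    out = []
    pos = 0
    for m in WIKILINK.finditer(caption):
        # Text between links is passed through untouched (only HTML-escaped),
        # so bold/italic quotes and template braces stay literal.
        out.append(html.escape(caption[pos:m.start()], quote=False))
        target = m.group(1).strip()
        label = m.group(2) if m.group(2) is not None else target
        href = "/wiki/" + target.replace(" ", "_")  # assumed URL scheme
        out.append(
            '<a href="%s">%s</a>'
            % (html.escape(href), html.escape(label, quote=False))
        )
        pos = m.end()
    out.append(html.escape(caption[pos:], quote=False))
    return "".join(out)
```

Given the caption line `Text '''bold''' [[link]] {{ns:-1}}`, this sketch links only `[[link]]` and leaves `'''bold'''` and `{{ns:-1}}` as literal text, which is the kind of restriction described for the PHP parser above.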

Example:

<gallery caption="# List item

Text '''bold''' [[link]] {{ns:-1}}

[[File:Example.jpg|thumb|File in gallery caption]]">
File:Example.jpg|Image caption
</gallery>
[Screenshots: PHP parser rendering; Parsoid rendering]

I have no strong opinion on which is the preferred behavior, but it should be the same.

However, this seems to be the only place where we apply such a limitation. Other similar constructs (e.g. functionally similar table caption |+, or syntactically similar <mapframe text="…">) allow all wikitext syntax. It would probably be easier to allow all syntax here.

Historical context:

Event Timeline

Restricted Application added a subscriber: Aklapper. · Feb 21 2018, 11:23 PM

From https://github.com/wikimedia/parsoid/blob/master/lib/ext/Gallery/index.js#L68-L71

// FIXME: This is too permissive.  The php implementation only calls
// `replaceInternalLinks` on the gallery caption.  We should have a new
// tokenizing rule that only tokenizes text / wikilink.
Arlolra triaged this task as Normal priority. · Feb 22 2018, 11:46 PM
Arlolra claimed this task.
ssastry moved this task from Backlog to Read Views on the Parsoid board. · Feb 26 2018, 4:24 PM

Change 417029 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/core@master] Parse wikitext in gallery caption

https://gerrit.wikimedia.org/r/417029

Change 417032 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Match php parser gallery caption parsing

https://gerrit.wikimedia.org/r/417032

Esanders added a subscriber: Esanders. · Edited · Mar 12 2018, 12:34 PM

@Jdforrester-WMF Mar 8 7:52 PM Patch Set 1:
Hmm. I'd mildly prefer we limit captions more rather than less (especially, I'm worried about how much breakage this will encourage people to come up with).

What if the caption is "Stills from the set of Movie Title", where the style guide calls for italics? Allowing links but not other text style seems odd. Could we re-use the ruleset from commit messages?

Could we re-use the ruleset from commit messages?

Edit summaries / log comments (if that's what you mean?) also only allow internal links (and apparently HTML entities, for some reason). It's not a reusable ruleset for the normal parser; they have their own mini-parser (a soup of regexes even worse than the normal one) in Linker::formatComment().

As I understand it, @Arlolra's patch allows all wikitext except paragraph wrapping and start-of-line syntax (tables and lists). I think that's reasonable.

cscott added a subscriber: cscott. · Mar 27 2018, 3:33 PM

<gallery caption="Foo&#10;&#10;bar"> ... </gallery> might be an interesting test case.

<gallery caption="Foo&#10;&#10;bar"> ... </gallery> might be an interesting test case.

I approve of T192037: Writeup some sort of position statement against subsets of wikitext (wikitext subsets are evil) and believe the gallery caption attribute should be full wikitext. The answer to the question I posed above is T204283: Serializing extension tags using TemplateData.
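
One possible reason the `Foo&#10;&#10;bar` test case quoted above is interesting: the result can depend on whether newlines are collapsed before or after character references are decoded. The Python sketch below only illustrates that ordering issue; the helper names are hypothetical, and it makes no claim about which order either parser actually uses.

```python
import html


def collapse_then_decode(raw: str) -> str:
    """Replace literal newlines with spaces first, then decode entities."""
    # Entity-encoded newlines are still "&#10;" at collapse time,
    # so they survive as real newline characters after decoding.
    return html.unescape(raw.replace("\n", " "))


def decode_then_collapse(raw: str) -> str:
    """Decode entities first, then replace every newline with a space."""
    # Here the decoded newlines are indistinguishable from literal ones.
    return html.unescape(raw).replace("\n", " ")


raw_caption = "Foo&#10;&#10;bar"
with_newlines = collapse_then_decode(raw_caption)     # keeps a blank line
without_newlines = decode_then_collapse(raw_caption)  # two spaces instead
```

Under the first ordering the caption still contains a blank line, which would normally start a new paragraph in wikitext; under the second it does not.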

Reedy edited projects, added Parsoid-Read-Views; removed Parsoid. · Sep 17 2018, 7:25 PM

Change 417029 merged by jenkins-bot:
[mediawiki/core@master] Parse wikitext in gallery caption

https://gerrit.wikimedia.org/r/417029

Change 417032 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Match php parser gallery caption parsing

https://gerrit.wikimedia.org/r/417032

Arlolra closed this task as Resolved. · Jan 23 2019, 6:51 PM

Mentioned in SAL (#wikimedia-operations) [2019-01-24T19:25:13Z] <arlolra> Updated Parsoid to f1d717f (T187958, T205337, T214103)

Change 487550 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/VisualEditor@master] ve.ui.MWGalleryDialog: Allow normal tools in gallery captions

https://gerrit.wikimedia.org/r/487550

Change 487550 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] ve.ui.MWGalleryDialog: Allow normal tools in gallery captions

https://gerrit.wikimedia.org/r/487550