Page MenuHomePhabricator

Parsoid doesn't recognize valid image options on eowikivoyage and lints them as bogus options
Closed, ResolvedPublic

Description

$ echo '[[File:Foo.jpg|maldekstra|caption]]' | php bin/parse.php --domain eo.wikivoyage.org --linting                                                                                                       1 ↵
<p data-parsoid='{"dsr":[0,35,0,0]}'><span class="mw-default-size" typeof="mw:File" data-parsoid='{"optList":[{"ck":"bogus","ak":"maldekstra"},{"ck":"caption","ak":"caption"}],"dsr":[0,35,null,null]}' data-mw='{"caption":"caption"}'><a href="./Dosiero:Foo.jpg" class="mw-file-description" title="caption" data-parsoid="{}"><img alt="caption" resource="./Dosiero:Foo.jpg" src="//upload.wikimedia.org/wikipedia/commons/0/06/Foo.jpg" decoding="async" data-file-width="300" data-file-height="197" data-file-type="bitmap" height="197" width="300" class="mw-file-element" data-parsoid='{"a":{"resource":"./Dosiero:Foo.jpg","height":"197","width":"300"},"sa":{"resource":"File:Foo.jpg"}}'/></a></span></p>

Notice how maldekstra is classified as bogus. This also shows up as lint errors on eowikivoyage (not sharing link since the contents of the link will change once this bug is resolved) where maldesktra and desktra are tagged as bogus image options.

This showed up while looking eowikivoyage visualdiffing errors where a number of pages had image rendering visual diffs.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Change #1064069 had a related patch set uploaded (by Subramanya Sastry; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] Prefer literal string image option to prefixed image option

https://gerrit.wikimedia.org/r/1064069

Some more information:

> $services = \MediaWiki\MediaWikiServices::getInstance();
> $l = $services->getLanguageFactory()->getLanguage('eo');
> $mwf = new \MediaWiki\Parser\MagicWordFactory($l, $services->getHookContainer());
> $x = $mwf->newArray(['img_left','img_width']);
> $x->getBaseRegex();
= [
    "(?i:(?!))",
    "(?P<a_img_left>maldekstra)|(?P<b_img_left>maldekstre)|(?P<c_img_left>left)|(?P<a_img_width>\$1ra)|(?P<b_img_width>\$1px)",
  ]
> $x->matchVariableStartToEnd('maldekstra');
= [
    "img_left",
    false,
  ]
> $x = $mwf->newArray(['img_width','img_left']);
> $x->getBaseRegex();
= [
    "(?i:(?!))",
    "(?P<a_img_width>\$1ra)|(?P<b_img_width>\$1px)|(?P<a_img_left>maldekstra)|(?P<b_img_left>maldekstre)|(?P<c_img_left>left)",
  ]
> $x->matchVariableStartToEnd('maldekstra');
= [
    "img_width",
    "maldekst",
  ]

Note that the precedence of options is pretty much arbitrary, it just *happens* that img_width is added after other options. Parsoid is a bit more principled here, always preferring literal parameters to parameterized (prefix) parameters.

Although the possibility for conflicts exists for all parameterized image parameters, the img_width parameter is the most likely one to result in conflicts in actual practice. Most other parameterized arguments include = in the pattern, which is not likely to occur in a parameter literal, although eg (from MessagesEo.php):

	'img_page'                  => [ '1', 'paĝo=$1', 'paĝo $1', 'pagxo=$1', 'pagxo_$1', 'page=$1', 'page $1' ],

it's not impossible that someone could define a different image parameter as something like page left which would conflict with img_page in the same way that img_width conflicted with img_left for Esperanto.

Anyway, I have a patch to reorder the parameter names in the legacy parser to try to minimize the chance of these issues, and I think Parsoid's policy of always preferring literal parameter names to parameterized names is a reasonable and principled policy for the future.

Change #1064408 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Clarify precedence between conflicting localized image parameter names

https://gerrit.wikimedia.org/r/1064408

Change #1064409 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Add `double-px-category` tracking category for deprecated image size syntax

https://gerrit.wikimedia.org/r/1064409

Change #1064069 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Prefer literal string image option to prefixed image option

https://gerrit.wikimedia.org/r/1064069

Change #1064408 merged by jenkins-bot:

[mediawiki/core@master] Clarify precedence between conflicting localized image parameter names

https://gerrit.wikimedia.org/r/1064408

Change #1066683 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a18

https://gerrit.wikimedia.org/r/1066683

Change #1066683 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a18

https://gerrit.wikimedia.org/r/1066683

Change #1064409 merged by jenkins-bot:

[mediawiki/core@master] Add `double-px-category` tracking category for deprecated image size syntax

https://gerrit.wikimedia.org/r/1064409

ssastry moved this task from To Deploy to To Verify on the Content-Transform-Team-WIP board.