ParsoidPHP adds "File:" in <gallery> element to every line
Closed, Resolved (Public)

Description

It looks like the new ParsoidPHP adds a leading "File:" (or "Datei:" on German Wikipedia) to every line inside a <gallery> element (Example). This was first reported on the German Wikipedia on Dec 3rd, some hours after the switch.

Event Timeline

The localized namespace name is inserted, e.g. Datei: or Fichier:.

  • Localization blocks copy-and-paste transfer into wikis in other languages.
  • On German Wikipedia, JS tools remove the namespace name whenever the source code is edited, to enable smooth exchange with other wikis.

The namespace name is redundant in a gallery, and allowing it there was never a smart concept.

  • Nothing other than a media ID is permitted at this position, so it is pointless to ask whether the prefix is Image: or File:.

VE etc. must not insert any namespace name.

  • It clutters diffs and prevents interwiki exchange. It is confusing at best and has no benefit.
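
For illustration, the cleanup described above (dropping the namespace name from each gallery line) might look roughly like the following Python sketch. This is not the code of the dewiki JS tools; the prefix list and the function name strip_gallery_prefixes are assumptions made up for this example, and a real wiki has its own set of localized namespace names and aliases.

```python
import re

# Hypothetical prefix list for illustration only; a real wiki defines its own
# localized file namespace names and aliases.
FILE_NS_PREFIXES = ("File", "Image", "Datei", "Bild", "Fichier")

# Matches a whole <gallery ...>...</gallery> block, capturing tag, body, tag.
_GALLERY_RE = re.compile(
    r"(<gallery\b[^>]*>)(.*?)(</gallery>)", re.DOTALL | re.IGNORECASE
)

# Matches a namespace prefix only at the very start of a gallery line.
_PREFIX_RE = re.compile(
    r"^(\s*)(?:" + "|".join(map(re.escape, FILE_NS_PREFIXES)) + r")\s*:\s*",
    re.IGNORECASE,
)


def strip_gallery_prefixes(wikitext: str) -> str:
    """Remove a leading file namespace name from every line inside <gallery>."""

    def fix_block(match: "re.Match[str]") -> str:
        open_tag, body, close_tag = match.groups()
        # Keep leading whitespace (group 1), drop only the prefix and colon.
        lines = [_PREFIX_RE.sub(r"\1", line) for line in body.split("\n")]
        return open_tag + "\n".join(lines) + close_tag

    # Text outside gallery blocks (e.g. [[File:...]] links) is left untouched.
    return _GALLERY_RE.sub(fix_block, wikitext)
```

Lines that already lack a prefix pass through unchanged, so running the function twice gives the same result as running it once.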

Reproducible:

$ (echo '<gallery>' ; echo 'Foo.jpg|foo' ; echo '</gallery>') | php bin/parse.php --wt2wt
<gallery>
File:Foo.jpg|foo
</gallery>

But this behavior is in Parsoid/JS as well:

$ (echo '<gallery>' ; echo 'Foo.jpg|foo' ; echo '</gallery>') | bin/parse.js --wt2wt
<gallery>
File:Foo.jpg|foo
</gallery>

And this behavior was present in Parsoid/JS at least as far back as Oct 29 / 42f33caad4f954a3cdea09301472e42bc8478ce2.

Possibly what's going on is a selser failure?

ssastry triaged this task as High priority. Dec 4 2019, 2:32 PM
ssastry subscribed.

Thanks for reporting. We'll take a look at this right away.

Hm, selser looks fine:

$ (echo 'foo' ; echo '' ; echo '<gallery>' ; echo 'Foo.jpg|foo' ; echo '</gallery>') > T239783.wt
$ php bin/parse.php --wt2html < T239783.wt > T239783.html
$ sed -e 's/>foo<\/p>/>bar<\/p>/' < T239783.html > T239783.2.html
$ php bin/parse.php --selser --oldtextfile T239783.wt --oldhtmlfile T239783.html < T239783.2.html
bar

<gallery>
Foo.jpg|foo
</gallery>

Maybe VE is doing something to the gallery content that's causing it to be marked as dirty?

Since this is a Parsoid/JS -> Parsoid/PHP regression, I think this is probably a porting issue.

It's not a regression, see above.

As far as I can tell, both Parsoid/JS and Parsoid/PHP behave identically: preserve the original (no prefix) form using selser, and add a "File:" prefix if gallery is edited.

The latter behavior may be undesirable on dewiki, but it is not a regression nor is it Parsoid/PHP-specific.
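
The two cases described above can be pictured with a toy model in Python. This is only a sketch of the idea behind selective serialization, not Parsoid's implementation; the Gallery and GalleryLine classes, the modified flag, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GalleryLine:
    title: str    # file title without namespace, e.g. "Foo.jpg"
    caption: str


@dataclass
class Gallery:
    source: str               # original wikitext of the whole <gallery> block
    lines: List[GalleryLine]  # parsed content of the block
    modified: bool = False    # set when an editor changed the block


def serialize_from_scratch(gallery: Gallery) -> str:
    """Serialize without looking at the original source (adds "File:")."""
    body = "\n".join(f"File:{ln.title}|{ln.caption}" for ln in gallery.lines)
    return f"<gallery>\n{body}\n</gallery>"


def selective_serialize(gallery: Gallery) -> str:
    """Reuse the original wikitext unless the block was edited."""
    if gallery.modified:
        return serialize_from_scratch(gallery)
    return gallery.source
```

In this model an untouched gallery round-trips byte for byte, while any edit to the gallery causes every line to be written in the normalized, prefixed form.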

Reopening this temporarily so that it can be closed separately as a real Parsoid/PHP regression. This was not a duplicate.

ssastry assigned this task to Arlolra.

See T214649#5745042 for a discussion of why this was a real regression.

This was fixed by https://gerrit.wikimedia.org/r/#/c/558135/ and deployed to production on Dec 17, 2019.