ParsoidPHP adds "File:" in <gallery> element to every line
Closed, ResolvedPublic

Description

It looks like the new ParsoidPHP adds a leading "File:" (or "Datei:" on the German Wikipedia) to every line inside a <gallery> element (see example). This was first reported on the German Wikipedia on Dec 3rd, some hours after the switch.

Event Timeline

Raymond created this task.Dec 4 2019, 7:51 AM
Restricted Application added a subscriber: Aklapper. Dec 4 2019, 7:51 AM

The localized version is inserted, e.g. Datei: or Fichier:.

  • Localization blocks copy-and-paste transfer into wikis in other languages.
  • German Wikipedia removes the namespace name with JS tools whenever source code is edited, to enable smooth exchange with other wikis.

The namespace name is redundant and superfluous in gallery, and was never a smart concept there.

  • Nothing other than a media ID is permitted at this position, so it is pointless to ask whether it is called Image: or File:.

VE etc. must not insert any namespace name.

  • It clutters diffs and hinders interwiki exchange. It is confusing at best and brings no benefit.
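
For illustration only, the kind of namespace stripping described above could be sketched as a small shell filter. This is not the JS tool used on German Wikipedia; the function name strip_gallery_ns and the list of prefixes (File, Image, Datei, Fichier) are assumptions made for this sketch, and it only handles galleries whose opening and closing tags sit on their own lines.

```shell
# Hypothetical helper: strip a leading file-namespace prefix from lines
# inside <gallery>...</gallery>. The prefix list is illustrative, not complete.
strip_gallery_ns() {
  awk '
    /<gallery/   { in_gallery = 1; print; next }   # opening tag (with or without attributes)
    /<\/gallery>/ { in_gallery = 0; print; next }  # closing tag
    in_gallery   { sub(/^(File|Image|Datei|Fichier):/, "") }  # drop prefix at line start only
                 { print }
  '
}
```

It reads wikitext on stdin and writes the result to stdout, e.g. `strip_gallery_ns < page.wt`, where page.wt is a placeholder file name. Text outside the gallery is left untouched.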
cscott added a subscriber: cscott.EditedDec 4 2019, 2:32 PM

Reproducible:

$ (echo '<gallery>' ; echo 'Foo.jpg|foo' ; echo '</gallery>') | php bin/parse.php --wt2wt
<gallery>
File:Foo.jpg|foo
</gallery>

But this behavior is in Parsoid/JS as well:

$ (echo '<gallery>' ; echo 'Foo.jpg|foo' ; echo '</gallery>') | bin/parse.js --wt2wt
<gallery>
File:Foo.jpg|foo
</gallery>

And this behavior was present in Parsoid/JS at least as far back as Oct 29 / 42f33caad4f954a3cdea09301472e42bc8478ce2.

Possibly what's going on is a selser failure?

ssastry triaged this task as High priority.Dec 4 2019, 2:32 PM
ssastry added a subscriber: ssastry.

Thanks for reporting. We'll take a look at this right away.

ssastry edited projects, added Parsoid-PHP; removed Parsoid.Dec 4 2019, 2:48 PM
ssastry moved this task from Backlog to Bugs, Notices, Crashers on the Parsoid-PHP board.
cscott added a comment.Dec 4 2019, 2:57 PM

Hm, selser looks fine:

$ (echo 'foo' ; echo '' ; echo '<gallery>' ; echo 'Foo.jpg|foo' ; echo '</gallery>') > T239783.wt
$ php bin/parse.php --wt2html < T239783.wt > T239783.html
$ sed -e 's/>foo<\/p>/>bar<\/p>/' < T239783.html > T239783.2.html
$ php bin/parse.php --selser --oldtextfile T239783.wt --oldhtmlfile T239783.html < T239783.2.html
bar

<gallery>
Foo.jpg|foo
</gallery>

Maybe VE is doing something to the gallery content that's causing it to be marked as dirty?

Since this is a Parsoid/JS -> Parsoid/PHP regression, I think this is probably a porting issue.

cscott added a comment.EditedDec 4 2019, 4:36 PM

It's not a regression, see above.

As far as I can tell, both Parsoid/JS and Parsoid/PHP behave identically: they preserve the original (no prefix) form using selser, and add a "File:" prefix if the gallery is edited.

The latter behavior may be undesirable on dewiki, but it is not a regression nor is it Parsoid/PHP-specific.

ssastry reopened this task as Open.Jan 8 2020, 5:42 AM

Reopening it temporarily to close this separately as a real Parsoid-PHP regression. This was not a duplicate.

ssastry closed this task as Resolved.Jan 8 2020, 5:45 AM
ssastry assigned this task to Arlolra.

See T214649#5745042 for a discussion about why this was a real regression.

This was fixed by https://gerrit.wikimedia.org/r/#/c/558135/ and deployed to production on Dec 17, 2019.