Page MenuHomePhabricator

Edited gallery captions are ignored unless gallery's data-mw is dropped
Closed, ResolvedPublic

Description

The wikitext

<gallery>File:Example.jpg|thumb|''Hello''</gallery>

converts to

<ul ... data-mw='{"name":"gallery","attrs":{},"body":{"extsrc":"File:Example.jpg|thumb|&apos;&apos;Hello&apos;&apos;"}}'>
    <li>
        <div ...><span ...><a ...><img ... /></a></span></div>
        <div class="gallerytext"><i>Hello</i></div>
    </li>
</ul>

Note the caption data exists as both wikitext in data-mw.body.extsrc and html in the gallerytext div.

Parsoid prefers to use the data-mw wikitext when converting back to wikitext. This means we would need to detect a caption being edited for the first, and generate an attribute transaction on the parent gallery, which is very messy.

Event Timeline

Native gallery editing isn't enabled in production,
https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/CommonSettings.php#L4266

The FIXME here says we were waiting on VE to enable HTML editing of galleries,
https://github.com/wikimedia/mediawiki-services-parsoid/blob/master/src/Ext/Gallery/Gallery.php#L314

In T214649, the output VE was returning differed enough that selser never applied to any gallery.

The FIXME here says we were waiting on VE to enable HTML editing of galleries,
https://github.com/wikimedia/mediawiki-services-parsoid/blob/master/src/Ext/Gallery/Gallery.php#L314

Right now, the heuristic exists that if the extsrc is removed, Parsoid will serialize from the edited html. We could drop the caption from the data-mw if the extsrc is absent. That's pragmatic and sort of implied by,
https://github.com/wikimedia/mediawiki-services-parsoid/blob/master/src/Ext/Gallery/Gallery.php#L316-L321

But what we really want to do is set nativeGalleryEnabled so that we stop emitting extsrc altogether but that requires addressing the selser issues.

Up to you what you'd prefer us to do.

We could drop the caption from the data-mw if the extsrc is absent. That's pragmatic and sort of implied by,

This refers specifically to the caption on the extension tag, <gallery caption="123" ..., and not the captions of the individual files.

Change 889257 had a related patch set uploaded (by Arlolra; author: Arlolra):

[operations/mediawiki-config@master] [WIP] Enabled native gallery editing in Parsoid

https://gerrit.wikimedia.org/r/889257

Arlolra triaged this task as Medium priority.Feb 14 2023, 11:42 PM
Arlolra moved this task from Backlog to In Progress on the Content-Transform-Team-WIP board.

In T214649, the output VE was returning differed enough that selser never applied to any gallery.

It looks like @matmarex resolved the differences in the wrapper in T214649#5887430. I'll do some more testing but we can probably just enable this production.

Please do test with,

$wgParsoidSettings = [
	'nativeGalleryEnabled' => true,
];

to confirm that it meets your editing needs.

I'll do some more testing but we can probably just enable this production.

Meh, a quick test reveals there are still a bunch of modified-wrapper, but in the nested structure,

----- DOM after running DOMDiff -----
<body id="mwAA" lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr" data-parsoid-diff='{"id":15580374,"diff":["children-changed"]}' data-parsoid='{"dsr":[0,81,0,0]}'><h2 id="123" data-parsoid-diff='{"id":15580374,"diff":["modified-wrapper","children-changed","subtree-changed"]}' data-parsoid='{"dsr":[0,7,2,2]}' data-mw="{}"><meta typeof="mw:DiffMarker/deleted"/>1234</h2>

<ul typeof="mw:Extension/gallery" class="gallery mw-gallery-traditional" about="#mwt2" id="mwAw" data-parsoid-diff='{"id":15580374,"diff":["subtree-changed"]}' data-parsoid='{"dsr":[9,81,23,10]}' data-mw='{"name":"gallery","attrs":{},"body":{}}'>
<li class="gallerycaption" id="mwBA" data-parsoid="{}" data-mw="{}">123</li>
<li class="gallerybox" style="width: 155px;" id="mwBQ" data-parsoid-diff='{"id":15580374,"diff":["children-changed","subtree-changed"]}' data-parsoid="{}" data-mw="{}"><div class="thumb" data-parsoid-diff='{"id":15580374,"diff":["modified-wrapper","children-changed","subtree-changed"]}' data-parsoid="{}" data-mw="{}"><span typeof="mw:File" data-parsoid-diff='{"id":15580374,"diff":["modified-wrapper","children-changed","subtree-changed"]}' data-parsoid="{}" data-mw="{}"><a data-parsoid-diff='{"id":15580374,"diff":["modified-wrapper","children-changed","subtree-changed"]}' data-parsoid="{}" data-mw="{}"><img resource="./File:Unix history-simple.png" src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/50/Unix_history-simple.png/120px-Unix_history-simple.png" width="120" height="80" data-parsoid-diff='{"id":15580374,"diff":["modified-wrapper"]}' data-parsoid="{}" data-mw="{}"/></a></span></div><div class="gallerytext" data-parsoid-diff='{"id":15580374,"diff":["modified-wrapper"]}' data-parsoid="{}" data-mw="{}">testing</div></li>
</ul>

</body>
-------------------------------------

Looking at VE source,
https://github.com/wikimedia/mediawiki-extensions-VisualEditor/blob/master/modules/ve-mw/dm/nodes/ve.dm.MWGalleryImageNode.js#L135-L140

	// TODO: Support editing the link
	// FIXME: Dropping the href causes Parsoid to mark the node as wrapper modified,
	// making the whole gallery subtree edited, preventing selser.  When fixing,
	// preserving the imgWrapperClassAttr, as in the MW*ImageNodes, will also be
	// necessary.
	// a.setAttribute( 'href', attributes.src );

We can ensure all the DOM attributes are copied by setting ve.dm.MWGalleryImageNode.static.handlesOwnChildren to true, making that above FIXME redundant.

Change 889586 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/extensions/VisualEditor@master] [WIP] Ensure all DOM attributes are copied for MWGalleryImageNode

https://gerrit.wikimedia.org/r/889586

Change 889586 abandoned by Arlolra:

[mediawiki/extensions/VisualEditor@master] [WIP] Ensure all DOM attributes are copied for MWGalleryImageNode

Reason:

In favour of I2fdfc04aee5e8b3524214e25afd8443b5f6b240d

https://gerrit.wikimedia.org/r/889586

But what we really want to do is set nativeGalleryEnabled so that we stop emitting extsrc altogether but that requires addressing the selser issues.

I should confirm that flag also ignores extsrc that may be set in cached content from before it was enabled

So the state of play is that we need Ed's patches deployed, and then we can make the config change to turn on native galleries. My understanding is that we've shipped nativeGalleriesEnabled=>true as the default configuration for "quite a while" even though it's not enabled in production everywhere quite yet.

Change 891889 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] [WIP] Ignore gallery extsrc if native gallery is enabled

https://gerrit.wikimedia.org/r/891889

I should confirm that flag also ignores extsrc that may be set in cached content from before it was enabled

That's in T329662#8644807

Change 891889 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Ignore gallery extsrc if native gallery is enabled

https://gerrit.wikimedia.org/r/891889

Change 894720 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump parsoid to 0.17.0-a19

https://gerrit.wikimedia.org/r/894720

Change 894720 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.17.0-a19

https://gerrit.wikimedia.org/r/894720

Change 896435 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] [WIP] Gallery selser

https://gerrit.wikimedia.org/r/896435

Change 896435 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Reuse source in galleries

https://gerrit.wikimedia.org/r/896435

Change 901245 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump parsoid to 0.18.0-a2

https://gerrit.wikimedia.org/r/901245

Change 901245 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.18.0-a2

https://gerrit.wikimedia.org/r/901245

Following up on T329662#8616990, an edit elsewhere in the page now produces (after I2fdfc04aee5e8b3524214e25afd8443b5f6b240d) pretty good results with nativeGalleryEnabled,

----- DOM after running DOMDiff -----
<body id="mwAA" lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr" data-parsoid-diff='{"diff":["children-changed"]}' data-parsoid="{}"><h2 id="1234" data-parsoid-diff='{"diff":["modified-wrapper","children-changed","subtree-changed"]}' data-parsoid="{}" data-mw="{}"><meta typeof="mw:DiffMarker/deleted"/>12345</h2>

<ul typeof="mw:Extension/gallery" class="gallery mw-gallery-traditional" about="#mwt2" id="mwAw" data-parsoid="{}" data-mw='{"name":"gallery","attrs":{},"body":{}}'>
<li class="gallerybox" style="width: 155px;" id="mwBA" data-parsoid="{}" data-mw="{}"><div class="thumb" style="width: 150px; height: 150px;" id="mwBQ" data-parsoid="{}" data-mw="{}"><span typeof="mw:File" id="mwBg" data-parsoid="{}" data-mw="{}"><a href="./File:Test.png" class="mw-file-description" title="123" id="mwBw" data-parsoid="{}" data-mw="{}"><img resource="./File:Test.png" src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/PNG_Test.png/95px-PNG_Test.png" width="95" height="120" alt="123" decoding="async" data-file-width="3120" data-file-height="3920" data-file-type="bitmap" srcset="https://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/PNG_Test.png/143px-PNG_Test.png 1.5x, https://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/PNG_Test.png/191px-PNG_Test.png 2x" id="mwCA" data-parsoid="{}" data-mw="{}"/></a></span></div><div class="gallerytext" id="mwCQ" data-parsoid="{}" data-mw="{}">123</div></li>
</ul>

</body>
-------------------------------------

I think this means we can move forward with T329662#8633332. I'll do that tomorrow.

Unfortunately, the benefits of T329662#8684251 and I55942bbb36c3bd3dd43a665fbca0f275e9870f7c are not felt because of an assortment of issues,

----- DOM after running DOMDiff -----
<body id="mwAA" lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr" data-parsoid-diff='{"diff":["children-changed"]}' data-parsoid="{}"><h2 id="1234" data-parsoid-diff='{"diff":["modified-wrapper"]}' data-parsoid="{}" data-mw="{}">1234</h2>

<ul typeof="mw:Extension/gallery" class="gallery mw-gallery-traditional" about="#mwt2" id="mwAw" data-parsoid-diff='{"diff":["children-changed","subtree-changed"]}' data-parsoid="{}" data-mw='{"name":"gallery","attrs":{},"body":{}}'><meta typeof="mw:DiffMarker/deleted"/><li class="gallerybox" data-parsoid-diff='{"diff":["modified-wrapper","children-changed","subtree-changed"]}' data-parsoid="{}" data-mw="{}"><div class="thumb" data-parsoid-diff='{"diff":["modified-wrapper","children-changed","subtree-changed"]}' data-parsoid="{}" data-mw="{}"><span typeof="mw:File" data-parsoid-diff='{"diff":["modified-wrapper","children-changed","subtree-changed"]}' data-parsoid="{}" data-mw="{}"><a href="./File:Test.png" class="mw-file-description" data-parsoid-diff='{"diff":["modified-wrapper","children-changed","subtree-changed"]}' data-parsoid="{}" data-mw="{}"><img resource="./File:Test.png" src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/PNG_Test.png/95px-PNG_Test.png" width="95" height="120" alt="123" data-parsoid-diff='{"diff":["modified-wrapper"]}' data-parsoid="{}" data-mw="{}"/></a></span></div><div class="gallerytext" data-parsoid-diff='{"diff":["modified-wrapper"]}' data-parsoid="{}" data-mw="{}">123</div></li><meta typeof="mw:DiffMarker/deleted"/><li class="gallerybox" data-parsoid-diff='{"diff":["inserted"]}' data-parsoid="{}" data-mw="{}"><div class="thumb" data-parsoid="{}"><span typeof="mw:File" data-parsoid="{}"><a href="./File:Odd Rode Church - geograph.org.uk - 2261064.jpg" class="mw-file-description" data-parsoid="{}"><img resource="./File:Odd Rode Church - geograph.org.uk - 2261064.jpg" src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/89/Odd_Rode_Church_-_geograph.org.uk_-_2261064.jpg/200px-Odd_Rode_Church_-_geograph.org.uk_-_2261064.jpg" width="120" height="100" alt="" data-parsoid="{}"/></a></span></div><div class="gallerytext" data-parsoid="{}"></div></li></ul>

</body>
-------------------------------------

None of the internal id attributes are preserved, so data-parsoid is lost. The style attributes on the li and div are dropped. The newlines before each li are removed, so we get <meta typeof="mw:DiffMarker/deleted"/>. The img doesn't have decoding / data-* attributes. Etc.

Change 889257 merged by jenkins-bot:

[operations/mediawiki-config@master] Enabled native gallery editing in Parsoid

https://gerrit.wikimedia.org/r/889257

Mentioned in SAL (#wikimedia-operations) [2023-03-29T13:19:57Z] <lucaswerkmeister-wmde@deploy2002> Started scap: Backport for [[gerrit:889257|Enabled native gallery editing in Parsoid (T329662)]]

Mentioned in SAL (#wikimedia-operations) [2023-03-29T13:21:19Z] <lucaswerkmeister-wmde@deploy2002> lucaswerkmeister-wmde and arlolra: Backport for [[gerrit:889257|Enabled native gallery editing in Parsoid (T329662)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-03-29T13:30:17Z] <lucaswerkmeister-wmde@deploy2002> Finished scap: Backport for [[gerrit:889257|Enabled native gallery editing in Parsoid (T329662)]] (duration: 10m 19s)

I think this is done now. Thank you!

@matmarex Should I open a separate task for T329662#8736728? This task is done but T207620 / T211895 / T151367 would be improved.

@Arlolra I'm not really sure what it means. If it's not covered by these existing tasks, then please do file one.

Change 904199 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Enabled native gallery editing in Parsoid by default

https://gerrit.wikimedia.org/r/904199

@Arlolra I'm not really sure what it means. If it's not covered by these existing tasks, then please do file one.

Sorry, what it means is that the changes done in I2fdfc04aee5e8b3524214e25afd8443b5f6b240d make it so that selser works for unedited galleries on a page, which is great.

In Id0e3c5a79f74d72257c6e70d97e55ed94410f156 though, selser can now be applied to unedited nodes in an edited gallery. So, for example, adding a new image to a gallery won't normalize the rest of the gallery lines, as in T211895. But this doesn't work because the returned edited gallery is heavy normalized by VE, as noted in the latter half of T329662#8736728.

Similarly, even if selser isn't applied in the edited gallery, not having the ids on old nodes means that data-parsoid is dropped for them and so any serialization information (like file prefixes, I55942bbb36c3bd3dd43a665fbca0f275e9870f7c) won't be able to be used.

Change 904199 merged by jenkins-bot:

[mediawiki/core@master] Enabled native gallery editing in Parsoid by default

https://gerrit.wikimedia.org/r/904199