Page MenuHomePhabricator

Templates (and extensions) that mimic parser media output need migration to new structure
Open, MediumPublic

Description

In order to be able to stop shipping the older styles in content.thumbnails-*.less, content that's mimicking the parser media output needs to be migrated to the new structure.

However, the styles in content.media-*.less are targeted towards typeof="mw:File" annotations, which might not make sense semantically for that content (and would likely be sanitized away). Perhaps we need to add a class in the stylesheet for this content to target instead? But note that, in T314097, it was discouraged to add a classes that MediaWiki core isn't generating:

I don't think it makes sense to put a class here that's not generated by MediaWiki core however. At the very least not without a huge inline comment explaining why.

From https://gerrit.wikimedia.org/r/c/mediawiki/core/+/812384/7#message-a49f26cd09eb71d8f23b950b91f63359a938e370


Templates can also produce legacy DOM, expecting the skin to style it (e.g. hu:Sablon:Széles kép), and/or consume the DOM using TemplateStyles. These also need to be checked, and I think they could be in the scope of this ticket.


This particular example should be unaffected since, for the moment, we aren't removing the old styling rules. But, yes, getting all the on wiki content that mimics the parser output ported to the new styling rules will be a longer term goal.

https://en.wikipedia.org/wiki/Template:Multiple_image looks like another one


.mw-kartographer-container.thumb:not(.mw-kartographer-full) .thumbinner > .mw-kartographer-map {

In T272186, we looked at extensions that needed updating for the new structure, but Kartographer wasn't one of them. It looks like it's doing something similar to what some templates do (see T297447#8242965) and tries to mimic the legacy media structure that the parser outputs, to reuse the styling,
https://github.com/wikimedia/mediawiki-extensions-Kartographer/blob/master/includes/Tag/MapFrame.php#L191
https://github.com/wikimedia/mediawiki-extensions-Kartographer/blob/master/styles/kartographer.less#L117-L158

That should continue to work, since we aren't yet removing those styles, but should eventually maybe be migrated the new styles and structure.

EDIT: from T318433#10397735 below:

We discussed this task during MW engineering offsite. Proposed solution, inspired by T204370, is to add a new parser function {{#media}}, which in its full form will be a replacement for [[File:]] syntax, but with cleaner option handling (eg, requiring caption=.... instead of parsing any unrecognized option as the caption). {{#media|src=Foo.jpg}} would be the equivalent of [[File:Foo.jpg]] but instead of src you could pass content=... as an option to embed arbitrary user-generated content, using the structure described above in:

<figure typeof="mw:Transclusion mw:File" ...>
  <a href="..."><span typeof="mw:UserContent">...parsed user wikitext...</span></a>
  <figcaption typeof="mw:UserContent">...parsed user wikitext...</figcaption>
</figure>

That would enforce the expected DOM structure on the user content and allow the reuse of typeof=mw:File in a safe (parser-generated typeof, not user-controlled) way. And, of course, if we ever made future tweaks to the media styling or DOM structure, we can make them in a single place in the {{#figure}} implementation and have them applied uniformly to user-generated content.

(We probably don't actually need typeof=mw:UserContent on the figcaption, since the caption is always user-generated content; this isn't new in any way. See T331655 for a fuller discussion of marking extension-generated content.)

The implementation should call an Parsoid API in order to share as much code as possible with the existing Parsoid image handling code.

Event Timeline

Arlolra triaged this task as Medium priority.Sep 23 2022, 5:17 PM
Arlolra moved this task from Needs Triage to Media Structure on the Parsoid board.

Tangentially (or maybe not), editors should have access to <figure> and <figcaption> in wikitext per my comment in T25932#7070297.

Tangentially (or maybe not), editors should have access to <figure> and <figcaption> in wikitext per my comment in T25932#7070297.

That's a separate debate, one on which I think there is a wide gap in consensus.

I think it's reasonable to ensure that the CSS rules have "class based" equivalents to the figure styling, eg figure[typeof=...], .enwiki-figure { ... }. I don't think I agree that that means we need to emit redundant class definitions in Parsoid HTML.

I think "a huge inline comment explaining why" (as mentioned in the task description) is sufficient, to wit: some template authors want their templates to be styled in the same way as mediawiki figures, and this allows them to do so. That seems straightforward enough.

I think it's reasonable to ensure that the CSS rules have "class based" equivalents to the figure styling, eg figure[typeof=...], .enwiki-figure { ... }. I don't think I agree that that means we need to emit redundant class definitions in Parsoid HTML.

Some further points made to that effect:

figure[typeof=....], .mw-figure-xyz { ... } is cheap enough, has a minimal impact on performance, and allows future decoupling of parser output styling and template styling (such as in exactly the case we're discussing here).

it's also a one-time download of the highly-cacheable CSS, instead of bloating every figure in every article

the mw-figure-blahblah strings should also be very searchable, if we ever need to lint that content

Change 865213 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] [WIP] Bikeshed some classes to mimic the parser output

https://gerrit.wikimedia.org/r/865213

Tangentially (or maybe not), editors should have access to <figure> and <figcaption> in wikitext per my comment in T25932#7070297.

That's a separate debate, one on which I think there is a wide gap in consensus.

Without figure/figcaption literals though, templates won't actually be able to mimic the parser output for the block form. We're back to div soup,

<div class="enwiki-figure">
  <a href=""><img src="" /></a>
  <div class="enwiki-figcaption"></div>
</div>

Change 888127 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] [WIP] Change the default of wgParserEnableLegacyMediaDOM to false

https://gerrit.wikimedia.org/r/888127

Tangentially (or maybe not), editors should have access to <figure> and <figcaption> in wikitext per my comment in T25932#7070297.

That's a separate debate, one on which I think there is a wide gap in consensus.

Without figure/figcaption literals though, templates won't actually be able to mimic the parser output for the block form. We're back to div soup,

<div class="enwiki-figure">
  <a href=""><img src="" /></a>
  <div class="enwiki-figcaption"></div>
</div>

It may be that what we want is something like {{#figure|link=<link>|caption=<caption>|content=<content>}} (either as a built in or as an extension) which emits "the right figure structure" while allowing user content in the caption and in the "media" spot. The actual HTML representation would contain parsoid's standard tagging for "extension-generated content" as well as tagging for "user content" (cf https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/831077 for T309024; and T331655). For example:

<figure typeof="mw:Transclusion mw:File" ...>
  <a href="..."><span typeof="mw:UserContent">...parsed user wikitext...</span></a>
  <figcaption typeof="mw:UserContent">...parsed user wikitext...</figcaption>
</figure>

That would enforce the expected DOM structure on the user content and allow the reuse of typeof=mw:File in a safe (parser-generated typeof, not user-controlled) way. And, of course, if we ever made future tweaks to the media styling or DOM structure, we can make them in a single place in the {{#figure}} implementation and have them applied uniformly to user-generated content.

Change 888127 merged by jenkins-bot:

[mediawiki/core@master] Change the default of wgParserEnableLegacyMediaDOM to false

https://gerrit.wikimedia.org/r/888127

Change 901294 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@REL1_40] Change the default of wgParserEnableLegacyMediaDOM to false

https://gerrit.wikimedia.org/r/901294

Change 901294 merged by jenkins-bot:

[mediawiki/core@REL1_40] Change the default of wgParserEnableLegacyMediaDOM to false

https://gerrit.wikimedia.org/r/901294

Change 936799 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] [WIP] Add a config to stop shipping the legacy media styles

https://gerrit.wikimedia.org/r/936799

Change 937544 had a related patch set uploaded (by Arlolra; author: Arlolra):

[operations/mediawiki-config@master] Set default for UseLegacyMediaStyles and disable on officewiki

https://gerrit.wikimedia.org/r/937544

Change 936799 merged by jenkins-bot:

[mediawiki/core@master] Add a config to stop shipping the legacy media styles

https://gerrit.wikimedia.org/r/936799

Change 937544 merged by jenkins-bot:

[operations/mediawiki-config@master] Set default for UseLegacyMediaStyles and disable on officewiki

https://gerrit.wikimedia.org/r/937544

Mentioned in SAL (#wikimedia-operations) [2023-07-13T20:08:52Z] <taavi@deploy1002> Started scap: Backport for [[gerrit:937544|Set default for UseLegacyMediaStyles and disable on officewiki (T318433)]]

Mentioned in SAL (#wikimedia-operations) [2023-07-13T20:10:19Z] <taavi@deploy1002> taavi and arlolra: Backport for [[gerrit:937544|Set default for UseLegacyMediaStyles and disable on officewiki (T318433)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-07-13T20:16:39Z] <taavi@deploy1002> Finished scap: Backport for [[gerrit:937544|Set default for UseLegacyMediaStyles and disable on officewiki (T318433)]] (duration: 07m 47s)

Change 939374 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[operations/mediawiki-config@master] Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix)

https://gerrit.wikimedia.org/r/939374

Change 939374 merged by jenkins-bot:

[operations/mediawiki-config@master] Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix)

https://gerrit.wikimedia.org/r/939374

Mentioned in SAL (#wikimedia-operations) [2023-07-19T13:02:52Z] <lucaswerkmeister-wmde@deploy1002> Started scap: Backport for [[gerrit:939374|Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix) (T318433)]]

Mentioned in SAL (#wikimedia-operations) [2023-07-19T13:04:27Z] <lucaswerkmeister-wmde@deploy1002> ssastry and lucaswerkmeister-wmde: Backport for [[gerrit:939374|Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix) (T318433)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-07-19T13:13:40Z] <lucaswerkmeister-wmde@deploy1002> Finished scap: Backport for [[gerrit:939374|Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix) (T318433)]] (duration: 10m 47s)

@ABreault-WMF can we get officewiki converted so we can finally turn UseLegacyMediaStyles off for officewiki (and deprecate and remove that configuration)?

Change #1102220 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Deprecate no-longer-needed $wgUseLegacyMediaStyles

https://gerrit.wikimedia.org/r/1102220

@ABreault-WMF can we get officewiki converted so we can finally turn UseLegacyMediaStyles off for officewiki (and deprecate and remove that configuration)?

It's been disabled on officewiki since the beginning T318433#9014093. The goal is to turn it off everywhere else, which isn't possible until this task is resolved. Maybe you're thinking about ParserEnableLegacyMediaDOM and UseContentMediaStyles, which are already marked as deprecated,
https://github.com/wikimedia/mediawiki/blob/master/includes/MainConfigSchema.php#L6586-L6597

Discussed this task with @Arlolra during MW engineering offsite. Proposed solution, inspired by T204370, is to add a new parser function {{#media}}, which in its full form will be a replacement for [[File:]] syntax, but with cleaner option handling (eg, requiring caption=.... instead of parsing any unrecognized option as the caption). {{#media|src=Foo.jpg}} would be the equivalent of [[File:Foo.jpg]] but instead of src you could pass content=... as an option to embed arbitrary user-generated content, using the structure described above in:

<figure typeof="mw:Transclusion mw:File" ...>
  <a href="..."><span typeof="mw:UserContent">...parsed user wikitext...</span></a>
  <figcaption typeof="mw:UserContent">...parsed user wikitext...</figcaption>
</figure>

That would enforce the expected DOM structure on the user content and allow the reuse of typeof=mw:File in a safe (parser-generated typeof, not user-controlled) way. And, of course, if we ever made future tweaks to the media styling or DOM structure, we can make them in a single place in the {{#figure}} implementation and have them applied uniformly to user-generated content.

(We probably don't actually need typeof=mw:UserContent on the figcaption, since the caption is always user-generated content; this isn't new in any way.)

The implementation should call an Parsoid API in order to share as much code as possible with the existing Parsoid image handling code.

Sounds good for templates. How do you expect to serve extensions? Parsing the content after wrapping it in {{#media}} won’t work as extensions may want to output things not allowed in wikitext (e.g. Kartographer uses an image with an arbitrary URL, possibly wrapped in <noscript>). Maybe that Parsoid API could be made available to extensions?

Sounds good for templates. How do you expect to serve extensions? Parsing the content after wrapping it in {{#media}} won’t work as extensions may want to output things not allowed in wikitext (e.g. Kartographer uses an image with an arbitrary URL, possibly wrapped in <noscript>). Maybe that Parsoid API could be made available to extensions?

Extensions would presumably generate {{#media:<strip marker>}}, which is the standard way of embedding "things not allowed in wikitext" inside the wikitext result of an extension of parser function.

That’s indeed possible, but feels hackish in cases where the only wikitext bit would be this parser function call, and the extensions work with HTML afterwards, not wikitext. For example, Kartographer (rEKAR includes/Tag/LegacyMapFrame.php:66 (at d01b9e7b07a5)) returns HTML that’s exactly the thumbnail, with no wikitext before or after it. So it’d call $parser->halfParseWikitext("{{#media|content=$stripMarkerContent|caption=$stripMarkerCaption}}") (the caption also needs to be parsed separately, or else we’d run the risk of breaking badly if the caption is unbalanced wikitext), and then substitute the strip markers. Isn’t something like Parser::media(content: $thumbnail, caption: $captionHtml) much simpler and cleaner?