Parsoid page views: need to do something about {{int:}}
Open, Low, Public

Description

Currently, {{int:}} is (ab)used to localize content, either directly by showing a localized message or by using e.g. MediaWiki:Lang to achieve conditional transclusion à la {{sometemplate/{{int:lang}}}}. However, Parsoid always assumes the content language for this parser function. This is generally not a problem for VisualEditor, where it even makes sense to display a "canonical" version of the text, but page views on multilingual projects like Meta and Commons are different because these sites rely heavily on the user language to decide how content is displayed. A few examples:

File description pages, compare https://commons.wikimedia.org/wiki/File:Sao_Paulo_Railway.jpg?uselang=en vs. https://commons.wikimedia.org/wiki/File:Sao_Paulo_Railway.jpg?uselang=fr
Talk page templates, compare https://commons.wikimedia.org/wiki/Commons:Bistro?uselang=fr vs. https://commons.wikimedia.org/wiki/Commons:Bistro?uselang=en
Esoteric templates, e.g. https://commons.wikimedia.org/w/index.php?title=Template:Localized_link&action=edit or even in Lua: https://commons.wikimedia.org/wiki/Module:Fallback

We need to develop a strategy to mitigate these problems for Parsoid-based page views.

Related Objects

Mentioned In: T223772: Extend #time parser function to display time in format specific to each language
T272943: Make InputBox extension compatible with Parsoid
T4085: Add a {{USERLANGUAGE}} magic word
T106687: Flow does not support varying language of parts of content based on user interface language (e.g. {{int:}})
Mentioned Here: T309024: Make Parsoid's red link output compatible with current output
T266666: Localization html2html pass for Parsoid
T267059: Spec for precisely positioned, localized error message in Parsoid

Event Timeline

MaxSem created this task. Dec 31 2014, 1:41 AM

MaxSem raised the priority of this task from to Needs Triage.

MaxSem updated the task description.

MaxSem added a project: Parsoid.

MaxSem subscribed.

• GWicke added a project: Services. Dec 31 2014, 6:27 AM

• GWicke set Security to None.

Hmm .. interesting ... this could be tricky. How do we get user state into the parse? Without user state, it will return the default language.

[subbu@earth lib] echo "{{int:lang}}" | node parse --normalize --dump tplsrc --prefix commonswiki
=================================
pf_int
---------------------------------
en
---------------------------------

<p>en</p>

ssastry triaged this task as High priority. Dec 31 2014, 10:29 PM

@ssastry We could pass a language in as a parameter (and then forward that to expandtemplates), but would also need to store that separately, effectively replicating the current parser cache fragmentation.
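The cache fragmentation mentioned here can be illustrated with a small sketch. The key format and the `cache_key` helper are hypothetical, invented only for this illustration; they do not describe how the MediaWiki parser cache actually builds its keys.

```python
from typing import Optional

def cache_key(title: str, revision: int, user_lang: Optional[str] = None) -> str:
    """Build an illustrative cache key; adding the user language splits the cache."""
    parts = [title, str(revision)]
    if user_lang is not None:
        parts.append(f"userlang={user_lang}")
    return ":".join(parts)

# Three views of the same revision, by users with two different interface languages.
views = [
    ("File:Sao_Paulo_Railway.jpg", 1, "en"),
    ("File:Sao_Paulo_Railway.jpg", 1, "fr"),
    ("File:Sao_Paulo_Railway.jpg", 1, "en"),
]

# Language-dependent output needs one entry per language that was actually requested.
per_language_entries = {cache_key(t, r, lang) for t, r, lang in views}
# Language-independent output needs a single entry per revision.
shared_entries = {cache_key(t, r) for t, r, _ in views}
```

In this toy model, each additional interface language that requests the page adds another full copy of the rendered output, which is the storage cost the comment refers to.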

Some of the use cases for {{int:}} like commons metadata will probably go away soonish with the move to wikidata. The remaining ones seem to be typically individual transclusions, often really small ones (like the 'thanks' template).

I think it's worth thinking about moving more of the remaining work to the client. We could compile a list of such elements in each page after parse, and then deliver translations to be applied client-side to users who requested it. This should be a good performance win if translations are small relative to the overall page size. It would reduce the per-language storage needs on the server to fragments rather than entire pages. It might even be possible to share such fragments between pages if we can establish that the output does not depend on the actual page name.
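As a rough sketch of that idea, the following Python collects the message keys used on a page and builds a small per-language bundle that could be delivered to the client. Everything here is an assumption made for illustration: the regular expression only handles the simplest `{{int:key}}` form, and `CATALOG`, `collect_message_keys`, and `translation_bundle` are made-up names, not existing MediaWiki or Parsoid APIs.

```python
import re

# Matches only plain {{int:key}} calls without parameters; real wikitext is more complex.
INT_CALL = re.compile(r"\{\{int:([^{}|]+)\}\}")

def collect_message_keys(wikitext: str) -> list:
    """Return the distinct message keys used via {{int:...}}, in first-seen order."""
    seen = {}
    for match in INT_CALL.finditer(wikitext):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)

def translation_bundle(keys, catalog, lang, fallback="en"):
    """Build the small per-language fragment that would be shipped to the client."""
    bundle = {}
    for key in keys:
        messages = catalog.get(key, {})
        # Fall back to the fallback language, then to a visible placeholder.
        bundle[key] = messages.get(lang, messages.get(fallback, f"<{key}>"))
    return bundle

# Made-up message catalog, for illustration only.
CATALOG = {
    "thanks": {"en": "Thanks", "fr": "Merci"},
    "license": {"en": "License"},
}

page = "{{int:thanks}} ... {{int:license}} ... {{int:thanks}}"
keys = collect_message_keys(page)
fr_bundle = translation_bundle(keys, CATALOG, "fr")
```

The bundle holds one short string per distinct key, so its size depends on the number of localized fragments rather than on the size of the page.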

In any case, I think the current result of returning the content language is tolerable in the short term. There are bigger issues to tackle right now.

ssastry lowered the priority of this task from High to Medium. Feb 3 2015, 9:55 PM

ssastry merged a task: T72215: {{int:..}} usages should render in user language instead of content language.

ssastry added subscribers: Mooeypoo, siebrand, dchan and 2 others.

In T85581#951449, @GWicke wrote:

I think it's worth thinking about moving more of the remaining work to the client. We could compile a list of such elements in each page after parse, and then deliver translations to be applied client-side to users who requested it. This should be a good performance win if translations are small relative to the overall page size. It would reduce the per-language storage needs on the server to fragments rather than entire pages. It might even be possible to share such fragments between pages if we can establish that the output does not depend on the actual page name.

However, this will not work when {{int}} is used in esoteric templates to control transclusion.
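A minimal sketch of why this case is different, assuming the {{sometemplate/{{int:lang}}}} pattern from the task description: the user language selects which page gets transcluded, so there is no single text fragment that could be swapped afterwards. The `resolve_transclusion` helper and its English fallback are hypothetical and only model the pattern.

```python
def resolve_transclusion(base: str, user_lang: str, existing_pages: set,
                         fallback: str = "en") -> str:
    """Pick the template subpage that {{base/{{int:lang}}}} would end up transcluding."""
    candidate = f"Template:{base}/{user_lang}"
    if candidate in existing_pages:
        return candidate
    # Assumed fallback behaviour; real templates implement this in various ways.
    return f"Template:{base}/{fallback}"

pages = {"Template:sometemplate/en", "Template:sometemplate/fr"}
french_target = resolve_transclusion("sometemplate", "fr", pages)
latin_target = resolve_transclusion("sometemplate", "la", pages)
```

Because the two targets are different pages with arbitrary content, the whole expansion differs per language, not just a string inside it.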

Nemo_bis added a project: I18n. Mar 5 2015, 3:59 PM

Nemo_bis subscribed.

• Pchelolo moved this task from Backlog to watching on the Services board. Oct 12 2016, 7:24 PM

• Pchelolo edited projects, added Services (watching); removed Services.

ssastry moved this task from Needs Triage to Read Views on the Parsoid board. Jan 11 2018, 9:49 PM

Liuxinyu970226 subscribed. Feb 25 2018, 3:00 AM

Reedy edited projects, added Parsoid-Read-Views-Deprecated-Project; removed Parsoid. Sep 17 2018, 7:25 PM

• mobrovac added a project: Platform Team Legacy (Watching / External). Dec 20 2018, 12:02 PM

Aklapper edited projects, added Parsoid; removed Parsoid-Read-Views-Deprecated-Project. Feb 29 2020, 5:14 PM

Aklapper added a project: Parsoid-Rendering. Feb 29 2020, 6:14 PM

ssastry moved this task from Read Views to Missing Functionality on the Parsoid board. Mar 6 2020, 10:14 PM

LGoto lowered the priority of this task from Medium to Low. Jun 25 2020, 6:11 PM

cscott mentioned this in T106687: Flow does not support varying language of parts of content based on user interface language (e.g. {{int:}}). Aug 26 2020, 6:00 PM

cscott mentioned this in T4085: Add a {{USERLANGUAGE}} magic word. Aug 26 2020, 6:02 PM

Nikerabbit subscribed. Aug 27 2020, 7:57 AM

ssastry added a project: Parsoid-Read-Views (Phase 3 - Main namespace of officewiki / mediawiki.org renders with Parsoid). Sep 29 2021, 10:30 PM

I think the story for Parsoid will be some general purpose markup for 'translatable content' in the main parsoid edit views path, and then a postprocessing step to do the per-user localization. This is still missing functionality, but T267059: Spec for precisely positioned, localized error message in Parsoid and T266666 have laid some of the groundwork.

In T85581#951449, @GWicke wrote:

Some of the use cases for {{int:}} like commons metadata will probably go away soonish with the move to wikidata. The remaining ones seem to be typically individual transclusions, often really small ones (like the 'thanks' template).

It doesn’t mean the problem would also go away. Compare https://commons.wikimedia.org/w/index.php?title=Symphyotrichum_novi-belgii&uselang=la and https://commons.wikimedia.org/w/index.php?title=Symphyotrichum_novi-belgii&uselang=la&veaction=edit – “Basionymum” vs “Basionym”. The text is read from d:Q810198, yet it changes from Latin to English when I open the page in VisualEditor (i.e. switch from legacy parser output to Parsoid output).

In T85581#7567938, @cscott wrote:

I think the story for Parsoid will be some general purpose markup for 'translatable content' in the main parsoid edit views path, and then a postprocessing step to do the per-user localization.

Would that mean that the cache would contain all translations in some encoded form? It’d blow up. {{Assessments}} (used on the above file description page), for example, is translated into 69 languages. The description page of this particular file is probably never viewed in the majority of those languages, so currently those languages are not cached. However, if you want them to be present in an intermediate representation, they will be cached.

@ihurbain this is probably related to the Internationalization work done for T309024: Make Parsoid's red link output compatible with current output.

@Tacsipacsi see https://www.mediawiki.org/wiki/Specs/HTML/2.6.0#Internationalization

In T85581#8245592, @cscott wrote:

@Tacsipacsi see https://www.mediawiki.org/wiki/Specs/HTML/2.6.0#Internationalization

Thanks for the link! However, there are two issues with this spec for the translatable template use case:

It specifies how to set the MediaWiki message to be used for translation. However, a translatable template is not a MediaWiki message, for a reason: MediaWiki messages don’t work the wiki way – only admins and other privileged groups can edit them. This is a necessary precaution in case of interface elements, as they’re highly visible, and local editing to them is usually unnecessary anyway thanks to Translatewiki. Translatable templates, in contrast, are much less visible, so the risk is lower, but they can’t be edited on Translatewiki, so the need is higher.
It doesn’t specify the level of wikitext markup support within the messages to be used, or how to handle recursive i18n. The second example shows that simple HTML tags (<code> in this case) should be okay in placeholder elements, but what about more complicated constructs, like parser functions/tags, magic words or templates? Translatable templates usually build on other, often quite complicated, templates/modules. What if a translatable message itself contains something translatable? (grep -F '{{int:' languages/i18n/en.json | wc -l gives 32, and this is just MediaWiki core. This pattern is used every now and then in extensions as well; and it’s used all the time in translatable templates. In templates, often the inner template is translated into the user language, but the outer one falls back to English. While this may lead to mixed-language output, it makes sure that as much content is available in the user’s language as possible, so IMO it should be kept this way.) What if this translatable message containing a reference to another translatable message happens to be in an attribute? <span>s should not normally occur in attributes, but this use case should be supported. What if a parameter is, or contains, another translatable message? This is probably even more common when working with MediaWiki messages (i.e. not in the template world) than messages directly containing references to other messages.

Handling your second question first, and picking one of the results from grep -F '{{int:' languages/i18n/en.json at random:

"resetpass-expired-soft": "Your password has expired and needs to be changed. Please choose a new password now, or click \"{{int:authprovider-resetpass-skip-label}}\" to change it later.",

There isn't recursive expansion here. The Parsoid output for the wikitext:

{{int:resetpass-expired-soft}}

is/will be (roughly; there are some slight tweaks coming in how the first argument of parser functions such as int is represented, see gerrit):

<p>
 <span about="#mwt1" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"int:resetpass-expired-soft","function":"int"},"params":{},"i":0}}]}'>
  <span typeof="mw:I18n" data-mw-i18n='{"/":{"lang":"x-user","key":"resetpass-expired-soft","params":[]}}' />
 </span>
</p>

That is, a span wrapper representing the parser function invocation, containing an empty span representing "a localized message" which is not expanded.

This is the user-independent output which is stored in the parser cache.

Later, when the content is taken out of the parser cache and postprocessed with a specific user/user language, we get:

<p>
 <span about="#mwt1" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"int:resetpass-expired-soft","function":"int"},"params":{},"i":0}}]}'>
  <span typeof="mw:I18n" data-mw-i18n='{"/":{"lang":"x-user","key":"resetpass-expired-soft","params":[]}}'>
    Your password has expired and needs to be changed. Please choose a new password now, or click "Skip" to change it later.
  </span>
 </span>
</p>

That is, we expand the UX message completely before substituting it for the span. This is similar to what we do for templates, which are similarly flattened when expanded in Parsoid output.
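The postprocessing step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it treats the cached fragment as well-formed XML so the standard library can parse it, uses a made-up in-memory `MESSAGES` table in place of MediaWiki's message system, and handles nested `{{int:...}}` with a simple regular expression. The helper names `expand_message` and `localize` are hypothetical and are not Parsoid APIs.

```python
import json
import re
import xml.etree.ElementTree as ET

# Made-up message store; the two English strings are taken from the example above.
MESSAGES = {
    "en": {
        "resetpass-expired-soft": (
            "Your password has expired and needs to be changed. Please choose a "
            'new password now, or click "{{int:authprovider-resetpass-skip-label}}" '
            "to change it later."
        ),
        "authprovider-resetpass-skip-label": "Skip",
    }
}

INT_CALL = re.compile(r"\{\{int:([^{}|]+)\}\}")

def expand_message(key: str, lang: str, depth: int = 0) -> str:
    """Look up a message and flatten nested {{int:...}} references into plain text."""
    if depth > 10:
        raise RecursionError(f"message nesting too deep at {key!r}")
    text = MESSAGES[lang][key]
    return INT_CALL.sub(lambda m: expand_message(m.group(1), lang, depth + 1), text)

def localize(cached_html: str, user_lang: str) -> str:
    """Fill every empty mw:I18n span with the message in the requested language."""
    root = ET.fromstring(cached_html.strip())
    for el in root.iter("span"):
        if el.get("typeof") != "mw:I18n":
            continue
        info = json.loads(el.get("data-mw-i18n"))["/"]
        # Assumption: "x-user" means "use the viewer's language"; anything else is literal.
        lang = user_lang if info["lang"] == "x-user" else info["lang"]
        el.text = expand_message(info["key"], lang)
    return ET.tostring(root, encoding="unicode")

CACHED = """
<p>
 <span about="#mwt1" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"int:resetpass-expired-soft","function":"int"},"params":{},"i":0}}]}'>
  <span typeof="mw:I18n" data-mw-i18n='{"/":{"lang":"x-user","key":"resetpass-expired-soft","params":[]}}' />
 </span>
</p>
"""

html_out = localize(CACHED, "en")
```

The cached input stays user-independent; only the call to `localize` depends on the language, which mirrors the split between the parser cache and the per-user pass.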

This is partly an artifact of how Parsoid interacts with the legacy parser, which is still used for message parsing and template expansion. In the future (when using Parsoid natively for message parsing) we might not flatten the output before embedding, but there's no real issue introduced by that:

<p>
 <span about="#mwt1" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"int:resetpass-expired-soft","function":"int"},"params":{},"i":0}}]}'>
  <span typeof="mw:I18n" data-mw-i18n='{"/":{"lang":"x-user","key":"resetpass-expired-soft","params":[]}}'>
    Your password has expired and needs to be changed. Please choose a new password now, or click "
      <span typeof="mw:I18n" data-mw-i18n='{"/":{"lang":"x-user","key":"authprovider-resetpass-skip-label","params":[]}}'>
         Skip
       </span>
      " to change it later.
  </span>
 </span>
</p>

The issue of "wikitext occurring inside parameters" is also similar to how that issue is handled in templates. In general we treat parameters as raw strings, and don't represent the "parsed" form of the parameter in the HTML. Visual Editor may know that some parameters are "really" wikitext, and can provide a rich editing widget for them; the {{int}} parser function will be given type hints in a similar way (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/TemplateData/+/819740).

For the other question, I'm not entirely sure what you mean by "translatable template". Could you elaborate?

Bugreporter mentioned this in T223772: Extend #time parser function to display time in format specific to each language. Jul 11 2024, 6:43 AM
