
Parsoid page views: need to do something about {{int:}}
Open, Low, Public

Description

Currently, {{int:}} is (ab)used to localize content, either directly by showing a localized message or by using e.g. MediaWiki:Lang to achieve conditional transclusion à la {{sometemplate/{{int:lang}}}}. However, Parsoid always assumes the content language for this parser function. This is generally not a problem for VisualEditor, where it even makes sense to display a "canonical" version of the text, but page views on multilingual projects like Meta and Commons are different, because these sites rely heavily on the user language to choose which content to display. A few examples:

We need to develop a strategy to mitigate these problems for Parsoid-based page views.

Event Timeline

MaxSem raised the priority of this task from to Needs Triage.
MaxSem updated the task description.
MaxSem added a project: Parsoid.
MaxSem subscribed.

Hmm .. interesting ... this could be tricky. How do we get user state into the parse? Without user state, it will return the default language.

[subbu@earth lib] echo "{{int:lang}}" | node parse --normalize --dump tplsrc --prefix commonswiki
=================================
pf_int
---------------------------------
en
---------------------------------

<p>en</p>

@ssastry We could pass a language in as a parameter (and then forward that to expandtemplates), but would also need to store that separately, effectively replicating the current parser cache fragmentation.
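To make the fragmentation concern concrete, here is a minimal sketch. It assumes a hypothetical `render` function standing in for a full parse and a plain dict standing in for the parser cache; none of these names come from MediaWiki or Parsoid.

```python
# Hypothetical stand-in for the parser cache: one entry per (page, language).
cache = {}


def render(title, lang):
    """Stand-in for a full parse with the language passed as a parameter."""
    return f"<p>{title} rendered for {lang}</p>"


def get_page(title, lang):
    """Return cached output, parsing only on the first request per language."""
    key = (title, lang)
    if key not in cache:
        cache[key] = render(title, lang)
    return cache[key]


# Three readers with different interface languages view the same page.
for lang in ("en", "de", "la"):
    get_page("File:Example.jpg", lang)

print(len(cache))  # prints 3: one full-page copy per language viewed
```

Each language that is actually requested costs a whole-page entry, which is the duplication the comment above is worried about.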

Some of the use cases for {{int:}} like commons metadata will probably go away soonish with the move to wikidata. The remaining ones seem to be typically individual transclusions, often really small ones (like the 'thanks' template).

I think it's worth thinking about moving more of the remaining work to the client. We could compile a list of such elements in each page after parse, and then deliver translations to be applied client-side to users who requested it. This should be a good performance win if translations are small relative to the overall page size. It would reduce the per-language storage needs on the server to fragments rather than entire pages. It might even be possible to share such fragments between pages if we can establish that the output does not depend on the actual page name.
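A rough sketch of that idea, under loudly stated assumptions: the `data-msg` placeholder syntax, the `MESSAGES` table, and both function names are invented for illustration and are not part of Parsoid. The point is only that the per-language payload is a small key-to-text map instead of a whole page.

```python
import re

# Hypothetical placeholder marking a localizable element in the cached HTML.
PLACEHOLDER = re.compile(r'<span data-msg="([^"]+)"></span>')

# Hypothetical message store: language -> key -> text.
MESSAGES = {
    "en": {"thanks": "Thanks!"},
    "de": {"thanks": "Danke!"},
}


def collect_keys(page_html):
    """Return the distinct message keys used on a page, in order of first use."""
    seen = []
    for key in PLACEHOLDER.findall(page_html):
        if key not in seen:
            seen.append(key)
    return seen


def fragment_bundle(page_html, lang, fallback="en"):
    """Build the small per-language payload a client would apply to the page."""
    table = MESSAGES.get(lang, {})
    return {
        key: table.get(key, MESSAGES[fallback].get(key, key))
        for key in collect_keys(page_html)
    }


page = (
    '<p><span data-msg="thanks"></span> for the upload. '
    '<span data-msg="thanks"></span></p>'
)
print(collect_keys(page))
print(fragment_bundle(page, "de"))
```

Because the bundle depends only on the keys, two pages using the same keys could share it, provided the messages do not depend on the page name.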

In any case, I think the current result of returning the content language is tolerable in the short term. There are bigger issues to tackle right now.

I think it's worth thinking about moving more of the remaining work to the client. We could compile a list of such elements in each page after parse, and then deliver translations to be applied client-side to users who requested it. This should be a good performance win if translations are small relative to the overall page size. It would reduce the per-language storage needs on the server to fragments rather than entire pages. It might even be possible to share such fragments between pages if we can establish that the output does not depend on the actual page name.

However, this will not work when {{int}} is used in esoteric templates to control transclusion.

LGoto lowered the priority of this task from Medium to Low. Jun 25 2020, 6:11 PM
cscott subscribed.

I think the story for Parsoid will be some general purpose markup for 'translatable content' in the main parsoid edit views path, and then a postprocessing step to do the per-user localization. This is still missing functionality, but T267059: Spec for precisely positioned, localized error message in Parsoid and T266666 have laid some of the groundwork.

Some of the use cases for {{int:}} like commons metadata will probably go away soonish with the move to wikidata. The remaining ones seem to be typically individual transclusions, often really small ones (like the 'thanks' template).

It doesn’t mean the problem would also go away. Compare https://commons.wikimedia.org/w/index.php?title=Symphyotrichum_novi-belgii&uselang=la and https://commons.wikimedia.org/w/index.php?title=Symphyotrichum_novi-belgii&uselang=la&veaction=edit – “Basionymum” vs “Basionym”. The text is read from d:Q810198, yet it changes from Latin to English when I open the page in VisualEditor (i.e. switch from legacy parser output to Parsoid output).

I think the story for Parsoid will be some general purpose markup for 'translatable content' in the main parsoid edit views path, and then a postprocessing step to do the per-user localization.

Would that mean that the cache would contain all translations in some encoded form? It’d blow up. {{Assessments}} (used on the above file description page), for example, is translated into 69 languages. The description page of this particular file is probably never viewed in the majority of those languages, so currently those languages are not cached. However, if you want them to be present in an intermediate representation, they will be cached.

Thanks for the link! However, there are two issues with this spec for the translatable template use case:

  • It specifies how to set the MediaWiki message to be used for translation. However, a translatable template is not a MediaWiki message, for a reason: MediaWiki messages don’t work the wiki way – only admins and other privileged groups can edit them. This is a necessary precaution in case of interface elements, as they’re highly visible, and local editing to them is usually unnecessary anyway thanks to Translatewiki. Translatable templates, in contrast, are much less visible, so the risk is lower, but they can’t be edited on Translatewiki, so the need is higher.
  • It doesn’t specify the level of wikitext markup support within the messages to be used, or how to handle recursive i18n. The second example shows that simple HTML tags (<code> in this case) should be okay in placeholder elements, but what about more complicated constructs, like parser functions/tags, magic words or templates? Translatable templates usually build on other, often quite complicated, templates/modules. What if a translatable message itself contains something translatable? (grep -F '{{int:' languages/i18n/en.json | wc -l gives 32, and this is just MediaWiki core. This pattern is used every now and then in extensions as well; and it’s used all the time in translatable templates. In templates, often the inner template is translated into the user language, but the outer one falls back to English. While this may lead to mixed-language output, it makes sure that as much content is available in the user’s language as possible, so IMO it should be kept this way.) What if this translatable message containing a reference to another translatable message happens to be in an attribute? <span>s should not normally occur in attributes, but this use case should be supported. What if a parameter is, or contains, another translatable message? This is probably even more common when working with MediaWiki messages (i.e. not in the template world) than messages directly containing references to other messages.

Handling your second question first, and picking one of the results from grep -F '{{int:' languages/i18n/en.json at random:

"resetpass-expired-soft": "Your password has expired and needs to be changed. Please choose a new password now, or click \"{{int:authprovider-resetpass-skip-label}}\" to change it later.",

There isn't recursive expansion here. The Parsoid output for the wikitext:

{{int:resetpass-expired-soft}}

is/will be (roughly; some slight tweaks are coming on Gerrit in how the first argument of parser functions such as int is represented):

<p>
 <span about="#mwt1" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"int:resetpass-expired-soft","function":"int"},"params":{},"i":0}}]}'>
  <span typeof="mw:I18n" data-mw-i18n='{"/":{"lang":"x-user","key":"resetpass-expired-soft","params":[]}}' />
 </span>
</p>

That is, a span wrapper representing the parser function invocation, containing an empty span representing "a localized message" which is not expanded.

This is the user-independent output which is stored in the parser cache.

Later, when the content is taken out of the parser cache and postprocessed with a specific user/user language, we get:

<p>
 <span about="#mwt1" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"int:resetpass-expired-soft","function":"int"},"params":{},"i":0}}]}'>
  <span typeof="mw:I18n" data-mw-i18n='{"/":{"lang":"x-user","key":"resetpass-expired-soft","params":[]}}'>
    Your password has expired and needs to be changed. Please choose a new password now, or click "Skip" to change it later.
  </span>
 </span>
</p>

That is, we expand the UX message completely before inserting it into the span. This is similar to what we do for templates, which are similarly flattened when expanded in Parsoid output.
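The postprocessing step described above can be sketched as follows. This is only an illustration under assumptions: the `MESSAGES` table (with an abbreviated message text), the regex-based matching, and the `flatten` and `localize` helpers are all hypothetical, and the real pipeline uses the legacy parser for message expansion and does not rewrite strings this way.

```python
import html
import json
import re

# Hypothetical message table with abbreviated text; real messages live in
# MediaWiki's i18n files.
MESSAGES = {
    "en": {
        "resetpass-expired-soft": (
            "Your password has expired. Click "
            '"{{int:authprovider-resetpass-skip-label}}" to change it later.'
        ),
        "authprovider-resetpass-skip-label": "Skip",
    },
}

# Matches the empty, self-closed placeholder span shown in the cached output.
I18N_SPAN = re.compile(r"""<span typeof="mw:I18n" data-mw-i18n='([^']*)'\s*/>""")
INT_CALL = re.compile(r"\{\{int:([^}|]+)\}\}")


def flatten(key, lang):
    """Expand a message to plain text, resolving nested {{int:...}} calls.

    No cycle detection: a message that references itself would recurse forever.
    """
    text = MESSAGES[lang][key]
    return INT_CALL.sub(lambda m: flatten(m.group(1), lang), text)


def localize(cached_html, user_lang):
    """Fill every empty mw:I18n span with the flattened message text."""

    def fill(match):
        attr = match.group(1)
        info = json.loads(html.unescape(attr))["/"]
        text = html.escape(flatten(info["key"], user_lang), quote=False)
        return f"""<span typeof="mw:I18n" data-mw-i18n='{attr}'>{text}</span>"""

    return I18N_SPAN.sub(fill, cached_html)


cached = """<p><span typeof="mw:I18n" data-mw-i18n='{"/":{"lang":"x-user","key":"resetpass-expired-soft","params":[]}}' /></p>"""
print(localize(cached, "en"))
```

The cached string is identical for every user; only the call to `localize` takes the user language, which is why the cache does not need to be split per language.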

This is partly an artifact of how Parsoid interacts with the legacy parser, which is still used for message parsing and template expansion. In the future (when using Parsoid natively for message parsing) we might not flatten the output before embedding, but there's no real issue introduced by that:

<p>
 <span about="#mwt1" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"int:resetpass-expired-soft","function":"int"},"params":{},"i":0}}]}'>
  <span typeof="mw:I18n" data-mw-i18n='{"/":{"lang":"x-user","key":"resetpass-expired-soft","params":[]}}'>
    Your password has expired and needs to be changed. Please choose a new password now, or click "
      <span typeof="mw:I18n" data-mw-i18n='{"/":{"lang":"x-user","key":"authprovider-resetpass-skip-label","params":[]}}'>
         Skip
       </span>
      " to change it later.
  </span>
 </span>
</p>
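For comparison, the non-flattened form above could be produced by a recursive expansion along these lines. Again a sketch under assumptions: `MESSAGES` (with an abbreviated message text) and `i18n_span` are hypothetical names, and keys are assumed not to contain single quotes.

```python
import html
import json
import re

# Hypothetical message table with abbreviated text.
MESSAGES = {
    "en": {
        "resetpass-expired-soft": (
            "Your password has expired. Click "
            '"{{int:authprovider-resetpass-skip-label}}" to change it later.'
        ),
        "authprovider-resetpass-skip-label": "Skip",
    },
}

INT_CALL = re.compile(r"\{\{int:([^}|]+)\}\}")


def i18n_span(key, lang):
    """Render a message as an mw:I18n span, keeping nested messages as spans."""
    meta = json.dumps(
        {"/": {"lang": "x-user", "key": key, "params": []}},
        separators=(",", ":"),
    )
    text = MESSAGES[lang][key]
    parts = []
    pos = 0
    for m in INT_CALL.finditer(text):
        # Literal text before the nested call, then the nested message itself.
        parts.append(html.escape(text[pos:m.start()], quote=False))
        parts.append(i18n_span(m.group(1), lang))
        pos = m.end()
    parts.append(html.escape(text[pos:], quote=False))
    body = "".join(parts)
    return f"""<span typeof="mw:I18n" data-mw-i18n='{meta}'>{body}</span>"""


print(i18n_span("resetpass-expired-soft", "en"))
```

Keeping the inner span means a consumer can still tell which part of the text came from which message, at the cost of slightly larger output.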

The issue of "wikitext occurring inside parameters" is handled much as it is for templates. In general we treat parameters as raw strings, and don't represent the "parsed" form of the parameter in the HTML. VisualEditor may know that some parameters are "really" wikitext, and can provide a rich editing widget for them; the {{int}} parser function will be given type hints in a similar way (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/TemplateData/+/819740).

For the other question, I'm not entirely sure what you mean by "translatable template". Could you elaborate?