
Tag untranslated translations units with lang and dir attributes
Closed, Resolved · Public

Description

Current status

Related patches have been merged. Need to test after deployment.

The following alternatives were considered for avoiding page breakage by not tagging content that appears in a so-called plain-text context (for example, inside an HTML attribute):

Alternatives exploration

GIVEUP: Do not introduce this feature

  • The feature is needed for the parent task
  • Nothing would break
  • No additional work needed
Possible solutions to avoid breakage while supporting tagging

S-MAGIC: (Try to) handle the edge cases

  • Likely to be brittle, needing to be updated as new cases are discovered
  • No additional markup, just "Do what I mean"

For example, we could try to detect such translation units using a regular expression like this: <[^<>]+ (title|alt)=['"]?XXX['"]? where XXX matches translation unit placeholder(s) in the template.
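A minimal sketch of that S-MAGIC heuristic, assuming placeholders are plain strings such as `@@UNIT:1@@` (a hypothetical format, not Translate's real internal placeholder). It builds the regular expression above for each placeholder and reports which units the heuristic would treat as plain text. As noted later in the discussion, it cannot see units passed through template parameters.

```python
import re


def attribute_context_pattern(placeholder: str) -> re.Pattern:
    """Build <[^<>]+ (title|alt)=['"]?XXX['"]? with XXX = the escaped placeholder."""
    xxx = re.escape(placeholder)
    return re.compile(r"<[^<>]+ (title|alt)=['\"]?" + xxx + r"['\"]?")


def units_in_plain_text_context(template: str, placeholders: list) -> list:
    """Return the placeholders that appear as (the start of) a title or alt value.

    Hypothetical helper: such units would be left unwrapped.
    """
    return [p for p in placeholders if attribute_context_pattern(p).search(template)]


tpl = '<abbr title="@@UNIT:1@@">WMF</abbr>\n@@UNIT:2@@'
print(units_in_plain_text_context(tpl, ["@@UNIT:1@@", "@@UNIT:2@@"]))
```

The brittleness is visible here: any attribute other than `title` or `alt`, or a unit reached through a template, slips past the pattern.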

S-CONTROL: Add way to control wrapping per-unit basis

  • Easy to implement
  • More markup that translation admins need to understand

For example, we could have the following kind of markup: <abbr title="<translate plain>text</translate>">...</abbr>. Because it is a boolean attribute, the additional markup is minimal. Possible names for this attribute are:

  • (1) plain
  • (2) nowrap
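To illustrate S-CONTROL, here is a sketch of reading the boolean opt-out attribute from `<translate>` tags. The regex-based parsing and the `Section` type are hypothetical simplifications; Translate's actual parser is written in PHP and works differently. The attribute name is a parameter because both `plain` and `nowrap` were proposed.

```python
import re
from dataclasses import dataclass

# <translate attr1 attr2>body</translate>; boolean attributes are captured as a group.
TRANSLATE_RE = re.compile(
    r"<translate(?P<attrs>(?:\s+[a-z]+)*)\s*>(?P<body>.*?)</translate>", re.S
)


@dataclass
class Section:
    text: str
    wrap: bool  # False when the translation admin opted out of wrapping


def parse_sections(wikitext: str, flag: str = "nowrap") -> list:
    """Split wikitext into translatable sections and record the opt-out flag."""
    sections = []
    for m in TRANSLATE_RE.finditer(wikitext):
        attrs = m.group("attrs").split()
        sections.append(Section(m.group("body"), wrap=flag not in attrs))
    return sections
```

With this approach, a unit inside `title="..."` is only left alone when the admin explicitly asks for it, so every other unit still gets the accessibility benefit.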
Possible solutions for making migration easier

M-VARIABLE: Introduce a configuration variable

  • Easy to implement
  • Pages would still break at some point in absence of an alternative solution
  • Closed wikis could not be fixed, so either they break or the variable is kept forever

M-VERSIONING: Syntax version dependent rendering

  • Quite a bit of work to implement
  • Would need to keep support for both versions basically forever, but because it is in the code, this is nicer than having to keep an obscure configuration variable around forever
  • Allows moving pages gradually
  • This solution also works for closed wikis

Implicitly, all currently marked pages would use version 1. The next time a new page or an existing translatable page is marked for translation, it would be forcibly switched to version 2. When migrating an existing page, translation admins would see a notice on Special:PageTranslation that points to documentation and asks them to check whether the page has such issues.
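The versioning rule above can be sketched as follows. `TranslatablePage`, `should_wrap` and `mark_for_translation` are hypothetical names chosen for illustration, not Translate's API. Version 1 pages keep the old output, and marking for translation is the only thing that moves a page to version 2.

```python
from dataclasses import dataclass

LATEST_SYNTAX_VERSION = 2


@dataclass
class TranslatablePage:
    title: str
    syntax_version: int = 1  # pages marked before the change implicitly use version 1


def should_wrap(page: TranslatablePage, unit_nowrap: bool) -> bool:
    """Decide whether an untranslated unit gets a lang/dir wrapper."""
    if page.syntax_version < 2:
        return False  # version 1: rendering is unchanged, nothing is wrapped
    return not unit_nowrap  # version 2: wrap unless the unit opted out


def mark_for_translation(page: TranslatablePage) -> bool:
    """Mark the page and return True if this marking migrated it to the new syntax.

    The caller would use the return value to show the migration notice.
    """
    migrated = page.syntax_version < LATEST_SYNTAX_VERSION
    page.syntax_version = LATEST_SYNTAX_VERSION
    return migrated
```

Because the version is stored per page, closed wikis keep rendering as before without any configuration variable.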

Original report

The idea is that if you have a partially translated page:

Translated text.

Untranslated text in source language.

This should be altered to:

Translated text.

<div lang=en dir=ltr>
Untranslated text in source language.
</div>

Inline units would use <span lang=en dir=ltr>Untranslated snippet</span>.

This has various benefits especially for accessibility.
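A sketch of the wrapping itself, using `<div>` for block units and `<span>` for inline ones. The hard-coded RTL set is an illustrative assumption; MediaWiki derives writing direction from its own language data. The unit text is left unescaped on purpose because it is wikitext that is parsed later.

```python
from html import escape

# Illustrative subset only; a real implementation would ask MediaWiki for the direction.
RTL_LANGUAGES = {"ar", "fa", "he", "ur", "yi"}


def wrap_untranslated(text: str, lang: str, inline: bool) -> str:
    """Wrap an untranslated unit in an element carrying lang and dir attributes."""
    direction = "rtl" if lang in RTL_LANGUAGES else "ltr"
    attrs = f'lang="{escape(lang)}" dir="{direction}"'
    if inline:
        return f"<span {attrs}>{text}</span>"
    # Block units keep the text on its own lines so wikitext paragraphs still parse.
    return f"<div {attrs}>\n{text}\n</div>"


print(wrap_untranslated("Untranslated snippet", "en", inline=True))
```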

QA plan

Affected projects: translatewiki.net, multilingual Wikimedia projects using Translate
Pre deployment: test on MLEB test instance:

Post deployment:

Potential breakage: tag-wrapping may break rendering of some pages with unexpected markup. Should keep close eye on reports.

Outcome

Translation page output is tagged correctly with regard to language and writing direction.


Event Timeline

Change 603471 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/extensions/Translate@master] Tag source language units with lang and dir attributes

https://gerrit.wikimedia.org/r/603471

abi_ triaged this task as Medium priority. Jun 18 2020, 6:14 AM

Change 603471 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Tag source language units with lang and dir attributes

https://gerrit.wikimedia.org/r/603471

Potential breakage: tag-wrapping may break rendering of some pages with unexpected markup. Should keep close eye on reports.

Not potential, but guaranteed and sometimes unfixable breakage, for example when the <translate> tag is inside an HTML attribute (like <span title="<translate>Longer description</translate>">short</span>). Recently I had to select Do not invalidate translations even though the English version had changed substantially, because the unit was inside an HTML attribute and fuzzying it would have broken the page layout. On Commons, many templates have a translation unit containing the template language, like {{Please name images}} before my edit today, which replaced that translation-unit-based hack with a Lua-based one. (The clean solution would be T224810.)

Sorry, this is a very important improvement in general that I wholeheartedly support, but this patch should not be deployed in this state. What could be deployed is a two-phase change:

  • Introduce an attribute (no-op in this phase) to <translate> that explicitly requests this wrapping to be disabled. (Disabling wrapping hurts accessibility, but completely breaking the wikitext hurts much more.) Announce this change to translator communities on every WMF wiki having Translate enabled, let them adapt. Maybe create a tool that automatically identifies potential breakage. (Using templates like in my above example complicates this a lot, so maybe it’s not worth it. It seems realistically doable only within the parser, after templates and parser tags are expanded, but before <translate> tags are processed.)
  • Deploy this change in a modified version that respects the new attribute introduced in the previous phase. Probably make this deploy gradual by creating a configuration variable. (The config variable also makes rollback as easy as changing one line in InitialiseSettings.php in case some wikis are terribly broken, while others are fine.)

Actual example of where the <translate> tag is in an HTML attribute: the translation unit that was changed by this edit. This will be broken if this patch is deployed.

Change 606852 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/extensions/Translate@master] Revert "Tag source language units with lang and dir attributes"

https://gerrit.wikimedia.org/r/606852

I sketched possible solutions in the task description. Personally I'm in favor of either S-MAGIC or S-CONTROL+M-VERSIONING.

Please comment which solution you prefer, or if you have other solutions in mind.

Nikerabbit renamed this task from Translate should tag units in fallback languages with correct language attributes to Tag untranslated translations units with lang and dir attributes.Jun 22 2020, 7:13 AM

Change 606852 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Revert "Tag source language units with lang and dir attributes"

https://gerrit.wikimedia.org/r/606852

I think S-MAGIC won’t work: the translation units can be in template parameters, in which case it is almost impossible to guess which version should be used. Even if it were implemented, the template could change at any time, which could cause inadvertent changes to translated pages (initiated by anyone, not only translation admins).

I have no strong opinion on the migration plan, but I think that if we go for M-VARIABLE, potential breakages could be listed on Special:LintErrors; if we go for M-VERSIONING, there could be a new special page listing v1 pages. In the latter case, pages submitted for translation right when the MediaWiki train arrives (form loaded before upgrade, submitted after upgrade) should also be taken care of. I don’t think we should care much about closed wikis, they are broken several ways anyway—Tidy/RemexHtml switch, deprecated JavaScript functions and CSS classes etc.

I have no idea how these could be detected on Special:LintErrors. If there is already a linter test for this kind of invalid wikitext/HTML, it would already flag the pages where rendering breaks.

I wouldn't worry about pages being marked while the train runs. It's such a short timeframe, and the page would have to be one of those which has issues like this, and it could easily be fixed when noticed.

Probably a checkbox could be added that needs to be explicitly checked for the mark-for-translation to succeed. This would not only solve the issue during the train, but also require the translation admin to actively think about this issue—a simple warning can easily be forgotten due to banner blindness. (As this needs to be done only once per page, it hopefully won’t get too annoying.)

The ability to track the transition progress is quite important, so if it cannot be implemented with M-VARIABLE, then we should go for M-VERSIONING. An HTTP query parameter to control how the page is rendered would also be useful (similar to useskinversion for the ongoing Vector refresh), so that translation admins can see which translation units need to be fixed. (This parameter should probably break the original, untranslated page as well by inserting the markup for all translation units, because problems in already translated units would not show up in other languages and would go unnoticed.)
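A sketch of how such a preview parameter could select which units to wrap. The parameter name `previewsyntaxversion` and the helper are hypothetical, modelled on the useskinversion idea. Requesting version 2 wraps every unit, translated or not, so breakage becomes visible even on fully translated pages. Requesting version 1 shows the old rendering.

```python
from urllib.parse import parse_qs, urlparse

PREVIEW_PARAM = "previewsyntaxversion"  # hypothetical name


def units_to_wrap(url: str, stored_version: int, translated: dict) -> list:
    """Return the ids of units to wrap; `translated` maps unit id -> is translated."""
    values = parse_qs(urlparse(url).query).get(PREVIEW_PARAM, [])
    # Ignore anything other than an explicit "1" or "2".
    requested = values[0] if values and values[0] in {"1", "2"} else None
    if requested == "1" or (requested is None and stored_version < 2):
        return []  # old rendering: nothing is wrapped
    if requested == "2":
        return list(translated)  # preview: wrap every unit to expose breakage
    return [uid for uid, done in translated.items() if not done]
```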

By the way, if this attribute goes live, it could control also whether fuzzy translations get the pink background—this would make it possible to mark translation units outdated (and thus making them have a notice on Special:Translate) without caring about breaking the layout.

Thinking about this a bit more, this change (except when all non-exempted translation units are translated) involves changing the wikitext source code of the translations. In the M-VARIABLE case this means that either FuzzyBot goes over thousands of pages after the change is deployed, flooding recent changes and watchlists, or the pages without markup remain for an indefinitely long time (until they are marked for translation for whatever reason). This makes M-VERSIONING even more clearly superior, as then marking for translation is what switches a page to the new system by design, not just by accident.

So now I’m pretty sure that S-CONTROL and M-VERSIONING is the way to go.

The task description captures the various options and their pros and cons quite well. S-CONTROL and M-VERSIONING would be the ideal direction to move forward in. That gives translation admins more control over how translations are rendered and might work well for scenarios that we have not thought of yet.

Chosen alternatives are
[…]

  • M-VARIABLE

Really? I think we agreed on M-VERSIONING, and the creation of T256868 suggests that as well.

Change 615674 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/extensions/Translate@master] (Re-)Enable page translation syntax version and connect with wrapping

https://gerrit.wikimedia.org/r/615674

Change 615674 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] (Re-)Enable page translation syntax version and connect with wrapping

https://gerrit.wikimedia.org/r/615674

6c86c78 has been merged and deployed, and I’ve been able to migrate a page without any issues (see FuzzyBot’s edit: the ambox contains a translation unit that is properly marked up, and another one that is correctly left unmarked as instructed by the nowrap attribute). Is there anything left, or can the task be closed?

Tested on MediaWiki.org:

  1. Untranslated content was wrapped with dir and lang attributes
  2. If a <translate> tag has the nowrap attribute, its content is not wrapped in any element.
  3. Saw the syntax version updated for old pages when they were marked for translation.

Changes look good. Marking this as done.