Confirming a fuzzied translation should be logged
Open, Medium, Public, 2 Estimated Story Points, Feature

Description

With the new system (without !!FUZZY!!), it's impossible to tell that a user has confirmed a fuzzied translation, because there's no edit to the translation and there's no log. This produces mysterious summaryless diffs to the translation page when all goes well, and nothing at all when something goes wrong or when it's not page translation (with all sorts of problems for attribution and understandable histories); see T48716#512417.

A log entry should be added.

Event Timeline

bzimport raised the priority of this task from to Lowest. Nov 22 2014, 1:33 AM
bzimport set Reference to bz47177.
bzimport added a subscriber: Unknown Object (MLST).

This also causes https://gerrit.wikimedia.org/r/#/c/66574/1/AccountAudit.i18n.php where doing a full export of an extension finally removed the # fuzzy comment, even though the actual unfuzzy had possibly taken place a long time ago.

Two expected behaviors based on this issue:
I. Marking unfuzzy should make the message group qualify for export if it's a file based message group.
II. Marking unfuzzy should be visible in the page history of the translation.

Nikerabbit raised the priority of this task from Lowest to Medium. Apr 20 2016, 10:32 AM
Nikerabbit updated the task description.
Nikerabbit removed a subscriber: wikibugs-l-list.

Support this, since just looking at diffs makes finding the relevant unit quite annoyingly hard, and blatantly incorrect taggings are not uncommon. Maybe this could even come with an easy way to revert the marking.

Nikerabbit changed the subtype of this task from "Task" to "Feature Request".

At mediawiki.org there are lots of anons confirming fuzzied translations in bad faith, every day, and there's no easy way to revert them. I have to manually copy part of the text that was unwrapped from the span tags, search for it in the translation page, and manually add the !!FUZZY!! text. This is very time-consuming.

Currently it seems impossible to prevent confirmation of fuzzied translations that make no changes. I introduced a filter to prevent anons from doing this, but it doesn't work because the automatic change in the translated page is not affected by AbuseFilter (which makes sense).

A radical idea: what about using multi-content revisions? A separate slot could store what revision of the original message the translation is based on (i.e. if I translate revision 1234 of MediaWiki:Foo/en or Translations:Manual:Bar/en, the slot would contain 1234). This would resolve both the visibility and the revertibility issue:

  • Fuzzy means that the revision ID stored in the slot is different from the current revision ID of the original message. Confirming a translation means that the slot storing the revision ID is updated. This changes the page content, which naturally causes a new revision, so this appears in the page history, RC etc.
  • Reverting the above edit means that the revision ID stored in the slot becomes outdated again, causing the translation to be fuzzy. This not only means that there’s no need to manually add !!FUZZY!!, but also that the ID of the old revision is retained, causing the diff in Tux to continue to work.
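
The two bullets above can be modelled in a few lines. This is only a standalone sketch of the idea, not Translate or MediaWiki code: `TranslationRevision`, `source_rev`, `confirm` and `revert_last` are hypothetical names, and the page history is reduced to a plain list.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TranslationRevision:
    text: str
    source_rev: int  # hypothetical slot: source revision the text is based on


def is_fuzzy(rev, current_source_rev):
    # Fuzzy means the stored ID differs from the current source revision.
    return rev.source_rev != current_source_rev


def confirm(history, current_source_rev):
    # Confirming updates the slot, which creates a new (visible) revision.
    return history + [replace(history[-1], source_rev=current_source_rev)]


def revert_last(history):
    # Reverting restores the previous revision, old slot value included.
    return history + [history[-2]]


history = [TranslationRevision("Hallo", source_rev=1234)]
current_source_rev = 1240  # the source message was edited after translation

fuzzy_before = is_fuzzy(history[-1], current_source_rev)
history = confirm(history, current_source_rev)
fuzzy_after_confirm = is_fuzzy(history[-1], current_source_rev)
history = revert_last(history)
fuzzy_after_revert = is_fuzzy(history[-1], current_source_rev)
```

In this model the confirmation and its revert each show up as a history entry, and after the revert the unit is fuzzy again with 1234 still stored, so a diff against the old source revision would remain possible.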

However, as quite a radical idea, it also has some potential issues:

  • It probably needs a major refactoring, which can introduce bugs.
  • It makes parts of the code more complicated. However, since the fuzziness information is already stored somewhere (in a separate database table?), other parts of the code could become simpler.
  • Will the existing interfaces (Tux, WikiEditor) continue to work? Since Commons already has multi-content revisions in the file namespace, and WikiEditor didn’t break, WikiEditor should probably be okay.
  • What if someone sets the revision ID to a random number through the API? Is it possible? Can we prevent it? Do we want to prevent it? (After migrating a page to the Translate extension, it may come in handy for translation admins to be able to indicate that the just-imported text is actually a translation of an older version – of course, for this to work, the page needs to be marked for translation twice. On the other hand, setting the ID to completely random numbers makes no sense.)
  • Introducing this requires adding this piece of information to all existing translation units, which probably means billions of edits on both WMF wikis and translatewiki.net. Since this is way too slow to include in the standard update.php maintenance script, a longer transition period is necessary, during which both the old database table and the slot are read, (temporarily) further complicating the code.

If we go this way, there are some further possibilities to improve/simplify the system:

  • In Tux and maybe in WikiEditor as well, there could be a checkbox, shown when editing a fuzzy translation, with which the user could indicate whether the revision ID should be updated. For example, if a translation is fuzzy, but the translator just quickly fixes a typo, without reviewing the diff, they could uncheck it.
  • If we allow manually changing the revision ID, the !!FUZZY!! syntax could be finally entirely deprecated, with two replacements:
    • If the revision the translation is based on is known, one could simply set the revision ID slot to that.
    • If the revision is not known or doesn’t exist (or the problem is not that the translation is outdated, but e.g. that half of it isn’t translated at all), a special value of zero could be used. Similarly to the current !!FUZZY!! syntax, this would highlight the translation as fuzzy, but not provide any diff.
    • Of course, if !!FUZZY!! is replaced, there should be some UI to change the revision ID slot. This could be e.g. an extra input field in the WikiEditor interface (it’s probably not commonly needed enough to be included in Tux).

Source revision id (for diffs) is tracked separately from fuzzy status. This seems conflated in your suggestion.

MCR might make sense for the fuzzy flag, though there are some immediately obvious things that would need to be resolved:

  • Creating a UI (without Special:Translate) to edit the fuzzy flag
  • Performance. We created a separate table to store fuzzy status to improve speed, both for querying and updating. Storing fuzzy status with MCR would double the amount of content lookups, which is already slow. Fuzzying when source changes would become slow again, requiring editing each translation (which is super slow), as opposed to updating a separate dedicated database table.

> Source revision id (for diffs) is tracked separately from fuzzy status. This seems conflated in your suggestion.

I am aware that the source revision ID and the fuzzy status are not the same; I assume, however, that translations where a diff is shown (and therefore the source revision ID is important) are a (proper) subset of fuzzy translations. Based on this assumption, I proposed to treat translations where the source revision ID is outdated as fuzzy, as well as any translations where the source revision ID is set to the special value of zero (these are the cases that make the subset proper).

> Storing fuzzy status with MCR would double the amount of content lookups, which is already slow.

Why? You query the content of a page at once, don’t you?

> Fuzzying when source changes would become slow again, requiring editing each translation (which is super slow), as opposed to updating a separate dedicated database table.

In the exact setup I proposed, these edits wouldn’t be necessary – not editing a page would automatically fuzzy it.

Neither is a subset of the other. Outdated messages are not necessarily fuzzy (e.g. when skipping fuzzying for spelling fixes in the source) nor are fuzzy messages necessarily outdated (validation errors, manual fuzzying).
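
To make the two counterexamples concrete, here is a small standalone model. It assumes a hypothetical `Unit` with a stored `fuzzy_flag` and a separately tracked `source_rev`, and compares the stored flag with the rule proposed earlier in this thread (fuzzy if the source revision ID is outdated or zero). None of these names come from the Translate code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    source_rev: int   # source revision the translation was last checked against
    fuzzy_flag: bool  # fuzzy status, stored separately from the revision ID


def is_outdated(unit, current_source_rev):
    return unit.source_rev != current_source_rev


def derived_fuzzy(unit, current_source_rev):
    # Proposed rule: fuzzy if the stored ID is zero or no longer current.
    return unit.source_rev == 0 or is_outdated(unit, current_source_rev)


CURRENT_SOURCE_REV = 11

# Source got a spelling fix and fuzzying was skipped: outdated, not fuzzy.
spelling_fix = Unit(source_rev=10, fuzzy_flag=False)
# Unit was fuzzied manually or by a validation error: fuzzy, not outdated.
manual_fuzzy = Unit(source_rev=11, fuzzy_flag=True)

mismatches = [
    name
    for name, unit in [("spelling_fix", spelling_fix), ("manual_fuzzy", manual_fuzzy)]
    if derived_fuzzy(unit, CURRENT_SOURCE_REV) != unit.fuzzy_flag
]
```

Both units end up in `mismatches`: the derived rule disagrees with the stored flag in each case, which is why a single revision ID cannot replace the flag without changing current behavior.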

I don't know off the top of my head if multiple slots can be loaded simultaneously, so maybe the impact on querying would not be so high.

> Outdated messages are not necessarily fuzzy (e.g. when skipping fuzzying for spelling fixes in the source)

Oh, right. However, as a translator, I would actually appreciate having to edit pages to fuzzy them – the edits would appear on my watchlist, so I would know that the translation needs to be updated. Currently, there’s no notification at all, leading to fuzzy translations not being updated until someone notices them, sometimes months or years later.

Change #1035601 had a related patch set uploaded (by Pppery; author: Pppery):

[mediawiki/extensions/Translate@master] Add log entry and permission for unfuzzying translations

https://gerrit.wikimedia.org/r/1035601

Speaking of User-notice: maybe it would be best to announce the ability to disable unfuzzying for anons in Tech News, but not actually disable it anywhere unless communities request it. (The logging part seems to be uncontroversial, so that can be deployed right away.)

Honestly I suspect every community with translate will want to disable this, but I'm fine with making that explicit first if needed.

My patch annoyingly won't fix the variation on the standard attack where a fuzzy translation is vandalized and the vandalism is reverted by recent changes patrollers, as the unfuzzy will just be attributed to them instead.

> Honestly I suspect every community with translate will want to disable this, but I'm fine with making that explicit first if needed.

This setting is against the wiki concept by excluding certain groups from certain tasks (without all individuals in the group being known or suspected vandals). If the amount of vandalism/test edits is too high, it may be worth it (just like page protection may be worth it), but if there isn’t much vandalism/test edits, it’s just an unnecessary restriction, which may drive away people who want to help.

> My patch annoyingly won't fix the variation on the standard attack where a fuzzy translation is vandalized and the vandalism is reverted by recent changes patrollers, as the unfuzzy will just be attributed to them instead.

Good point. You could skip unfuzzying if $editResult->getRevertMethod() === EditResult::REVERT_UNDO || $editResult->getRevertMethod() === EditResult::REVERT_ROLLBACK. (You shouldn’t use $editResult->isRevert() because manual reverts are often changes that do need to unfuzzy.)
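
The suggested condition can be sketched as a standalone model. The `RevertMethod` enum below only mirrors the undo, rollback and manual cases for illustration; it is an assumption, not the MediaWiki `EditResult` class, and `should_unfuzzy` is a hypothetical helper name.

```python
from enum import Enum, auto
from typing import Optional


class RevertMethod(Enum):
    # Illustrative stand-ins for the revert methods discussed above.
    UNDO = auto()
    ROLLBACK = auto()
    MANUAL = auto()


def should_unfuzzy(revert_method: Optional[RevertMethod]) -> bool:
    """Return False only for undo and rollback.

    None stands for an edit that is not a revert at all. Manual reverts
    still unfuzzy, because they are often genuine translation updates.
    """
    return revert_method not in (RevertMethod.UNDO, RevertMethod.ROLLBACK)
```

Under this rule, a patroller who undoes or rolls back vandalism on a fuzzy unit leaves it fuzzy, while a translator who manually restores and reviews an older text still confirms it.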

> just like page protection may be worth it

ftr, that does not work on translation pages.

> ftr, that does not work on translation pages.

But it does work on many other pages, so I think it works as an example/analogy.

>>! In T49177#9834118, @Tacsipacsi wrote:
> Good point. You could skip unfuzzying if $editResult->getRevertMethod() === EditResult::REVERT_UNDO || $editResult->getRevertMethod() === EditResult::REVERT_ROLLBACK. (You shouldn’t use $editResult->isRevert() because manual reverts are often changes that do need to unfuzzy.)

I was thinking "a revert, by any means, of an edit by someone who doesn't have the unfuzzy permission doesn't unfuzzy" -> there have been several occasions in which I thought about using the undo bit to unfuzzy a unit (most recently when cleaning up this mess - I decided to do manual reverts there instead because the people who I was reverting didn't need to get revert notifications).

This should be split to a separate task, though.

Change #1035601 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] Add log entry and require unfuzzy permission for unfuzzying translation

https://gerrit.wikimedia.org/r/1035601

For Tech News, do you have a suggested summary we could use? (After reading the comments above, I'm still uncertain about which aspects need to be announced, and what kind of community decisions/feedback needs to be encouraged...?).
My best guess is something like this (but I suspect it contains inaccuracies, and I hope someone can explain it more simply/concisely!):

Changes later this week

  • On multilingual wikis that use the <translate> system, there is a feature that shows potentially-outdated translations with a pink background and marks them as "fuzzy". From this week, changes to the "fuzzy" status will be logged, and there is a new user-right that can be required for confirming translations, if the community requests it.

That looks surprisingly accurate. Except that "confirming translations" and "changes to the 'fuzzy' status" are two names for the same thing, so you should use one term for them.

Thank you! I made a few tweaks, and have now added it to https://meta.wikimedia.org/wiki/Tech/News/2024/24 -- If any further tweaks are needed, please edit it directly there. I will be freezing it for translations in ~2 hours.

Bugs found on translatewiki.net:

The second issue is a clear bug that someone should submit a patch for, and I may do so. I have no idea what's going on with the first issue.

{{ec}}

Something weird is going on: on https://translatewiki.net/w/i.php?title=MediaWiki:Pt-movepage-logreason/hi&action=history, the (cur) link of the null revision is a link pointing to the diff between that edit and the previous one (with the previous edit being the “after” side), but the (cur) link of the previous edit is not a link, as if MediaWiki thought that the latest revision is the one before the null revision.

I suggest fixing this ASAP in Wikimedia production and translatewiki.net, either by making sure the links appear at the right places, or by commenting out the null revision part (the logging and user right parts can remain), and in either case backporting the changes to wmf.9.

At least the issue is self-healing with the next edit to the page.

Patch coming soon for both issues.

> At least the issue is self-healing with the next edit to the page.

However, translation units are usually not highly edited pages; if a translation is rightfully unfuzzied (so the unfuzzying doesn’t get reverted), the same page may not be edited – and thus may remain broken – for years.

> Patch coming soon for both issues.

Thanks in advance!

Change #1041755 had a related patch set uploaded (by Pppery; author: Pppery):

[mediawiki/core@master] Add docuementation saying to avoid the trap I fell into

https://gerrit.wikimedia.org/r/1041755

Change #1041756 had a related patch set uploaded (by Pppery; author: Pppery):

[mediawiki/extensions/Translate@master] Fix logging bugs in unfuzzy handling

https://gerrit.wikimedia.org/r/1041756

Change #1041755 merged by jenkins-bot:

[mediawiki/core@master] RevisionStore: Add documentation saying to avoid the trap I fell into

https://gerrit.wikimedia.org/r/1041755