
Confirming a fuzzied translation should be logged
Open, MediumPublicFeature

Description

With the new system (without !!FUZZY!!), it's impossible to tell that a user has confirmed a fuzzied translation, because there's no edit to the translation and there's no log. This produces mysterious summaryless diffs to the translation page when all goes well, and nothing at all when something goes wrong or when it's not page translation (with all sorts of problems for attribution and understandable histories); see T48716#512417.

A log entry should be added.

Details

Reference
bz47177

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 22 2014, 1:33 AM
bzimport set Reference to bz47177.
bzimport added a subscriber: Unknown Object (MLST).

This also causes https://gerrit.wikimedia.org/r/#/c/66574/1/AccountAudit.i18n.php where doing a full export of an extension finally removed the # fuzzy comment, even though the actual unfuzzy had possibly taken place a long time ago.

Two expected behaviors based on this issue:
I. Marking unfuzzy should make the message group qualify for export if it's a file based message group.
II. Marking unfuzzy should be visible in the page history of the translation.

Nikerabbit raised the priority of this task from Lowest to Medium. Apr 20 2016, 10:32 AM
Nikerabbit updated the task description.
Nikerabbit removed a subscriber: wikibugs-l-list.

Support this, since just looking at diffs makes finding the relevant unit quite annoyingly hard, and blatantly incorrect taggings are not uncommon. Maybe this could even come with an easy way to revert the marking.

Nikerabbit changed the subtype of this task from "Task" to "Feature Request".

At mediawiki.org there are lots of anons confirming fuzzied translations in bad faith, every day, and there's no easy way to revert them. I have to manually copy part of the text that was unwrapped from the span tags, and search it in the translation page to manually add the !!FUZZY!! text. This is very time consuming.

Currently it seems impossible to prevent confirmation of fuzzied translations that make no changes. I introduced a filter to prevent anons from doing this, but it doesn't work because the automatic change in the translated page is not affected by AbuseFilter (which makes sense).

A radical idea: what about using multi-content revisions? A separate slot could store which revision of the original message the translation is based on (i.e. if I translate revision 1234 of MediaWiki:Foo/en or Translations:Manual:Bar/en, the slot would contain 1234). This would resolve both the visibility and the revertibility issue:

  • Fuzzy means that the revision ID stored in the slot is different from the current revision ID of the original message. Confirming a translation means that the slot storing the revision ID is updated. This changes the page content, which naturally causes a new revision, so this appears in the page history, RC etc.
  • Reverting to the above edit means that the revision ID stored in the slot becomes outdated again, causing the translation to be fuzzy. This not only means that there’s no need to manually add !!FUZZY!!, but also that the ID of the old revision is retained, causing the diff in Tux to continue to work.
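The two bullets above can be sketched as a toy model. This is only an illustration of the proposal under stated assumptions, not how the Translate extension works today: `TranslationRevision`, `based_on_rev`, `is_fuzzy`, `confirm` and `revert_last` are hypothetical names, and a plain Python list stands in for the page history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationRevision:
    """One revision of a translation page (hypothetical model)."""
    text: str          # main slot: the translated text
    based_on_rev: int  # hypothetical extra slot: source revision that was translated


def is_fuzzy(rev: TranslationRevision, current_source_rev: int) -> bool:
    # Proposed rule: fuzzy means the slot no longer matches the current source.
    return rev.based_on_rev != current_source_rev


def confirm(history: list, current_source_rev: int) -> None:
    # Confirming changes only the slot, but that is still a content change,
    # so it produces a new revision that shows up in the page history.
    latest = history[-1]
    history.append(TranslationRevision(latest.text, current_source_rev))


def revert_last(history: list) -> None:
    # Reverting re-saves the previous content (both slots) as a new revision.
    history.append(history[-2])


history = [TranslationRevision("Hallo", based_on_rev=1234)]
current_source_rev = 1300  # the source message was edited after the translation

fuzzy_before = is_fuzzy(history[-1], current_source_rev)
confirm(history, current_source_rev)
fuzzy_after_confirm = is_fuzzy(history[-1], current_source_rev)
revert_last(history)
fuzzy_after_revert = is_fuzzy(history[-1], current_source_rev)
```

In this model the revert restores `based_on_rev=1234`, so the unit is fuzzy again and the old source revision is still available as a diff base.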

However, as a quite radical idea, it also has some potential issues:

  • It probably needs a major refactoring, which can introduce bugs.
  • It makes parts of the code more complicated. However, since the fuzziness information is already stored somewhere (in a separate database table?), other parts of the code could become simpler.
  • Will the existing interfaces (Tux, WikiEditor) continue to work? Since Commons already has multi-content revisions in the file namespace, and WikiEditor didn’t break, WikiEditor should probably be okay.
  • What if someone sets the revision ID to a random number through the API? Is it possible? Can we prevent it? Do we want to prevent it? (After migrating a page to the Translate extension, it may come in handy for translation admins to be able to indicate that the just-imported text is actually a translation of an older version – of course, for this to work, the page needs to be marked for translation twice. On the other hand, setting the ID to completely random numbers makes no sense.)
  • Introducing this requires adding this piece of information to all existing translation units, which probably means billions of edits on both WMF wikis and translatewiki.net. Since this is way too slow to include in the standard update.php maintenance script, a longer transition period is necessary, during which both the old database table and the slot are read, (temporarily) further complicating the code.

If we go this way, there are some further possibilities to improve/simplify the system:

  • In Tux and maybe in WikiEditor as well, there could be a checkbox, shown when editing a fuzzy translation, with which the user could indicate whether the revision ID should be updated. For example, if a translation is fuzzy, but the translator just quickly fixes a typo, without reviewing the diff, they could uncheck it.
  • If we allow manually changing the revision ID, the !!FUZZY!! syntax could finally be deprecated entirely, with two replacements:
    • If the revision the translation is based on is known, one could simply set the revision ID slot to that.
    • If the revision is not known or doesn’t exist (or the problem is not that the translation is outdated, but e.g. that half of it isn’t translated at all), a special value of zero could be used. Similarly to the current !!FUZZY!! syntax, this would highlight the translation as fuzzy, but not provide any diff.
    • Of course, if !!FUZZY!! is replaced, there should be some UI to change the revision ID slot. This could be e.g. an extra input field in the WikiEditor interface (it’s probably not commonly needed enough to be included in Tux).
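A minimal sketch of how the special zero value and the checkbox could fit together, again purely as an illustration. `UNKNOWN_REV`, `fuzzy_state`, `save_edit` and the `update_revision` parameter are made-up names for this example, not existing Translate code.

```python
from typing import Optional, Tuple

UNKNOWN_REV = 0  # proposed special value: fuzzy, but no source revision to diff against


def fuzzy_state(based_on_rev: int, current_source_rev: int) -> Tuple[bool, Optional[int]]:
    """Return (is_fuzzy, revision to diff from, or None when no diff applies)."""
    if based_on_rev == UNKNOWN_REV:
        # Replacement for !!FUZZY!! when the base revision is unknown: no diff.
        return True, None
    if based_on_rev != current_source_rev:
        # Outdated: fuzzy, and the stored ID tells the editor what to diff against.
        return True, based_on_rev
    return False, None


def save_edit(based_on_rev: int, current_source_rev: int, update_revision: bool) -> int:
    """Return the slot value to store; update_revision models the proposed checkbox."""
    return current_source_rev if update_revision else based_on_rev
```

For example, `save_edit(1234, 1300, update_revision=False)` keeps the slot at 1234, so a quick typo fix leaves the translation fuzzy with its diff intact.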

Source revision id (for diffs) is tracked separately from fuzzy status. This seems conflated in your suggestion.

MCR might make sense for the fuzzy flag, though there are some immediately obvious things that would need to be resolved:

  • Creating a UI (without Special:Translate) to edit the fuzzy flag
  • Performance. We created a separate table to store fuzzy status to improve speed, both for querying and updating. Storing fuzzy status with MCR would double the amount of content lookups, which is already slow. Fuzzying when source changes would become slow again, requiring editing each translation (which is super slow), as opposed to updating a separate dedicated database table.

Source revision id (for diffs) is tracked separately from fuzzy status. This seems conflated in your suggestion.

I am aware that the source revision ID and the fuzzy status are not the same; I assume, however, that translations where a diff is shown (and therefore the source revision ID is important) are a (proper) subset of fuzzy translations. Based on this assumption, I proposed to treat translations where the source revision ID is outdated as fuzzy, as well as any translations where the source revision ID is set to the special value of zero (these are the cases that make the subset proper).

Storing fuzzy status with MCR would double the amount of content lookups, which is already slow.

Why? You query the content of a page at once, don’t you?

Fuzzying when source changes would become slow again, requiring editing each translation (which is super slow), as opposed to updating a separate dedicated database table.

In the exact setup I proposed, these edits wouldn’t be necessary – not editing a page would automatically fuzzy it.

Neither set is a subset of the other. Outdated messages are not necessarily fuzzy (e.g. when skipping fuzzying for spelling fixes in the source) nor are fuzzy messages necessarily outdated (validation errors, manual fuzzying).
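To make the independence concrete, here is a small sketch with the two pieces of information stored as separate fields. The class and field names are assumptions for illustration only and do not reflect the actual Translate database schema.

```python
from dataclasses import dataclass


@dataclass
class TranslationState:
    """Hypothetical record keeping the two facts apart."""
    source_rev: int  # source revision the translation was made against (for diffs)
    fuzzy: bool      # fuzzy flag, tracked on its own


def is_outdated(state: TranslationState, current_source_rev: int) -> bool:
    # Outdated only says the source has moved on; it says nothing about fuzziness.
    return state.source_rev != current_source_rev


CURRENT_SOURCE_REV = 1300

# Source got a spelling fix and fuzzying was skipped: outdated but not fuzzy.
spelling_fix_skipped = TranslationState(source_rev=1234, fuzzy=False)

# Validation error or manual fuzzying on an up-to-date translation: fuzzy but not outdated.
manually_fuzzied = TranslationState(source_rev=1300, fuzzy=True)
```

A single revision ID cannot represent both of these cases, since the first would need to read as "not fuzzy" despite a mismatch and the second as "fuzzy" despite a match.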

I don't know off the top of my head if multiple slots can be loaded simultaneously, so maybe the impact on querying would not be so high.

Outdated messages are not necessarily fuzzy (e.g. when skipping fuzzying for spelling fixes in the source)

Oh, right. However, as a translator, I would actually appreciate having to edit pages to fuzzy them – the edits would appear on my watchlist, so I would know that the translation needs to be updated. Currently, there’s no notification at all, leading to fuzzy translations not being updated until someone notices them, sometimes months or years later.