Increase the percentage of allowed unchanged text for Content Translation tool in SqWiki to 90%
Closed, Resolved | Public | Feature

Description

Currently you can't publish a page translated with CX unless you have changed the automatically translated text enough to pass a certain threshold. This is to protect against poor automatic translations. However, Google Translate has improved a lot lately, and what was once a helpful feature has become a problem because we struggle to hit that threshold. Some users have started deliberately butchering parts of the text just to hit the limit, then fixing what they ruined once they are allowed to publish. We therefore request that the threshold be raised from whatever it is now (I'm assuming 85%) to 90%, in the hope that this will be a more reasonable limit for the translations Google Translate can provide nowadays.

This is the link for the discussion and voting.

Event Timeline

Change 925574 had a related patch set uploaded (by Klein Muçi; author: Klein Muçi):

[operations/mediawiki-config@master] Content Translation: Set MT threshold to 90% for Albanian WP

https://gerrit.wikimedia.org/r/925574

To help out, and based on similar past tickets, I've already created a patch for this (which may well be wrong).

The default publishing threshold for machine translation is 95%; if you set it to 90%, it will be much stricter.
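To illustrate why a lower number is stricter, here is a minimal sketch in Python. The function name `can_publish` and the inclusive comparison are assumptions made for illustration; this is not the Content Translation code. The only value taken from the comment above is the 95% default.

```python
def can_publish(unmodified_pct: float, threshold_pct: float = 95.0) -> bool:
    """Return True if the share of unmodified MT text is within the limit.

    Hypothetical helper: the name and the inclusive comparison (<=) are
    assumptions, not the actual Content Translation implementation.
    """
    return unmodified_pct <= threshold_pct


# A translation that keeps 92% of the machine-translated text unchanged:
kept = 92.0
for threshold in (90.0, 95.0, 97.0):
    verdict = "publishable" if can_publish(kept, threshold) else "blocked"
    print(f"threshold {threshold:.0f}%: {verdict}")
```

Under these assumptions, the same translation is blocked at 90% but publishable at 95% and 97%, so raising the number is what loosens the check.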

@KartikMistry, yes, I noticed that just now while working on the code (which is why I wrote that my patch may be wrong). This was rather unexpected, as manual tests performed by my community suggested that the default threshold was 85%. Do you think setting it to 97% would be a meaningful change? Or should we aim even higher?

@Pginer-WMF Should we set MT threshold limit to 97% or 99%?

https://www.mediawiki.org/wiki/Help:Content_translation/Translating/Translation_quality has details about translation quality and adjusting MT threshold.

The percentage of user modifications is also measured for each paragraph. A paragraph is considered problematic when it contains more than 85% of the initial machine translation (or, when copying the contents from the source document, it contains more than 60% of unmodified content).

@KartikMistry, apparently this is what was causing the confusion. Most of our new translations in WikiWorkshops consist of creating newly translated stub articles so that new users can learn how to use the tool. In these scenarios, only the top paragraph gets translated and users are expected to publish their edit (thus creating a stub article). They can't do that because of this restriction. This makes me think that the overall limit is fine in itself and we should just increase the paragraph one.

A less strict threshold is considered for paragraphs that a user marks as resolved—taken as a signal that the user reviewed and confirmed the status of the translation. For paragraphs where the unmodified content warning is shown, but the user marks it as resolved, a less strict threshold is applied (accepting 95% of Machine translation or 75% of source content). This will provide a way to accommodate cases where the automatic translation was exceptionally good, but still avoid potential abuse of the feature (i.e., not blindly following a user's confirmation).
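The paragraph-level rules quoted from the help page can be sketched as follows. This is only an illustration: the names `PARAGRAPH_LIMITS` and `is_problematic` are hypothetical, and the strict "more than" comparison is read from the wording of the help page rather than from the actual code.

```python
# Maximum share (in %) of unmodified content before a paragraph is flagged.
# The numbers come from the help page quoted above; the data structure and
# names are assumptions made for this sketch.
PARAGRAPH_LIMITS = {
    # origin: (default limit, limit once the user marks it as resolved)
    "machine_translation": (85.0, 95.0),
    "source_copy": (60.0, 75.0),
}


def is_problematic(unmodified_pct: float, origin: str,
                   marked_resolved: bool = False) -> bool:
    """Return True if a paragraph keeps more unmodified content than allowed."""
    default_limit, resolved_limit = PARAGRAPH_LIMITS[origin]
    limit = resolved_limit if marked_resolved else default_limit
    return unmodified_pct > limit


# A paragraph that keeps 90% of the machine translation:
print(is_problematic(90.0, "machine_translation"))                        # True
print(is_problematic(90.0, "machine_translation", marked_resolved=True))  # False
```

In this reading, marking a paragraph as resolved only helps while the unmodified share stays at or below the relaxed limit (95% for machine translation, 75% for copied source content).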

I also just saw this, but I'm not sure it is in effect for us. From personal experience, I've seen users make some changes to the translated paragraph and then mark its problems as resolved, yet still be unable to publish when the percentage was close to 85%. Not to mention that most new users are afraid to override default actions and mark things as resolved when the warning is telling them otherwise.

Also, the page linked above specifically says "Prepare to iterate". Wouldn't it be better if the adjustment process happened on-wiki, so that local communities could make the changes themselves and the right ones were applied faster? Apparently the CX tool uses more variables than I was aware of, which further reinforces the need for a swifter change process.

Change 925574 abandoned by Klein Muçi:

[operations/mediawiki-config@master] Content Translation: Set MT threshold to 90% for Albanian WP

Reason:

This actually makes the problem it is trying to address worse.

https://gerrit.wikimedia.org/r/925574

Klein changed the task status from Open to Stalled. Jun 5 2023, 8:42 AM
Nikerabbit renamed this task from Increase the percentage of allowed unchanged text for Content Translation Toool in SqWiki to 90% to Increase the percentage of allowed unchanged text for Content Translation tool in SqWiki to 90%. Jun 19 2023, 11:12 AM

@Pginer-WMF Should we set MT threshold limit to 97% or 99%?

I think that we can adjust the limit to 97%, observe the effects for a while, and decide if more adjustments are needed.

The percentage of user modifications is also measured for each paragraph. A paragraph is considered problematic when it contains more than 85% of the initial machine translation (or, when copying the contents from the source document, it contains more than 60% of unmodified content).

@KartikMistry, apparently this is what was causing the confusion. Most of our new translations in WikiWorkshops consist of creating newly translated stub articles so that new users can learn how to use the tool. In these scenarios, only the top paragraph gets translated and users are expected to publish their edit (thus creating a stub article). They can't do that because of this restriction. This makes me think that the overall limit is fine in itself and we should just increase the paragraph one.

The paragraph-level limits only result in the publication being blocked when there are 50 paragraphs exceeding this limit (or 10 paragraphs for users with a translation deleted in the last month). So for the case of a short translation, the general limit will be the only one in effect.
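As a rough sketch of how the two kinds of limits combine for a short translation: the function below is hypothetical, the 50 and 10 paragraph caps come from the explanation above, the 95% overall default comes from earlier in this thread, and treating the caps as inclusive ("50 or more") is an assumption.

```python
def publication_blocked(problem_paragraphs: int,
                        overall_unmodified_pct: float,
                        recent_deletion: bool = False,
                        overall_threshold_pct: float = 95.0) -> bool:
    """Return True if publishing would be blocked under the assumed rules.

    Hypothetical helper, not the Content Translation implementation.
    """
    # Paragraph-level limits only block once enough paragraphs are flagged.
    paragraph_cap = 10 if recent_deletion else 50
    too_many_flagged = problem_paragraphs >= paragraph_cap
    # The general limit applies to the translation as a whole.
    over_general_limit = overall_unmodified_pct > overall_threshold_pct
    return too_many_flagged or over_general_limit


# A one-paragraph stub whose only paragraph is flagged (90% unmodified MT):
print(publication_blocked(1, 90.0))   # False: one flagged paragraph is far below the cap
# The same stub with 96% unmodified MT:
print(publication_blocked(1, 96.0))   # True: blocked by the general limit
print(publication_blocked(1, 96.0, overall_threshold_pct=97.0))  # False
```

Under these assumptions, a one-paragraph stub can never reach the paragraph cap, so only the general threshold decides whether it can be published.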

We have plans to improve the way limits work (T251887), and hearing about specific cases with contextual details is very useful. Thanks for sharing the example of stub creations, and feel free to share more examples of the limits not working as expected. Thanks!

If that is the case, maybe this task should be closed. I wasn't aware of how many different limits exist underneath the Content Translation tool, so I failed to pinpoint exactly what would need to change to solve the problem we're experiencing. It is clearly not what the current title of the task asks for. We just wanted a way for our new users to publish their translated stub articles, which most of the time consist of just the lede. They currently can't do that because, according to the tool, they haven't changed the text enough, even though the text doesn't need further changes because Google Translate has already done a decent translation on its own. But I can't say what needs to change for that to happen, so maybe I'll have to wait until the limits are "improved" in general and are hopefully easier to understand and fine-tune locally.

Nikerabbit claimed this task.
Nikerabbit moved this task from Needs Triage to MT on the ContentTranslation board.
Nikerabbit subscribed.

Closing per your latest comment. Feel free to reopen if you think we can make specific changes here.