Disable machine translation for Japanese
Closed, Resolved (Public)

Description

We received feedback from Japanese editors about low-quality translations being created for Japanese Wikipedia, described as «About half articles from CX2 have been reviewed as "Bad quality translation"». We analyzed the data further and found that the deletion rates for translations to Japanese Wikipedia in the current year (9.5%) and the past year (6%) did not match the community perception. Making the translation limits stricter was proposed instead (T321819), but this was not an approach the editors wanted to try, and it was proposed to revert the adjustment (T323721).

In the light of the editors considering more drastic measures ("If you don't ban machine translation from CX2 for translations to Japanese, we will (try to) ban CX2 itself") that can impact more translators, this ticket proposes to disable machine translation when translating into Japanese.

Given that most of the recent translations to Japanese use machine translation (over 90% based on the sample from this report), and considering that machine translation support was requested in the past for Japanese, other community members may request machine translation support. In that event, we may have to find ways to reach a consensus that is satisfying for the whole community.

Event Timeline

Pginer-WMF triaged this task as Medium priority. Nov 29 2022, 9:05 AM
Pginer-WMF updated the task description.

This might be a tangent but

considering that machine translation support was requested in the past for Japanese

It looks like the user who requested it contributes almost exclusively to English Wikipedia. It seems likely that they wanted a Japanese-to-English machine translator, not an English-to-Japanese one.

I take it that this task is about disabling machine translation into Japanese, but not the other way around, is that right?

I take it that this task is about disabling machine translation into Japanese, but not the other way around, is that right?

That's correct. Thanks for the context, @whym.

Change 861968 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/mediawiki-config@master] ContentTranslation: Disable machine translation for Japanese WP

https://gerrit.wikimedia.org/r/861968

Change 861968 merged by jenkins-bot:

[operations/mediawiki-config@master] ContentTranslation: Disable machine translation for Japanese WP

https://gerrit.wikimedia.org/r/861968

Mentioned in SAL (#wikimedia-operations) [2022-11-30T08:03:18Z] <kartik@deploy1002> Started scap: Backport for [[gerrit:861968|ContentTranslation: Disable machine translation for Japanese WP (T323973)]]

Mentioned in SAL (#wikimedia-operations) [2022-11-30T08:04:29Z] <kartik@deploy1002> kartik and kartik: Backport for [[gerrit:861968|ContentTranslation: Disable machine translation for Japanese WP (T323973)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-11-30T08:11:50Z] <kartik@deploy1002> Finished scap: Backport for [[gerrit:861968|ContentTranslation: Disable machine translation for Japanese WP (T323973)]] (duration: 08m 31s)

Confirmed that MT is not available in ja.wiki

Can we make machine translation a privilege that translators into Japanese who produce high-quality content can apply for? It would be granted by admins on JA WP. This might address the community's concerns.

resulting in «About half articles from CX2 have been reviewed as "Bad quality translation"». We analyzed the data further finding the deletion rates for translations in the current (9.5%) or past year (6%) for Japanese Wikipedia were not matching the community perception

This seems like a flawed analysis. Some of the poor translations were probably proofread by other editors and the articles were likely kept, but that meant valuable volunteer time was spent doing so, and few people enjoy tidying up very poor work done by others. (Just to be clear, this is my opinion based on what happens to machine translations in ukwiki, where we only recently increased the rate of immediate deletions of machine-translated articles. I am not familiar with how it really goes in jawiki; I am just saying that deletion alone is a poor indicator.)

Just to clarify: what I meant is that the data was not telling the whole story. If the deletion rates for translations were really high (e.g. 40%), that would be a clear indicator of very low-quality translations. With deletion rates below 10% there are more nuances to consider, such as the ones you mentioned, and it would be great to hear these kinds of considerations from those in the Japanese community (e.g., to get a better sense of how the review that rated about half of the articles from CX2 as "Bad quality translation" was conducted).

A problem with existing machine translation engines for Japanese, such as Google/Bing/DeepL, is that they hallucinate and omit context from the original text when translating. In my personal experience, this occurs considerably more often for Japanese than for other languages where machine translation is deemed to perform similarly poorly, such as Chinese. These engines are also much more likely to summarize instead of actually translating when working with Japanese than with other languages. My personal experience is that these NMT-based translation engines are even more likely to cause these problems than requesting a translation from Generative AI, and I am not saying that Generative AI performs well either.

I think that also explains why it is not appropriate to measure the performance of machine translation on Japanese Wikipedia through the deletion rate, and why the translation modification rate does not help solve the issue either. The largest problem is not that the language produced by machine translation is bad, but that the translated article fails to reflect the content of the original article correctly. Therefore the problem cannot be mitigated by requiring editors to make the language more natural, and it cannot be assessed by how many articles are deleted because the translation is bad overall.
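
To make the measurement argument in this thread concrete, here is a minimal sketch of why the deletion rate only gives a lower bound on problematic translations. Everything in it is an assumption for illustration: the `TranslationOutcome` class, its field names, and the sample counts are hypothetical and are not taken from jawiki data or from any ContentTranslation API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TranslationOutcome:
    """Hypothetical review result for one published translation."""
    deleted: bool           # article was deleted after publication
    heavily_reworked: bool  # kept, but only after substantial cleanup by other editors
    unfaithful: bool        # kept, but the content does not match the source article


def deletion_rate(outcomes: Sequence[TranslationOutcome]) -> float:
    """Share of translations that were deleted."""
    if not outcomes:
        raise ValueError("no outcomes to measure")
    return sum(o.deleted for o in outcomes) / len(outcomes)


def problem_rate(outcomes: Sequence[TranslationOutcome]) -> float:
    """Share of translations with any problem, whether or not they were deleted."""
    if not outcomes:
        raise ValueError("no outcomes to measure")
    flagged = sum(o.deleted or o.heavily_reworked or o.unfaithful for o in outcomes)
    return flagged / len(outcomes)


# Made-up sample of 20 translations, for illustration only.
sample = (
    [TranslationOutcome(True, False, False)] * 2     # deleted
    + [TranslationOutcome(False, True, False)] * 3   # kept after heavy cleanup
    + [TranslationOutcome(False, False, True)] * 3   # kept, but unfaithful to the source
    + [TranslationOutcome(False, False, False)] * 12 # no problem found
)

print(f"deletion rate: {deletion_rate(sample):.0%}")
print(f"problem rate:  {problem_rate(sample):.0%}")
```

In this made-up sample the two measures diverge because most problematic translations were kept rather than deleted. Measuring the `heavily_reworked` and `unfaithful` cases in practice would require a manual review against the source articles, which is the kind of input requested from the Japanese community above.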