Limit or inhibit access to machine translation for users in Chinese Wikipedia
Open, In Progress, Medium, Public

Description

A consensus has been reached regarding the limitation or prohibition of machine translation for users in Chinese Wikipedia (zhwiki). The discussion is available here.

A brief overview of the consensus
Background. Recent years have seen a surge in poorly written articles that defy Chinese language conventions, including but not limited to improper use of punctuation and unusually long clauses that are unidiomatic in Chinese. We have noticed that these articles (which are eligible for speedy deletion) are linked to machine translation and content translation. Extreme cases of malicious use of content translation include machine-translated personal attack pages. This has become a burden for editors, patrollers, and readers.
Request. Our request is to limit or prohibit access to machine translation (MT). No request is made regarding content translation (CX). Specifically, our request is to:

  • Preferably limit access to machine translation to extended confirmed users and sysops. Only users in these groups can utilize machine translation as a starting point when using content translation.
  • If the above request is not technically viable, we also accept disabling machine translation in its entirety. That is, nobody can access machine translation when using content translation.

Related prior acts. English Wikipedia and Japanese Wikipedia (T323973: Disable machine translation for Japanese) have already disabled machine translation for everyone. English Wikipedia has even gone so far as to limit access to the content translation tool to extended confirmed users and sysops. While we have not decided to put restrictions on the content translation tool, we believe that our situation is close to that of Japanese Wikipedia, and we request a restriction on machine translation.

I hereby offer my appreciation in advance.

Event Timeline

Please note that the machine translation publishing threshold was adjusted to 70% in 2020 (T246383); however, the change was reverted because of unexpected false positives (T252371).

In my opinion, the problem cannot be solved by adjusting the threshold, and the only way to deal with poorly translated pages created with machine translation is to disable the machine translation feature. Thanks.

As a reference, in the last 3 months on Chinese Wikipedia: 27% of the translations were published by users with an edit count below 100, and 73% by users with an edit count of 100 or more.

Currently, we can limit access for publishing into the main namespace. However, machine translation cannot be restricted to specific groups.
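A per-namespace publishing restriction boils down to a group-membership check on the target namespace. The sketch below assumes a hypothetical requirements map from namespace ID (0 is the main/article namespace, 2 the User namespace) to the groups allowed to publish there; `PublishRequirements`, `canPublish`, and the map's shape are illustrative and are not the actual format of ContentTranslation's configuration. Note that the check only looks at where a translation is published, not at how it was produced.

```typescript
// Hypothetical shape: namespace ID -> user groups allowed to publish there.
// Namespaces without an entry are treated as unrestricted in this sketch.
type PublishRequirements = Record<number, string[]>;

const MAIN_NAMESPACE = 0;
const USER_NAMESPACE = 2;

// Assumed example: only extended confirmed users may publish to the main namespace.
const exampleRequirements: PublishRequirements = {
  [MAIN_NAMESPACE]: ["extendedconfirmed"],
};

function canPublish(
  userGroups: string[],
  targetNamespace: number,
  requirements: PublishRequirements,
): boolean {
  const allowed = requirements[targetNamespace];
  if (allowed === undefined) {
    return true; // no restriction configured for this namespace
  }
  // The user must belong to at least one of the allowed groups.
  return userGroups.some((group) => allowed.includes(group));
}

console.log(canPublish(["user", "autoconfirmed"], MAIN_NAMESPACE, exampleRequirements)); // false
console.log(canPublish(["user", "extendedconfirmed"], MAIN_NAMESPACE, exampleRequirements)); // true
console.log(canPublish(["user", "autoconfirmed"], USER_NAMESPACE, exampleRequirements)); // true
```

Because the sketch gates only the publishing target, a user who fails the check is still free to use whatever translation aids the tool offers.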

Disabling machine translation for all users would negatively impact those making good use of the tool. Even if the limit system is not perfect for Chinese, increasing the translation limits may be preferable. That is, keeping machine translation available but requiring users to modify it heavily (even if that means rewriting some parts that were already correct), rather than always having to start from scratch.

An immediate measure we can take toward reducing access to machine translation could be to make machine translation optional and increase the translation limits.

@Pginer-WMF As we are further discussing your reply, I have one question. Could you elaborate on "optional"? To my knowledge, the status quo is that I can already select from a drop-down list of machine translation providers, plus raw (source) text and a blank option. What would it be like if machine translation were made "optional"?

In that scenario, copying the source text would be the default (as if no machine translation were available), but machine translation would remain available: users interested in it would have to explicitly select a machine translation provider (Google, MinT, etc.) from the drop-down.
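The proposal amounts to changing the default only: the provider list stays the same, and machine translation is used only when the user explicitly picks it. The identifiers below (`source` for copying the source text, `scratch` for a blank start, `ProviderSettings`, `providerForSection`) are assumptions for illustration, not the actual ContentTranslation or cxserver configuration.

```typescript
// Assumed provider identifiers: "source" copies the source text,
// "scratch" starts from a blank paragraph, the others are MT engines.
type Provider = "source" | "scratch" | "Google" | "MinT";

interface ProviderSettings {
  available: Provider[];
  defaultProvider: Provider;
}

const ALL_PROVIDERS: Provider[] = ["Google", "MinT", "source", "scratch"];

// Hypothetical current behaviour: an MT engine is preselected.
const mtByDefault: ProviderSettings = { available: ALL_PROVIDERS, defaultProvider: "Google" };

// Proposed behaviour: same options, but copying the source text is the default.
const mtOptional: ProviderSettings = { available: ALL_PROVIDERS, defaultProvider: "source" };

// Provider used to prefill a section: the user's explicit choice if it is
// available, otherwise the configured default.
function providerForSection(settings: ProviderSettings, userChoice?: Provider): Provider {
  if (userChoice !== undefined && settings.available.includes(userChoice)) {
    return userChoice;
  }
  return settings.defaultProvider;
}

console.log(providerForSection(mtByDefault)); // Google
console.log(providerForSection(mtOptional)); // source
console.log(providerForSection(mtOptional, "MinT")); // MinT
```

Only the fallback changes between the two settings; a user who wants machine translation still gets it by selecting a provider, which is the opt-in behaviour described above.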

This can be helpful in combination with making the limits more strict:

  • We communicate that machine translation is not the expected way, so stricter limits become less frustrating: users chose the option themselves and can always go back to the default.
  • Also, a good percentage of users tend to follow the defaults. The expectation here is that these may be the users willing to spend less time or pay less attention, who are also the ones likely to make less effort to improve the machine translation if it were provided to them.

Ideally, we would be able to determine whether a user is making good use of machine translation, so that we could allow all good uses and prevent all bad ones. The above proposes a step toward limiting machine translation that is expected to have fewer negative side effects than turning it off completely.

As SCP-2000 mentioned above, a previous attempt to tighten the publishing threshold ran into serious problems. SCP-2000 maintains his stance in today's discussion. He also points to a discussion from 3 years ago, in which a user was blocked from publishing his article because it had 1% of unedited text. You read that right.

Considering that the algorithm behind the related restriction has not been improved at all, I believe the chance of a similar false positive happening again is not low. Thank you.

So said @SCP-2000.

Hello, thanks to @MilkyDefer for forwarding and to @Pginer-WMF for responding. Let me clarify my opinion here.

He further provides a discussion 3 years ago, where a user was blocked from publishing his article because his article had 1% of unedited text.

The user was blocked because his draft contained 93% unmodified machine translation text (see the screenshot he provided, F41544945). Please note that the translation quality of this draft was good enough to publish.

Currently, the threshold on zhwiki is 95%. If we lower it to 90% or below, similar false positives will happen again.
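The arithmetic behind this argument can be made concrete. The sketch assumes the check blocks publishing when the percentage of unmodified machine translation is strictly greater than the configured threshold; `isBlocked` is a hypothetical name, and the real ContentTranslation check may differ in detail (for example in how the percentage is computed or whether it is applied per section). With the 93% draft mentioned above, the outcome depends only on where the threshold sits.

```typescript
// Assumed rule: block publishing when the unmodified MT share (in percent)
// is strictly greater than the threshold (in percent).
function isBlocked(unmodifiedMtPercent: number, thresholdPercent: number): boolean {
  return unmodifiedMtPercent > thresholdPercent;
}

const draftUnmodifiedPercent = 93; // the draft from screenshot F41544945

// 70%: the value tried in 2020 (T246383); 90%: a possible new value; 95%: the current zhwiki value.
for (const threshold of [70, 90, 95]) {
  const outcome = isBlocked(draftUnmodifiedPercent, threshold) ? "blocked" : "allowed";
  console.log(`threshold ${threshold}%: ${outcome}`);
}
// threshold 70%: blocked
// threshold 90%: blocked
// threshold 95%: allowed
```

Under this assumption, any threshold below 93% would block that draft even though its quality was acceptable, which is the false-positive risk of lowering the current 95% value.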

The final consensus of the local community is unknown to me, but in my opinion, increasing the translation limits is a problematic measure. Thanks.

@Pginer-WMF

After discussion I believe a compromise has been reached. We agree to try out:

  • Prohibiting publishing in article namespace; and
  • Make MT non-default.

One participant argues that making MT non-default is equivalent to endorsing the use of MT; we haven't tested this claim. Therefore, this task should not be closed, and further evaluation is needed to determine whether we should take more aggressive action (i.e., disabling MT completely).

@Pginer-WMF Hello, are there any further updates on this matter? Thanks.

Hello, is there any update since Dec 26?

@Pginer-WMF

After discussion I believe a compromise has been reached. We agree to try out:

  • Prohibiting publishing in article namespace; and
  • Make MT non-default.

Perfect. I think we can plan to introduce these changes, in iterations:

  1. Limit publishing into the main namespace to "extended confirmed users" only.
  2. Get input from the community on the effects, for the community to decide whether to make the restriction more/less strict.
  3. Make machine translation non-default.

In this way we can better understand the effect of each change and how to adjust it.

Hello, thanks for your response. The community agrees to these changes per this discussion.

Change #1018135 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/mediawiki-config@master] ContentTranslation: Limit publishing in zhwiki for extendedconfirmed users only

https://gerrit.wikimedia.org/r/1018135

Change #1018135 merged by jenkins-bot:

[operations/mediawiki-config@master] ContentTranslation: Limit publishing in zhwiki for extendedconfirmed users only

https://gerrit.wikimedia.org/r/1018135

Mentioned in SAL (#wikimedia-operations) [2024-04-09T07:04:12Z] <kartik@deploy1002> Started scap: Backport for [[gerrit:1018135|ContentTranslation: Limit publishing in zhwiki for extendedconfirmed users only (T349959)]]

Mentioned in SAL (#wikimedia-operations) [2024-04-09T07:09:14Z] <kartik@deploy1002> kartik: Backport for [[gerrit:1018135|ContentTranslation: Limit publishing in zhwiki for extendedconfirmed users only (T349959)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-04-09T07:29:42Z] <kartik@deploy1002> Finished scap: Backport for [[gerrit:1018135|ContentTranslation: Limit publishing in zhwiki for extendedconfirmed users only (T349959)]] (duration: 25m 30s)

@SCP-2000 @Pginer-WMF As a first step, we have limited publishing to the main namespace to the 'extendedconfirmed' user group. Feel free to test and let us know.

The next step is to make machine translation non-default (defaulting to the source content) for the zh language.

Change #1018253 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] config: Set source as a default for zh

https://gerrit.wikimedia.org/r/1018253

Change #1018253 abandoned by KartikMistry:

[mediawiki/services/cxserver@master] config: Set source as a default for zh

Reason:

To be done later if needed.

https://gerrit.wikimedia.org/r/1018253

Hello, thanks for your help! However, non-extendedconfirmed users can still publish translations to the main namespace, for example:

It seems that this issue has been reported before (T330363#8705685). Could you please take a look at it?

Based on a quick investigation, wgContentTranslationPublishRequirements is only used on the web (client) side; no check is performed on the PHP (server) side.
mw.cx.init.Translation.prototype.checkIfUserCanPublish, which checks whether the user may publish to the main namespace, is only called from the namespace-change handler (mw.cx.init.Translation.prototype.onNamespaceChange), and that handler does not seem to be triggered during initialization.
Edit: this.veTarget.getPublishNamespace() in mw.cx.init.Translation.prototype.checkIfUserCanPublish returns null on the first call, for an unknown reason.
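The failure mode described above can be reproduced with a simplified model: if the permission check runs only on a namespace change and the publish namespace is still null at start-up, the restriction is never applied. The class below is a hypothetical stand-in, not the real mw.cx.init.Translation; its `initFixed` method mirrors the idea in the title of change 1020221 ("Initialize publishNamespace for CXTarget") and assumes the main namespace (0) is the default target. As the later comments in this thread note, the deployed patch did not fully solve the problem, and without a server-side check a client-side fix alone can still be bypassed.

```typescript
type Requirements = Record<number, string[]>;

// Hypothetical, simplified model of the translation view (NOT the real CX class).
class TranslationView {
  publishNamespace: number | null = null;
  canPublish = true; // permissive until a check runs
  userGroups: string[];
  requirements: Requirements;

  constructor(userGroups: string[], requirements: Requirements) {
    this.userGroups = userGroups;
    this.requirements = requirements;
  }

  checkIfUserCanPublish(): void {
    const ns = this.publishNamespace;
    // With no namespace known there is nothing to look up, so no restriction applies.
    const allowed = ns === null ? undefined : this.requirements[ns];
    if (allowed === undefined) {
      this.canPublish = true;
      return;
    }
    const allowedGroups: string[] = allowed;
    this.canPublish = this.userGroups.some((group) => allowedGroups.includes(group));
  }

  // The check is only triggered from here.
  onNamespaceChange(ns: number): void {
    this.publishNamespace = ns;
    this.checkIfUserCanPublish();
  }

  // Buggy start-up: the namespace stays null and the check never runs.
  initBuggy(): void {}

  // Sketch of the fix: set the default namespace, then run the check once.
  initFixed(defaultNamespace: number): void {
    this.publishNamespace = defaultNamespace;
    this.checkIfUserCanPublish();
  }
}

const mainOnlyExtended: Requirements = { 0: ["extendedconfirmed"] };

const buggy = new TranslationView(["user", "autoconfirmed"], mainOnlyExtended);
buggy.initBuggy();
console.log(buggy.canPublish); // true: restriction silently skipped

const fixed = new TranslationView(["user", "autoconfirmed"], mainOnlyExtended);
fixed.initFixed(0);
console.log(fixed.canPublish); // false
```

In this model the bug is an ordering problem rather than a wrong rule: the same check gives the right answer once it runs with a known namespace.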

Thanks. We are looking into this.

Change #1020221 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@master] CX: Initialize publishNamespace for CXTarget

https://gerrit.wikimedia.org/r/1020221

Test wiki created on Patch demo by SunAfterRain using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/276744a753/w

Test wiki on Patch demo by SunAfterRain using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/276744a753/w/

Sorry, I forgot to turn off the notification option; please ignore the demo above.

No issue. ContentTranslation in patchdemo won't work because it can't access cxserver from there.

Change #1020221 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] CX: Initialize publishNamespace for CXTarget

https://gerrit.wikimedia.org/r/1020221

Change #1023148 had a related patch set uploaded (by KartikMistry; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@wmf/1.43.0-wmf.1] CX: Initialize publishNamespace for CXTarget

https://gerrit.wikimedia.org/r/1023148

Change #1023148 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@wmf/1.43.0-wmf.1] CX: Initialize publishNamespace for CXTarget

https://gerrit.wikimedia.org/r/1023148

Mentioned in SAL (#wikimedia-operations) [2024-04-23T07:27:39Z] <kartik@deploy1002> Started scap: Backport for [[gerrit:1023148|CX: Initialize publishNamespace for CXTarget (T349959)]]

Mentioned in SAL (#wikimedia-operations) [2024-04-23T07:42:09Z] <kartik@deploy1002> kartik: Backport for [[gerrit:1023148|CX: Initialize publishNamespace for CXTarget (T349959)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-04-23T08:12:32Z] <kartik@deploy1002> Finished scap: Backport for [[gerrit:1023148|CX: Initialize publishNamespace for CXTarget (T349959)]] (duration: 44m 53s)

@SCP-2000 @SunAfterRain We've deployed a potential fix, but it does not seem to solve the problem. I'm looking into the details.

@ngkountas It seems our patch isn't solving the issue. Can you take a look?