Disable ContentTranslation for non-extended confirmed users on viwiki
Closed, Resolved (Public)

Description

Per community consensus, please disable machine translation for all users who don't have extendedconfirmed. We have been using a simple script (and an abuse filter) to enforce this, but the script seems to be outdated since the deployment of the new Vector skin.

Proposed approach

Configure Content Translation to prevent publishing by users who are not extendedconfirmed when translating into Vietnamese. Affected users will see a message indicating the limitation, as described in T192144, before they attempt to publish (so they do not waste time working on a translation only to find out later).

Event Timeline

Content translation has a quality control mechanism that can be configured for each wiki to make the limits more or less strict.
If there are issues with the content created with the tool, we encourage adjusting the limits so that publishing is not allowed if more than X% of the initial translation has not been modified. We need some guidance (and possibly some iterations) to find the right value for X, but we believe it is better and fairer to evaluate translations by the quality of the content rather than by the number of edits of the user making them.

After an initial request from Vietnamese editors (T275121), we adjusted the limits to prevent the publication of translations with 90% or more unmodified machine translation. For reference, the limit was set to 70% for Indonesian, so we may want to check whether a similar limit would work better for Vietnamese.

If there are issues with how the limits are computed for a given language, we also want to know about them so we can improve the way the limits work (T251887).

Please let us know whether making the limits stricter seems like a good approach to try before taking more drastic measures that could affect users who are actually producing good translations.
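
The content-based limit described above can be illustrated with a minimal Python sketch. It is not the extension's actual code: the function names and the `LIMITS` table are hypothetical, and the only values taken from this thread are the 90% figure for Vietnamese and the 70% figure for Indonesian. It also assumes the Indonesian limit blocks "at or above" the threshold in the same way as the Vietnamese one.

```python
# Hypothetical per-wiki limits: the maximum share of unmodified machine
# translation tolerated before publishing is blocked. The numbers are the
# ones quoted in this thread; the table itself is an illustration only.
LIMITS = {"vi": 0.90, "id": 0.70}


def can_publish(unmodified_mt_fraction: float, limit: float) -> bool:
    """Return True when enough of the machine translation was edited.

    Publishing is blocked when the unmodified share is at or above the
    limit ("90% or more of unmodified machine translation").
    """
    if not 0.0 <= unmodified_mt_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return unmodified_mt_fraction < limit


def can_publish_on(wiki: str, unmodified_mt_fraction: float) -> bool:
    """Apply the (hypothetical) limit configured for the given wiki."""
    return can_publish(unmodified_mt_fraction, LIMITS[wiki])


if __name__ == "__main__":
    # A translation with 80% unmodified machine translation passes the
    # 90% limit but would be blocked under a 70% limit.
    print(can_publish_on("vi", 0.80))  # True
    print(can_publish_on("id", 0.80))  # False
```

Under this model, lowering the Vietnamese value from 0.90 toward 0.70 is what "making the limits stricter" would mean: the check depends only on the content, not on the user's edit count.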

@Pginer-WMF, I understand that checking the content is fairer, but we simply do not have the manpower to do this every day for every single article that comes out of CT. Bad translation has been a very severe problem on Wikipedia Vi for many years. The community has decided to take this drastic measure to protect the quality of Wikipedia Vi. Currently, we're having a second discussion to measure the consensus again.

We are currently debating whether to revisit the decision. I will summarize the comments and reply in a week. Thanks for understanding.

Just to clarify: what I proposed is to set automatic rules based on the content (how much of the initial machine translation remains unedited). That is, making the limits stricter would prevent publishing translations that have not been edited enough. In this way, a professional translator making good translations of Wikipedia articles would be allowed to publish even if their total number of edits is not very high.

@Pginer-WMF the problem is that many people have found a way to cheat (copy from CT and publish outside). This has been happening for years. It would be a minor inconvenience for a professional translator to write an article without CT, yes. However, it's a good trade-off. Lastly, 500 edits is not too high a bar. In any case, we'll let you know what the consensus will be.

NguoiDungKhongDinhDanh renamed this task from Disable machine translation for non-extended confirmed users on viwiki to Disable ContentTranslation for non-extended confirmed users on viwiki. Jan 29 2022, 9:22 AM

Per community consensus, please disable the ContentTranslation special page for non-extended confirmed users.

Copying outside of the tool makes things a bit more complicated. To understand the context, can anyone share more details about how you know that this content originates from the Content Translation tool rather than from other tools or scripts that can use machine translation?

@Pginer-WMF I'm not sure why we are having this discussion when the consensus is very clear.

We know because we did some testing. First, the articles look identical to articles that would be published by CT. Second, use of the Visual Editor is another indication. Third, simply using Google Translate wouldn't work (all the refs would be messed up). Fourth, most people are not technical enough to use other tools or scripts to cheat.

Please respect the consensus.

Thanks for the additional context, @Nguyentrongphu. I think it is important to consider whether a given change actually helps solve the issue.
In this particular case it can be useful to measure how much content is copied from Content Translation into Visual Editor. This would allow us to understand, for example, how much the problem is reduced when access to the tool is limited to certain users.

It is possible that this issue is also happening in others of the 300+ Wikipedia communities, and learning more about it can help make the tool better for everyone. Thanks again for your feedback.

Issues that affect content quality may affect other

@Pginer-WMF I think each wiki should come up with its own solution depending on how severe the CT problem is. The problem has been rampant for years on Vi Wikipedia (affecting tens of thousands of articles). It only became manageable when we decided to restrict CT to extended confirmed users by using an abuse filter. The solution has been proven to be effective. The remaining abusers can be dealt with by blocking.

Thanks for the input, @Nguyentrongphu. I was not questioning the effectiveness of the solution. My concern is that, in addition to solving the problem, it may come with undesirable side effects (like preventing the participation of some users who could do a good translation). I tried to surface the trade-off of this approach, but if the community is OK with it, there is no problem proceeding. I have added some more details to make the ticket ready to be worked on.

@Pginer-WMF Unfortunately, it's the only effective solution possible given our manpower. A new user can gain a little more experience before being able to use CT, no big deal. Btw, we speak Vietnamese, not Thai.

Pginer-WMF raised the priority of this task from Medium to High. Mar 14 2022, 11:20 AM

Change 770882 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/mediawiki-config@master] Disable ContentTranslation for non-extended confirmed users on viwiki

https://gerrit.wikimedia.org/r/770882

Change 770882 merged by jenkins-bot:

[operations/mediawiki-config@master] Disable ContentTranslation for non-extended confirmed users on viwiki

https://gerrit.wikimedia.org/r/770882

Mentioned in SAL (#wikimedia-operations) [2022-03-16T07:08:56Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: 455895168ab266813ae499e8fc353c66e6d5b450: Disable ContentTranslation for non-extended confirmed users on viwiki (T299636) (duration: 00m 51s)

Change 773985 had a related patch set uploaded (by NguoiDungKhongDinhDanh; author: NguoiDungKhongDinhDanh):

[operations@refs/meta/config] Fix Id1fa4d6b02155c940c2b40b1c5411d5479dc7d2b: Add viwiki eliminators to wgContentTranslationPublishRequirements.

https://gerrit.wikimedia.org/r/773985

Reopening as eliminators, who are already extended confirmed, cannot publish their translations. Reported by NhacNy2412.

Change 773985 abandoned by NguoiDungKhongDinhDanh:

[operations@refs/meta/config] Fix Id1fa4d6b02155c940c2b40b1c5411d5479dc7d2b: Add viwiki eliminators to wgContentTranslationPublishRequirements.

Reason:

Wrong repo

https://gerrit.wikimedia.org/r/773985

Change 774386 had a related patch set uploaded (by NguoiDungKhongDinhDanh; author: NguoiDungKhongDinhDanh):

[operations/mediawiki-config@master] Fix Id1fa4d6b02155c940c2b40b1c5411d5479dc7d2b: Add viwiki eliminators to wgContentTranslationPublishRequirements.

https://gerrit.wikimedia.org/r/774386

Also, non-extended-confirmed users are reported to be able to publish their translations into their userspace. Can that be prevented in code, or will we have to use abuse filters instead?

Change 774386 merged by jenkins-bot:

[operations/mediawiki-config@master] Add viwiki eliminators to wgContentTranslationPublishRequirements

https://gerrit.wikimedia.org/r/774386

Mentioned in SAL (#wikimedia-operations) [2022-03-29T07:23:52Z] <kartik@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:774386|Add viwiki eliminators to wgContentTranslationPublishRequirements (T299636)]] (duration: 00m 50s)
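
The reopening above suggests that the publish requirement is checked against explicit group membership, so a user who holds only the eliminator group is rejected even though that group carries the same rights. The following Python sketch models that behaviour under stated assumptions: the group names, the `may_publish` helper and the two allow-lists are hypothetical, and the real structure of `wgContentTranslationPublishRequirements` is not shown in this thread.

```python
# Hypothetical allow-lists modelling the configuration before and after the
# follow-up patch (gerrit 774386). Group names are assumptions.
ALLOWED_BEFORE_FIX = frozenset({"extendedconfirmed"})
ALLOWED_AFTER_FIX = frozenset({"extendedconfirmed", "eliminator"})


def may_publish(user_groups, allowed_groups) -> bool:
    """Return True if the user belongs to at least one allowed group.

    Only explicit membership is checked; rights implied by another group
    are not considered, which is the behaviour the reopening describes.
    """
    return not allowed_groups.isdisjoint(user_groups)


if __name__ == "__main__":
    eliminator = {"eliminator"}  # holds no explicit extendedconfirmed group
    newcomer = {"user"}

    print(may_publish(eliminator, ALLOWED_BEFORE_FIX))  # False
    print(may_publish(eliminator, ALLOWED_AFTER_FIX))   # True
    print(may_publish(newcomer, ALLOWED_AFTER_FIX))     # False
```

Listing every qualifying group explicitly is the simplest fix in this model; any new group with equivalent rights would need to be added to the list as well.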

This solution was developed in response to the restrictions added by English Wikipedia, in order to let users know they won't be able to publish before they put effort into a translation, rather than finding out only at publishing time. It has been in place for a while on a large wiki such as English Wikipedia. I'd recommend monitoring the activity for a while and considering the regular wiki tools to apply further limits if needed.

Note that publishing into the user namespace is subject to the same limits, and it leaves a trace that can be targeted by edit filters if needed.

@Pginer-WMF The solution is redundant since there is already a big message (as soon as they get into CT) telling those users that they can't publish with CT.

I looked at the article creation data for the last month (April 2022):

  • 7287 pages were created in Vietnamese Wikipedia during April; only 58 were created with Content Translation (0.8%).
  • Of the 58 articles created with Content Translation:
    • Only 4 were published in the user namespace (7%).
    • Only one was deleted (1.7%).

Given the above numbers, it does not seem that the option to publish in the user namespace is being used as a workaround to avoid the limit, so it may not be necessary to adjust this mechanism further for now.
I'm resolving this for now. If more issues come up in the future, we are happy to look into them and try to adjust the tool to respond better.
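
The percentages quoted for April 2022 can be re-derived from the raw counts given in the list. This is a small Python sketch; the variable names and the `pct` helper are illustrative, and only the four counts come from this thread.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts quoted for Vietnamese Wikipedia, April 2022.
pages_created = 7287
created_with_cx = 58
published_in_user_namespace = 4
deleted = 1

if __name__ == "__main__":
    print(pct(created_with_cx, pages_created))                # 0.8
    print(pct(published_in_user_namespace, created_with_cx))  # 6.9 (quoted as 7%)
    print(pct(deleted, created_with_cx))                      # 1.7
```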

@Pginer-WMF For your information, the abuse filter mentioned in the description (111) and another one that prevents publishing CT drafts into user space (114) were enabled during that time.

Thanks for sharing. I just looked into the data for those filters: during April there were 9 instances of filter #111 preventing a translation and no instances for #114.