
Make more strict the check for unmodified content for the whole document on Indonesian Wikipedia
Closed, Resolved · Public

Description

Currently, an error that prevents publishing is shown when the amount of unmodified content is 99% or higher (T190283). This was targeted at the clearest cases of vandalism. Given that low-quality translations seem to proliferate on Indonesian Wikipedia (T219851), we want to make this threshold stricter.

On Indonesian Wikipedia (and only there), the threshold will be adjusted to prevent the publication of translations in which more than 30% of the overall content is unmodified.

Based on data and feedback, we'll evaluate whether the adjusted threshold, in conjunction with the other adjustments proposed (T221359), improves the situation significantly or needs further adjustment. We need to keep in mind the potential for false positives, since elements such as proper nouns, templates, short section titles, and references are often legitimate unmodified content that is OK for users to publish (so we may need to keep a 5-10% margin of error).
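To make the two thresholds concrete, here is a minimal sketch of a per-wiki check. It is not the actual Content Translation or mediawiki-config code: the names `Threshold`, `PER_WIKI_THRESHOLDS`, and `is_publish_blocked`, the `idwiki` key, and the idea of counting content in generic "units" are all assumptions made for illustration. It only encodes the limits described above: block at 99% or higher by default, and above 30% on Indonesian Wikipedia.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    percent: int     # limit on unmodified content, in percent
    inclusive: bool  # True: block at or above the limit; False: block only above it


# Hypothetical configuration; keys and structure are illustrative only.
DEFAULT_THRESHOLD = Threshold(percent=99, inclusive=True)  # "99% or higher"
PER_WIKI_THRESHOLDS = {
    "idwiki": Threshold(percent=30, inclusive=False),      # ">30%"
}


def is_publish_blocked(wiki: str, unmodified_units: int, total_units: int) -> bool:
    """Return True if the share of unmodified content exceeds the wiki's limit."""
    if total_units <= 0:
        # Assumption: with no measurable content there is nothing to block.
        return False
    threshold = PER_WIKI_THRESHOLDS.get(wiki, DEFAULT_THRESHOLD)
    # Integer cross-multiplication avoids floating-point rounding at the boundary.
    lhs = unmodified_units * 100
    rhs = threshold.percent * total_units
    return lhs >= rhs if threshold.inclusive else lhs > rhs
```

With these assumed values, a translation with 31 of 100 units unmodified would be blocked on `idwiki` but not on a wiki that uses the default. A false-positive margin such as the 5-10% mentioned above could be modeled by raising `percent` for that wiki.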

Event Timeline

Restricted Application added a subscriber: Aklapper. Apr 18 2019, 11:17 AM
Pginer-WMF triaged this task as High priority. Apr 18 2019, 11:17 AM
Pginer-WMF added a subscriber: santhosh.
Pginer-WMF updated the task description. Apr 18 2019, 2:14 PM

The Indonesian community has decided to remove machine translation; this is the will of the community: https://id.wikipedia.org/w/index.php?title=Wikipedia:Warung_Kopi_(Teknis)&oldid=14962460#Mesin_penerjemah

This comment was removed by Mimihitam.
Pginer-WMF updated the task description. Apr 18 2019, 2:52 PM
Pginer-WMF updated the task description. Apr 18 2019, 3:29 PM

Change 505220 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[operations/mediawiki-config@master] Use higher unmodified MT threshold for Indonesian Wikipedia

https://gerrit.wikimedia.org/r/505220

Change 505220 merged by KartikMistry:
[operations/mediawiki-config@master] Use higher unmodified MT threshold for Indonesian Wikipedia

https://gerrit.wikimedia.org/r/505220

Mentioned in SAL (#wikimedia-operations) [2019-04-23T11:18:48Z] <kartik@deploy1001> Synchronized wmf-config: SWAT: [[gerrit:505220]] Use higher unmodified MT threshold for Indonesian Wikipedia (T221353) (duration: 00m 57s)

Pginer-WMF closed this task as Resolved. Apr 24 2019, 8:43 AM

This is working now. One related piece that was still pending is showing the additional details to better explain the situation (T203377), which can be a good follow-up.

Mimihitam added a comment. Edited Apr 26 2019, 8:18 AM

This does not work at all. I only changed one or two words, and it still got published. Compare:

https://id.wikipedia.org/w/index.php?diff=15026001&oldid=15025985&title=Serangan_John_Brown_ke_Harpers_Ferry&type=revision

and

https://id.wikipedia.org/w/index.php?title=Kekuasaan_Venesia_di_Kepulauan_Ionia&type=revision&diff=15026058&oldid=15026014

As a note, a good translator will never translate "Brown's party of 22" into "Partai Brown 22", because, translated back into English, that actually means "Brown 22 political party"!! This clearly demonstrates how horrible machine translation is.

Since this option has failed, please disable machine translation entirely as was agreed by the community in the first place.

Thank you.