
Adjust the threshold for Telugu to prevent publishing when overall unmodified content is higher than 70%
Closed, Resolved, Public

Description

As part of the conversations to enable Content Translation out of beta in Telugu Wikipedia (T243271), we received feedback about the need to adjust the Content Translation limits for unmodified translation:

The analysis tool shows that almost 50% of the translation content is manual edits in these articles. I will put this finding to community discussion and get back to you with suggested limits after taking the community opinion. It might take a week.
...
with regard to the manual translation content can you please set the minimum manual translation requirement to 40%

Requiring at least 40% of the initial translation to be modified implies preventing publication when the overall unmodified content is higher than 60%. We need to keep in mind the potential for false positives, since elements such as proper nouns, templates, short section titles, and references are often legitimately unmodified content that users should be able to publish (so we may need to keep a 5-10% margin of error).
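The arithmetic behind the threshold can be sketched as follows. This is an illustrative calculation only, assuming the limit is simply the complement of the minimum modification requirement plus a margin; the function name `unmodified_threshold` is hypothetical and not part of Content Translation.

```python
def unmodified_threshold(min_modified_pct: float, margin_pct: float) -> float:
    """Return the highest share (%) of unmodified content that is still publishable.

    Hypothetical helper: requiring min_modified_pct of manual modification
    leaves (100 - min_modified_pct)% that may stay unmodified, and margin_pct
    widens that limit to tolerate legitimately unmodified content such as
    proper nouns, templates, short section titles and references.
    """
    if not 0 <= min_modified_pct <= 100:
        raise ValueError("min_modified_pct must be between 0 and 100")
    if margin_pct < 0:
        raise ValueError("margin_pct must not be negative")
    return min(100.0, (100.0 - min_modified_pct) + margin_pct)


# The community asked for at least 40% manual modification.
for margin in (0, 5, 10):
    limit = unmodified_threshold(40, margin)
    print(f"margin {margin:>2}% -> block publishing above {limit:.0f}% unmodified")
```

With the 5-10% margin, a 40% modification requirement lands in the 65-70% range, which is where the proposed 70% limit comes from.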

Given the above, the proposed initial step is to adjust the threshold for Telugu to prevent publishing when the overall unmodified content is higher than 70%. After the adjustment, we'll observe the effects on the content created and the feedback we receive, and make further adjustments if needed.
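A minimal sketch of the publish check this threshold implies, assuming the unmodified share is measured as a simple ratio of unmodified to total characters. The real Content Translation metric and its configuration names are not described here; `can_publish` and `TELUGU_UNMODIFIED_LIMIT` are hypothetical names used only for illustration.

```python
# Hypothetical constant: the 70% limit proposed for Telugu, as a fraction.
TELUGU_UNMODIFIED_LIMIT = 0.70


def can_publish(unmodified_chars: int, total_chars: int,
                limit: float = TELUGU_UNMODIFIED_LIMIT) -> bool:
    """Return False when the unmodified share of the translation exceeds the limit.

    Assumption: the share is a plain character ratio. Publishing is blocked
    only when the share is strictly higher than the limit.
    """
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    unmodified_ratio = unmodified_chars / total_chars
    return unmodified_ratio <= limit


print(can_publish(650, 1000))  # 65% unmodified: allowed
print(can_publish(750, 1000))  # 75% unmodified: blocked
```

Keeping the limit as a per-language parameter mirrors the idea that each wiki can tune it independently and revisit it after observing the results.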

Deletion ratios

For reference, it is worth keeping in mind the deletion ratios for Telugu and how they compare to the overall ones:

Looking at the analytics from last year (2019), the overall deletion ratio across all languages was 5% for articles created with Content Translation and 11% for articles created without the tool. That is, articles created with Content Translation were less than half as likely to be deleted during the community review process.

However, for Telugu the deletion ratio was 22% for articles created with Content Translation and 15% for articles created without the tool. So in this particular wiki, articles created with the tool were more likely to be deleted, at roughly 1.5 times the deletion ratio of articles created without it.
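The comparison can be made explicit by dividing the two deletion ratios. This sketch only reuses the 2019 figures quoted above; the function name `relative_deletion_risk` is hypothetical.

```python
# 2019 deletion ratios quoted above, as fractions.
RATIOS = {
    "all languages": {"with_cx": 0.05, "without_cx": 0.11},
    "Telugu": {"with_cx": 0.22, "without_cx": 0.15},
}


def relative_deletion_risk(with_cx: float, without_cx: float) -> float:
    """Deletion ratio with Content Translation divided by the ratio without it."""
    return with_cx / without_cx


for wiki, r in RATIOS.items():
    risk = relative_deletion_risk(r["with_cx"], r["without_cx"])
    survival_with = 1 - r["with_cx"]
    survival_without = 1 - r["without_cx"]
    print(f"{wiki}: relative deletion risk {risk:.2f}, "
          f"survival {survival_with:.0%} vs {survival_without:.0%}")
```

A relative risk below 1 means articles created with the tool were deleted less often; above 1, more often. Survival rates (95% vs 89% overall) differ far less than deletion ratios do, which is why the comparison is framed in terms of deletion.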

Event Timeline

Pginer-WMF renamed this task from "Adjust the threshold for Telugu to prevent publishing when overall unmodified content is higher than 75%" to "Adjust the threshold for Telugu to prevent publishing when overall unmodified content is higher than 70%". Feb 10 2020, 6:03 PM
Pginer-WMF triaged this task as Medium priority.
Pginer-WMF updated the task description.

Change 574265 had a related patch set uploaded (by KartikMistry; owner: KartikMistry):
[operations/mediawiki-config@master] Adjust MT threshold for Telugu to 70%

https://gerrit.wikimedia.org/r/574265

Change 574265 merged by jenkins-bot:
[operations/mediawiki-config@master] Adjust MT threshold for Telugu to 70%

https://gerrit.wikimedia.org/r/574265

Mentioned in SAL (#wikimedia-operations) [2020-02-24T12:12:22Z] <kartik@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit|574265|CX: Adjust MT threshold for Telugu WP to 70% (T244769)]] (duration: 00m 56s)