
Measure percentage of translations published with and without the expected level of modified content
Closed, Declined · Public


The new version of Content translation gives more control over how much users edit the initial translations. Translations published with a paragraph that has too much unmodified content (80% or more machine translation, or 60% or more content copied from the source article) are added to a tracking category (T190798).
We want a better understanding of the percentage of translations that are published with too much unmodified content (i.e., those added to the category) versus those published with the expected level of modifications (i.e., not added to the category).
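
As a rough illustration of the threshold rule described above, here is a minimal sketch. It assumes that each paragraph carries a label for where its initial text came from and a ratio of how much of that text was left unmodified. The `Paragraph` class, its field names, and the "any single paragraph is enough" reading are assumptions made for this example, not the actual Content translation implementation.

```python
from dataclasses import dataclass
from typing import Iterable

# Thresholds quoted in the task description.
MT_THRESHOLD = 0.80      # 80% or more unmodified machine translation
SOURCE_THRESHOLD = 0.60  # 60% or more content copied from the source article


@dataclass
class Paragraph:
    # Hypothetical fields, for illustration only.
    origin: str              # "mt" (machine translation) or "source" (copied source text)
    unmodified_ratio: float  # share of the initial text left unchanged, 0.0 to 1.0


def has_too_much_unmodified_content(paragraphs: Iterable[Paragraph]) -> bool:
    """Return True if any paragraph reaches the threshold for its origin.

    Assumption: one offending paragraph is enough for the translation
    to be added to the tracking category.
    """
    for paragraph in paragraphs:
        limit = MT_THRESHOLD if paragraph.origin == "mt" else SOURCE_THRESHOLD
        if paragraph.unmodified_ratio >= limit:
            return True
    return False
```

Under these assumptions, a translation whose paragraphs all stay below their limits counts as published with the expected level of modifications; any other translation would be counted as added to the tracking category.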

We want to be able to obtain a set of statements like the following ones:

(In Catalan Wikipedia,) most of the translations (80%) are published with the expected level of user modifications compared to those that are added to the tracking categories for more careful review by the community (20%).

We want to capture the results since the beginning of 2019, when most translations were done using version 2.

We want to capture the results for the following Wikipedias:

  • Global result for all Wikipedias
  • English
  • German
  • Indonesian
  • Arabic
  • Catalan
  • Czech
  • French
  • Hebrew
  • Italian
  • Korean
  • Portuguese
  • Russian
  • Spanish
  • Tamil
  • Ukrainian

Additional considerations:

  • The results and the queries used to obtain them will be published on wiki and linked from the CX analytics page.
  • The results can be expressed in a table like the one below:
Wiki                 % reviewed   % unreviewed
French Wikipedia     70%          30%
Spanish Wikipedia    80%          20%
All Wikipedias       70%          30%

(For reference it would be useful to also include the totals for reviewed and unreviewed in addition to the percentages)
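
The table rows, including the totals mentioned above, could be derived from per-wiki counts along these lines. This is only a sketch: the function name, the input shape, and the reading of "reviewed" as "not added to the tracking category" are assumptions, and the summary row only covers the wikis passed in, so a true global result would need counts for every Wikipedia.

```python
from typing import Dict, List, Optional, Tuple

# (wiki, reviewed, unreviewed, % reviewed, % unreviewed)
Row = Tuple[str, int, int, Optional[float], Optional[float]]


def _row(wiki: str, reviewed: int, unreviewed: int) -> Row:
    total = reviewed + unreviewed
    if total == 0:
        # No published translations: percentages are undefined.
        return (wiki, reviewed, unreviewed, None, None)
    return (
        wiki,
        reviewed,
        unreviewed,
        round(100 * reviewed / total, 1),
        round(100 * unreviewed / total, 1),
    )


def review_shares(counts: Dict[str, Tuple[int, int]]) -> List[Row]:
    """Build one row per wiki plus a summary row over the given wikis.

    `counts` maps a wiki name to (reviewed, unreviewed) totals, where
    "unreviewed" is assumed to mean "added to the tracking category".
    """
    rows: List[Row] = []
    total_reviewed = total_unreviewed = 0
    for wiki, (reviewed, unreviewed) in counts.items():
        rows.append(_row(wiki, reviewed, unreviewed))
        total_reviewed += reviewed
        total_unreviewed += unreviewed
    rows.append(_row("All Wikipedias", total_reviewed, total_unreviewed))
    return rows
```

Note that the summary row is computed from the summed totals rather than by averaging the per-wiki percentages, so larger wikis weigh more in the overall figure.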

Related ticket: T209868: Extend CX2 translations graph to show also published translations that need review

Event Timeline

Restricted Application added subscribers: Base, revi, Aklapper. · Mar 11 2019, 11:46 AM
Pginer-WMF triaged this task as Medium priority. · Mar 11 2019, 11:47 AM
Pginer-WMF closed this task as Declined. · Mar 21 2019, 8:11 AM

The current measurements don't seem reliable. We'll skip this report for the intended communications and focus on the longer-term solution defined in T209868.