What team/program is this request for?
Language Team.
What are you requesting?
We want to better understand which traits are common in low-quality translations where machine translation is used.
We want to analyze factors that have been associated with low-quality translations:
- Many translations over a short period of time. This is commonly associated with campaigns/contests, where some users may be incentivized to create a large number of articles without enough emphasis on quality.
- User expertise level. Communities have requested that access to Content Translation or Machine Translation be limited to experienced users. These requests assume that problematic translations are mainly produced by less experienced users, an assumption we want to check and put in perspective.
- Length of the content. How long the translation is (in itself or with respect to the original article) is another factor that may signal low-quality translations.
To measure translation quality, we have used article deletions as a proxy, but additional signals can be considered too.
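As a rough illustration of the kind of breakdown we have in mind, the sketch below computes the deletion rate per bucket for each of the three factors. Everything in it is an assumption made for illustration: the record fields (`user_edit_count`, `user_translations_same_day`, `source_bytes`, `target_bytes`, `deleted`), the thresholds (500 edits, 10 translations per day, a 0.25 length ratio) and the sample records are hypothetical, not real data or an agreed methodology.

```python
from collections import defaultdict


def deletion_rate_by(records, key):
    """Return {bucket: share of deleted translations} for a bucketing function."""
    totals = defaultdict(int)
    deleted = defaultdict(int)
    for rec in records:
        bucket = key(rec)
        totals[bucket] += 1
        if rec["deleted"]:
            deleted[bucket] += 1
    return {bucket: deleted[bucket] / totals[bucket] for bucket in totals}


def experience_bucket(rec):
    # Hypothetical threshold: fewer than 500 edits counts as a newcomer.
    return "newcomer" if rec["user_edit_count"] < 500 else "experienced"


def burst_bucket(rec):
    # Hypothetical threshold: 10 or more translations by the same user in one day.
    return "burst" if rec["user_translations_same_day"] >= 10 else "regular"


def length_bucket(rec):
    # Length of the translation relative to the source article.
    ratio = rec["target_bytes"] / rec["source_bytes"]
    return "short" if ratio < 0.25 else "full"


# Made-up records, only to show the expected input shape.
SAMPLE = [
    {"user_edit_count": 20, "user_translations_same_day": 12,
     "source_bytes": 4000, "target_bytes": 500, "deleted": True},
    {"user_edit_count": 50, "user_translations_same_day": 1,
     "source_bytes": 4000, "target_bytes": 3000, "deleted": False},
    {"user_edit_count": 9000, "user_translations_same_day": 15,
     "source_bytes": 4000, "target_bytes": 800, "deleted": True},
    {"user_edit_count": 12000, "user_translations_same_day": 2,
     "source_bytes": 4000, "target_bytes": 3500, "deleted": False},
]

if __name__ == "__main__":
    for name, key in [("experience", experience_bucket),
                      ("burst", burst_bucket),
                      ("length", length_bucket)]:
        print(name, deletion_rate_by(SAMPLE, key))
```

Comparing the rates across buckets for each factor would indicate which factors are actually associated with deletions, and the same function can be rerun after the limits change to compare against the baseline.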
What is the problem you're trying to solve?
A better understanding of when machine translation is misused for content creation will help us adjust the prevention mechanisms and encourage good use of it.
What decision will you make or action will you take with the deliverable?
We plan to improve the translation limits system (T251887), and this analysis can help us (a) identify how to adjust the limits, and (b) set a baseline against which to measure improvements produced by the new limits.
In addition, as MinT is exposed to Wikipedia readers, they are offered options to enter the editing path (contribute an improved translation). This means that translation activity will be exposed to a broader, less experienced audience, which may require additional guidance. Knowing the factors that affect translation quality will help us define the best approach to guide, encourage, or discourage newcomers to translate in a given context.