In Content Translation, a system of limits encourages users to review the initial translations.
The current system makes a decision (prevent publishing, warn, or add to a tracking category) based on several factors, including the total percentage of unmodified content in the translation, the number of problematic paragraphs, whether those were marked as reviewed, and whether the user had translations deleted in the previous month.
Basing the decisions on the number of paragraphs seems to introduce some problems. For example, longer articles are more likely to include content that can generate false positives, such as math formulas (T245827). Short articles, on the other hand, can have a higher percentage of unedited machine translation.
This ticket proposes to simplify the system of limits so that the decision is made based on the overall percentage of unmodified content, with adjustments to make this global limit more or less strict.
Proposed approach
The limits are based on two parameters that each community can adjust (ideally exposed through community configuration):
- Limit (L): 95% by default, indicates how much unedited machine translation is allowed to be published.
- Flexibility (F): 10% by default, indicates the percentage points that the limit can be adjusted to make it more strict/relaxed when needed.
The limit system rules are based on the overall percentage of unedited machine translation (MT):
- If MT > L: Publishing is prevented. By default, this means users can publish translations with up to 95% of unedited machine translation.
- If MT > L - F: A warning is shown and the published translation is tagged for the community to review. By default, this means users trying to publish a translation with more than 85% and up to 95% of unedited machine translation will be able to publish it, but will get a warning to review it.
- If the user had a translation deleted in the last 30 days and MT > L - F: Publishing is prevented. With the default values, this means that a user with a recently deleted translation will only be able to publish translations with up to 85% of MT for a month (after that, the usual 95% limit applies again). Historically problematic events such as contests do not tend to last longer than a month, which is the reason for making the limit stricter for 30 days.
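The three rules above can be sketched as a single decision function. This is only an illustration of the proposal, not the Content Translation implementation: the names `Outcome`, `decide`, and `had_recent_deletion` are hypothetical, percentages are plain numbers from 0 to 100, and the stricter limit for users with a recent deletion is assumed to be exactly L - F.

```python
from enum import Enum


class Outcome(Enum):
    PUBLISH = "publish"
    WARN_AND_TAG = "warn_and_tag"
    PREVENT = "prevent"


def decide(mt, had_recent_deletion=False, limit=95.0, flexibility=10.0):
    """Return the outcome for a translation with `mt` percent of unedited MT.

    `limit` is L and `flexibility` is F from the proposal (hypothetical names).
    """
    strict_limit = limit - flexibility
    # Assumption: a deletion in the last 30 days lowers the publishing limit to L - F.
    effective_limit = strict_limit if had_recent_deletion else limit
    if mt > effective_limit:
        return Outcome.PREVENT
    if mt > strict_limit:
        return Outcome.WARN_AND_TAG
    return Outcome.PUBLISH
```

With the defaults, `decide(96)` prevents publishing, `decide(90)` publishes with a warning and a tag, and `decide(90, had_recent_deletion=True)` prevents publishing.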
For paragraphs
The limits are calculated for the whole translation. Information at a paragraph level is used for user guidance and incorporating their feedback.
- For each paragraph, a warning is shown when the amount of unedited machine translation is considered high, as guidance for the user about which paragraphs they may need to edit to improve the overall percentage for the article.
- Paragraph warnings include an option to indicate that the user has already reviewed the translation, which compensates for cases where the machine translation is very good. In such cases an additional percentage will be considered as modified.
So, for each paragraph:
- If MT > L - F: A warning is shown to indicate that the paragraph contains too much unedited machine translation. By default, this happens when the paragraph contains over 85% of unedited machine translation.
- If the user marks the warning as resolved, the paragraph MT will be computed as (MT - ½F). This only applies to users without a deleted translation during the last 30 days. For example, if a user marks a paragraph with 90% of unedited machine translation as resolved, the paragraph will be considered as having 85% of unedited machine translation instead (90% - ½*10%).
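The paragraph-level adjustment can be sketched in the same way. The function name and parameters below are hypothetical, and the sketch assumes the ½F credit is only granted when the paragraph actually triggered a warning. How the adjusted paragraph values are combined into the overall percentage is not specified here, so it is left out.

```python
def effective_paragraph_mt(mt, marked_resolved=False, had_recent_deletion=False,
                           limit=95.0, flexibility=10.0):
    """Return the MT percentage a paragraph counts as after user feedback."""
    # A warning is shown only when the paragraph exceeds L - F.
    warned = mt > limit - flexibility
    # The credit of half of F requires a resolved warning and no recent deletion.
    credited = warned and marked_resolved and not had_recent_deletion
    return mt - flexibility / 2 if credited else mt
```

With the defaults, a paragraph at 90% that is marked as resolved counts as 85%, while the same paragraph for a user with a recently deleted translation still counts as 90%.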
This system is expected to be easier to communicate (we can show the percentage that must be reached in order to publish, and the paragraphs that can be edited to get there), to account for false positives (letting users confirm when a translation is good), and to prevent abuse (having one problematic translation caught will require additional editing from the user for a limited period of time). Combined with limiting the pace of article creation (T331023), this approach can also help with the spikes of low-quality content that contests and events have caused in some cases.