
Adjust publishing restrictions based on the number of paragraphs affected and previous translations by the user
Closed, Resolved · Public

Description

We want to favour the creation of good translations and prevent “raw or lightly edited machine translation” from leaking into Wikipedia. The current systems to warn about, track, and prevent publication of translations may require some adjustments, both to avoid false positives and to be stricter in those cases where the translations are more likely to be problematic.

In order to support this, this ticket proposes to consider two factors:

  • The number of problematic paragraphs. That is, the number of paragraphs for which the unmodified content exceeds the current thresholds. If there is a significant number of problematic paragraphs, we may want to prevent the translation from being published.
  • Deletions of previous translations by the user. If any of the previous translations published by the user in the main namespace during the last month were deleted, we can apply stricter limits to make sure that content is properly reviewed.
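The second factor can be sketched as a simple check over a user's publication history. This is only an illustration: the `PublishedTranslation` record, the function name, and the reading of "the last month" as a 30-day window are assumptions, not taken from the ContentTranslation code. The only outside fact used is that MediaWiki's main namespace has index 0.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable

MAIN_NAMESPACE = 0  # MediaWiki's main (article) namespace


@dataclass
class PublishedTranslation:
    """Hypothetical record of one translation a user published."""
    published_at: datetime
    namespace: int
    deleted: bool


def has_recent_deleted_translation(
    translations: Iterable[PublishedTranslation],
    now: datetime,
    window: timedelta = timedelta(days=30),  # assumption: "last month" = 30 days
) -> bool:
    """True if any main-namespace translation published within the window was deleted."""
    cutoff = now - window
    return any(
        t.deleted and t.namespace == MAIN_NAMESPACE and t.published_at >= cutoff
        for t in translations
    )
```

A user for whom this returns `True` would fall under the stricter set of limits.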

Proposed approach

The proposed adjustment will be as follows:

For a regular user:

  • With 0 - 9 problematic sections: Allow publishing and do not add to the tracking category (to reduce false positives as described in T217653).
  • With 10 - 49 problematic sections: Allow publishing but add to the tracking category.
  • With 50 or more problematic sections: Prevent publishing.

For a user with previous deleted translations:

  • With 1 - 9 problematic sections: Allow publishing but add to the tracking category.
  • With 10 or more problematic sections: Prevent publishing.
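The thresholds above can be sketched as a small decision function. The names (`Outcome`, `publishing_outcome`, the two limit constants) are illustrative and not taken from the extension. One case is not covered by the ticket: a user with previous deletions and 0 problematic sections. The sketch assumes such a translation is published without tracking.

```python
from enum import Enum


class Outcome(Enum):
    PUBLISH = "allow publishing, no tracking category"
    PUBLISH_AND_TRACK = "allow publishing, add to tracking category"
    BLOCK = "prevent publishing"


# (track_from, block_from): lowest counts at which tracking and blocking start.
REGULAR_LIMITS = (10, 50)
PREVIOUS_DELETIONS_LIMITS = (1, 10)


def publishing_outcome(problematic_sections: int, has_previous_deletions: bool) -> Outcome:
    """Map a count of problematic sections to the proposed publishing outcome."""
    if problematic_sections < 0:
        raise ValueError("problematic_sections must be non-negative")
    if has_previous_deletions:
        track_from, block_from = PREVIOUS_DELETIONS_LIMITS
    else:
        track_from, block_from = REGULAR_LIMITS
    if problematic_sections >= block_from:
        return Outcome.BLOCK
    if problematic_sections >= track_from:
        return Outcome.PUBLISH_AND_TRACK
    # Assumption: 0 sections for a user with deletions also lands here.
    return Outcome.PUBLISH
```

For example, `publishing_outcome(10, False)` yields `Outcome.PUBLISH_AND_TRACK`, while `publishing_outcome(10, True)` yields `Outcome.BLOCK`.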

Design details

For the cases where publishing is prevented, we want to show an error message. The error message is the same one used for the unmodified threshold for the whole document (T190283), but with the text "Your translation contains significant portions of unmodified text".

In addition, we want users to easily see where the content they need to fix is. When the error is shown, the problematic paragraphs will display their warnings about too much unmodified content (T190279), even if those warnings were "marked as resolved".

Details

Related Gerrit Patches:
mediawiki/extensions/ContentTranslation : master : Update MT limits
mediawiki/extensions/ContentTranslation : master : Use different message for different MT abuse cases
mediawiki/extensions/ContentTranslation : master : Adjust publishing restrictions

Event Timeline

Restricted Application added a subscriber: Aklapper. · Apr 18 2019, 12:24 PM
Pginer-WMF triaged this task as High priority. · Apr 18 2019, 12:25 PM
Pginer-WMF updated the task description. · Apr 23 2019, 11:44 AM

Change 507894 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Adjust publishing restrictions

https://gerrit.wikimedia.org/r/507894

Change 507894 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Adjust publishing restrictions

https://gerrit.wikimedia.org/r/507894

I checked that a translation with 10 unmodified paragraphs is prevented from publishing. However, the message shown is not the intended one according to the description above ("Your translation contains significant portions of unmodified text").

Steps to reproduce:

  1. Add 10 paragraphs to the translation without editing them further
  2. Add an additional paragraph and paste enough lorem ipsum text to bring the percentage of unmodified MT for the whole document below the limits.
  3. Try to publish (in the user namespace, just in case).

Expected result:

  • An error that prevents publishing and shows "Your translation contains significant portions of unmodified text"

Current result:

  • An error that prevents publishing and shows the total percentage of unmodified content, which is not the reason for the inability to publish and can be misleading. In the example, 24% is shown as the total of MT, implying that this percentage exceeds the limits when that's not the case.

In summary: When users are prevented from publishing because of the total percentage of unmodified content in the whole document, we should show that percentage. When users are prevented from publishing because of the number of problematic paragraphs, we should just refer to "significant portions of unmodified text" instead.
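The rule in this summary can be sketched as follows. Everything except the quoted "significant portions" text is an assumption made for illustration: the function names, the limits being passed in as parameters, the wording of the percentage message, and the choice to let the document-level check take precedence when both conditions hold.

```python
from enum import Enum
from typing import Optional


class BlockReason(Enum):
    DOCUMENT_PERCENTAGE = "document_percentage"
    PROBLEMATIC_PARAGRAPHS = "problematic_paragraphs"


SIGNIFICANT_PORTIONS_TEXT = (
    "Your translation contains significant portions of unmodified text"
)


def block_reason(
    total_unmodified_pct: float,
    document_limit_pct: float,      # hypothetical parameter, value not in the ticket
    problematic_paragraphs: int,
    paragraph_block_limit: int,
) -> Optional[BlockReason]:
    """Return why publishing is blocked, or None if it is allowed."""
    # Assumption: the whole-document check wins if both conditions hold.
    if total_unmodified_pct > document_limit_pct:
        return BlockReason.DOCUMENT_PERCENTAGE
    if problematic_paragraphs >= paragraph_block_limit:
        return BlockReason.PROBLEMATIC_PARAGRAPHS
    return None


def error_text(reason: BlockReason, total_unmodified_pct: float) -> str:
    """Pick the message so that a percentage is shown only when it is the cause."""
    if reason is BlockReason.DOCUMENT_PERCENTAGE:
        # Placeholder wording; the ticket only says the percentage should be shown.
        return f"Unmodified content: {total_unmodified_pct:.0f}% of the document"
    return SIGNIFICANT_PORTIONS_TEXT
```

With the reproduction case above (24% unmodified overall, 10 problematic paragraphs, a paragraph limit of 10, and any document limit above 24%), the reason is `PROBLEMATIC_PARAGRAPHS` and the message no longer mentions the 24% figure.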

Change 515038 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Use different message for different MT abuse cases

https://gerrit.wikimedia.org/r/515038

Change 515038 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Use different message for different MT abuse cases

https://gerrit.wikimedia.org/r/515038

Pginer-WMF updated the task description. · Edited Jun 14 2019, 8:37 AM

The current limits seem to be too strict based on recent feedback (1), (2), (3), (4), (5), (6), (7), (8), (9). We may need to adjust the limits for the regular user to make them less strict. In this way, most users will have room to accommodate the cases where MT works well, while we raise the bar for users who had one of their translations deleted in the last month.

The current behaviour that was proposed initially is described below:

For a regular user:

  • With 1 problematic section: Allow publishing and do not add to the tracking category (to reduce false positives as described in T217653).
  • With 2-9 problematic sections: Allow publishing but add to the tracking category.
  • With 10 or more problematic sections: Prevent publishing.

For a user with previous deleted translations:

  • With 1 - 4 problematic sections: Allow publishing but add to the tracking category.
  • With 5 or more problematic sections: Prevent publishing.

The new proposal is:

For a regular user:

  • With 0-9 problematic sections: Allow publishing and do not add to the tracking category (to reduce false positives as described in T217653).
  • With 10-49 problematic sections: Allow publishing but add to the tracking category.
  • With 50 or more problematic sections: Prevent publishing.

For a user with previous deleted translations:

  • With 1 - 9 problematic sections: Allow publishing but add to the tracking category.
  • With 10 or more problematic sections: Prevent publishing.
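To see which translations are affected by the change, the two sets of limits can be put side by side in a small table-driven sketch. The structure and names here are illustrative assumptions. Each pair is (lowest count that is tracked, lowest count that is blocked), and 0 problematic sections is assumed to publish without tracking in every case.

```python
# (track_from, block_from) for each version of the limits and each kind of user.
LIMITS = {
    "initial": {"regular": (2, 10), "previous_deletions": (1, 5)},
    "revised": {"regular": (10, 50), "previous_deletions": (1, 10)},
}


def outcome(version: str, user_kind: str, problematic_sections: int) -> str:
    """Outcome for a given count under one version of the limits."""
    track_from, block_from = LIMITS[version][user_kind]
    if problematic_sections >= block_from:
        return "prevent publishing"
    if problematic_sections >= track_from:
        return "publish and track"
    return "publish"


def changed_counts(user_kind: str, max_sections: int = 60) -> list:
    """Counts (0..max_sections) whose outcome differs between the two versions."""
    return [
        n
        for n in range(max_sections + 1)
        if outcome("initial", user_kind, n) != outcome("revised", user_kind, n)
    ]
```

Under these assumptions, a regular user's translation with 10 problematic sections moves from "prevent publishing" to "publish and track", and for users with previous deletions only counts from 5 to 9 change outcome.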

Change 517268 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Update MT limits

https://gerrit.wikimedia.org/r/517268

Change 517268 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Update MT limits

https://gerrit.wikimedia.org/r/517268

Jpita added a subscriber: Jpita.

Checked in production.

Petar.petkovic closed this task as Resolved. · Jun 25 2019, 10:18 PM

I guess this can be resolved.