CX2: Prevent publishing translations with too much unmodified content
Closed, Resolved, Public

Description

To encourage users to review their translations, a warning will be shown when the amount of unmodified content exceeds a certain limit (T190279). However, in some cases the amount of unmodified content may be so high that we want to prevent editors from publishing.

If the unmodified content exceeds a threshold (configurable per wiki, but with a default value of 99%), an error will be shown to the user when they try to publish. The percentage will be calculated for the whole document, and the error will prevent publishing until the content is modified.
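A minimal sketch of the check described above, for illustration only. The task does not say how unmodified content is measured, so this assumes a simplification: a section counts as unmodified when its current text is identical to the machine translation, and sections are weighted by token count. The names `Section`, `WIKI_THRESHOLDS`, `threshold_for` and `can_publish` are hypothetical and not taken from ContentTranslation.

```python
from dataclasses import dataclass

# Default from the task description: 99%.
DEFAULT_THRESHOLD = 99.0

# Hypothetical per-wiki overrides; the key and value are illustrative only.
WIKI_THRESHOLDS = {"examplewiki": 95.0}


@dataclass
class Section:
    mt_text: str       # text as provided by machine translation
    current_text: str  # text as it currently stands in the editor


def unmodified_percentage(sections):
    """Percentage of tokens, over the whole document, that sit in sections
    left exactly as machine translation produced them (assumed measure)."""
    total = sum(len(s.current_text.split()) for s in sections)
    if total == 0:
        return 0.0
    unmodified = sum(
        len(s.current_text.split())
        for s in sections
        if s.current_text == s.mt_text
    )
    return 100.0 * unmodified / total


def threshold_for(wiki):
    """Per-wiki threshold, falling back to the default."""
    return WIKI_THRESHOLDS.get(wiki, DEFAULT_THRESHOLD)


def can_publish(sections, wiki):
    """Publishing is blocked only when the percentage exceeds the threshold."""
    return unmodified_percentage(sections) <= threshold_for(wiki)
```

With the default of 99%, only a document that is almost entirely untouched would be blocked, which matches the intent of stopping fully unmodified translations.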

[Mockups: "After trying to publish" and "When checking the issue card"]

Trigger
The error will be triggered when the user tries to publish and the translation has a total percentage of unmodified content higher than the threshold. We may want to apply the threshold only when there is enough content, to avoid confusion caused by false positives (e.g., publishing a one-sentence article with a few images). We may need to track the progress of the user translation (T162113) to support this.

The publish button will become disabled until the user modifies the content enough.
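The "enough content" guard mentioned above could look roughly like this. The task does not define what counts as enough content, so `MIN_TOKENS` and its value are purely hypothetical, as is the function name.

```python
# Hypothetical minimum size below which the threshold is not applied.
# The task does not specify a value or a unit; tokens are an assumption.
MIN_TOKENS = 50


def should_block_publishing(unmodified_pct, total_tokens,
                            threshold=99.0, min_tokens=MIN_TOKENS):
    """Return True when publishing should be blocked.

    Very short translations are never blocked, to avoid false positives
    such as a one-sentence article with a few images.
    """
    if total_tokens < min_tokens:
        return False
    return unmodified_pct > threshold
```

For example, `should_block_publishing(100.0, 10)` lets a ten-token stub through, while the same percentage on a 200-token translation is blocked.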

Messages
After the user tries to publish, an issue summary is shown:

Your translation cannot be published because it contains too much unmodified text.

The "View issues" option allows the user to expand the issue card, which shows the error:

Your translation contains <percentage>% of unmodified text
Automatic translation is provided only as a starting point. Make sure that the content is accurate and reads naturally in your language.
Your translation cannot be published without further editing.
[Learn more]

Actions

Modifying the content enough to fall below the threshold will re-enable the publish button and remove the representations of the issue (in the issue summary and in the issue card).
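One way to picture this behaviour is a small state object that is re-evaluated after each edit. This is a sketch under assumptions, not the extension's code: `PublishState`, `ISSUE_ID` and `update` are hypothetical names, and the unmodified percentage is assumed to be computed elsewhere.

```python
from dataclasses import dataclass, field

# Hypothetical identifier for the issue shown in the summary and the card.
ISSUE_ID = "too-much-unmodified"


@dataclass
class PublishState:
    threshold: float = 99.0
    publish_enabled: bool = True
    issues: set = field(default_factory=set)

    def update(self, unmodified_pct: float) -> None:
        """Re-evaluate the issue after an edit or a publish attempt."""
        if unmodified_pct > self.threshold:
            self.issues.add(ISSUE_ID)
        else:
            # Enough content was modified: drop the issue everywhere.
            self.issues.discard(ISSUE_ID)
        self.publish_enabled = ISSUE_ID not in self.issues
```

Keeping the button state derived from the issue set means the summary, the card and the button cannot drift apart.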


Showing the message on the issue card was not completed.
Follow-up ticket: T203377: CX2: Additional details for too much unmodified content error

Event Timeline

Pginer-WMF triaged this task as Normal priority. Mar 21 2018, 1:59 PM
Pginer-WMF created this task.
Pginer-WMF updated the task description. Mar 27 2018, 11:21 AM
Johan awarded a token. Apr 5 2018, 2:42 PM
jeblad added a subscriber: jeblad. Apr 8 2018, 1:05 PM

Note that some language pairs often produce flawless translations, like Bokmål-Nynorsk with the MT engine Apertium. For those language pairs there should not be a limit on publishing unmodified text.

See also T190279#4114812

Pginer-WMF raised the priority of this task from Normal to High. Jul 23 2018, 10:02 AM
Pginer-WMF lowered the priority of this task from High to Normal.
Pginer-WMF raised the priority of this task from Normal to High.

Change 447583 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/extensions/ContentTranslation@master] Prevent publishing translation with too much unmodified content

https://gerrit.wikimedia.org/r/447583

Screenshot from https://gerrit.wikimedia.org/r/447583

Note that this patch does not include the issue cards in the tools column. I am waiting for the issue card refactoring to be completed before attempting that.

Change 447583 merged by Petar.petkovic:
[mediawiki/extensions/ContentTranslation@master] Prevent publishing translation with too much unmodified content

https://gerrit.wikimedia.org/r/447583

Petar.petkovic moved this task from In Review to QA on the Language-2018-July-September board.
Petar.petkovic removed a subscriber: gerritbot.

Checked in CX2 - three things seem to be missing:

(1) too much unmodified text ≠ too much unmodified machine-translated text (?)
Right now I see "Your translation cannot be published because it contains too much unmodified machine-translated text" when I try to publish articles that are blank (when I used the option "Don't use machine translation" and just clicked 'Add translation' to add blank paragraphs).
Also, the same warning appears when I use just the source page for translation. Interestingly, the issue summary card appears blank.

(2) Threshold for unmodified content - it does not seem possible to publish articles even with 50% of unmodified content, and even when only 4% of the article is translated and 86% of it is machine-translated. What are the thresholds in CX2? Do they take into account what portion of an article is translated? According to the following spec, a specific threshold should be set:

We may want to apply the threshold only when there is enough content to avoid confusions with false positives (e.g., publishing a one sentence article with a few images).

(3) 'View issues' is not shown on the warning.

I filed T202342: CX2: "Too much unmodified content" warning issues (and added it as a subtask to this task) mostly to keep track of the issues related to the "too much unmodified content" warning as a separate ticket since there are some specific issues that may be addressed outside of the scope of this task.

I am returning this task to the 'In progress' column, so that @Pginer-WMF can review the specs and/or the "View issues" option can be implemented.

Checked in CX2 - three things seem to be missing:

Thanks for checking this. Some comments below:

(1) too much unmodified text ≠ too much unmodified machine-translated text (?)
Right now I see "Your translation cannot be published because it contains too much unmodified machine-translated text" when I try to publish articles that are blank (when I used the option "Don't use machine translation" and just clicked 'Add translation' to add blank paragraphs).
Also, the same warning appears when I use just the source page for translation. Interestingly, the issue summary card appears blank.

It makes sense to prevent publishing an empty document, but "too much unmodified text" is not the right message to show in such cases, since it can lead to confusion. We may need a specific solution for this case.
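As a rough illustration of handling the two cases separately, the check could classify the document before choosing a message. Nothing here is specified in the task: `PublishCheck`, `classify` and the idea of testing for emptiness first are assumptions for the sake of the example.

```python
from enum import Enum


class PublishCheck(Enum):
    OK = "ok"
    EMPTY = "empty"                               # would get its own message
    TOO_MUCH_UNMODIFIED = "too-much-unmodified"   # the error from this task


def classify(section_texts, unmodified_pct, threshold=99.0):
    """Decide which publishing problem, if any, applies.

    A document whose sections are all blank is reported as EMPTY rather
    than as unmodified machine translation.
    """
    if not any(text.strip() for text in section_texts):
        return PublishCheck.EMPTY
    if unmodified_pct > threshold:
        return PublishCheck.TOO_MUCH_UNMODIFIED
    return PublishCheck.OK
```

Checking for emptiness first keeps the "unmodified machine-translated text" wording away from translations where no MT was used at all.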

(2) Threshold for unmodified content - it does not seem possible to publish articles even with 50% of unmodified content, and even when only 4% of the article is translated and 86% of it is machine-translated. What are the thresholds in CX2? Do they take into account what portion of an article is translated? According to the following spec, a specific threshold should be set:

We may want to apply the threshold only when there is enough content to avoid confusions with false positives (e.g., publishing a one sentence article with a few images).

@santhosh may know more about the thresholds and how they are applied. I'll try to test with some examples to identify how

(3) 'View issues' is not shown on the warning.
I filed T202342: CX2: "Too much unmodified content" warning issues (and added it as a subtask to this task) mostly to keep track of the issues related to the "too much unmodified content" warning as a separate ticket since there are some specific issues that may be addressed outside of the scope of this task.

Thanks for capturing these in a ticket!

@Pginer-WMF @Etonkovidova About the false positives: we need to fine-tune the calculation based on the characteristics of each section. For example, we currently exclude section titles. There may be a few other cases that we need to identify and where the threshold should be applied in a different way. I created T200416: CX2: Identify section types to exclude from MT abuse test to explore such requirements.
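To make the exclusion idea concrete, here is a sketch in which some section kinds are left out of the calculation. Only the exclusion of section titles comes from this thread; the `kind` labels, the token weighting and the function name are assumptions.

```python
from dataclasses import dataclass

# Section titles are excluded, as stated in the comment above.
# Further kinds would be decided in T200416; none are assumed here.
EXCLUDED_KINDS = frozenset({"title"})


@dataclass
class Section:
    kind: str         # hypothetical label, e.g. "title" or "paragraph"
    tokens: int       # size of the section in tokens (assumed unit)
    unmodified: bool  # True if the section is still raw MT output


def unmodified_percentage(sections, excluded=EXCLUDED_KINDS):
    """Percentage of unmodified tokens, ignoring excluded section kinds."""
    counted = [s for s in sections if s.kind not in excluded]
    total = sum(s.tokens for s in counted)
    if total == 0:
        return 0.0
    unmodified = sum(s.tokens for s in counted if s.unmodified)
    return 100.0 * unmodified / total
```

Passing a different `excluded` set makes it easy to try out other section types while testing for false positives.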

In my experience testing in this area, the error that prevents publishing disappears after some minimal modification of the content, while the warning does not block the publishing process and can be marked as resolved in case it was too sensitive. So my understanding is that in the current state this should not cause major problems. As you suggested, we can keep testing and proposing adjustments to polish this further in T200416.
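The difference between the non-blocking warning and the blocking error can be sketched as two severity levels. The error threshold uses the 99% default from the task description; the warning limit is not stated anywhere in this thread, so `WARNING_THRESHOLD` is a made-up placeholder, and all names are hypothetical.

```python
from enum import Enum


class Severity(Enum):
    NONE = 0
    WARNING = 1  # shown to the user, can be marked as resolved
    ERROR = 2    # prevents publishing until the content is modified


# Hypothetical value: the warning limit (T190279) is not given in this task.
WARNING_THRESHOLD = 90.0
# Default error threshold from the task description.
ERROR_THRESHOLD = 99.0


def severity(unmodified_pct):
    """Map an unmodified-content percentage to a severity level."""
    if unmodified_pct > ERROR_THRESHOLD:
        return Severity.ERROR
    if unmodified_pct > WARNING_THRESHOLD:
        return Severity.WARNING
    return Severity.NONE


def blocks_publishing(unmodified_pct):
    """Only the error level blocks publishing; warnings never do."""
    return severity(unmodified_pct) is Severity.ERROR
```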

Follow up on the following issues:

  • the specific warning for publishing a blank translation - the warning still refers to unmodified content

  • the state of the 'Publish' button when the issue summary card has unresolved issues - it's possible to publish

  • the issue summary card refers to machine translation even though no MT options were used

  • the thresholds for unmodified (blank) content for preventing publishing a translation (need more testing)

Pginer-WMF closed this task as Resolved. Sep 3 2018, 9:12 AM
Pginer-WMF moved this task from In Progress to Done on the Language-2018-July-September board.

Follow up on the following issues:

I captured some of the adjustments needed in this follow-up ticket: T203377: CX2: Additional details for too much unmodified content error

Since the basic behaviour (preventing articles without any modification from being published) is supported, I think we can close this ticket and continue the work in the follow-up one.

Pginer-WMF updated the task description. Oct 23 2018, 10:38 AM