CX2: Add to a maintenance category when translation is published with too much unmodified text
Closed, Resolved (Public)

Description

A threshold of unmodified text is defined in T190279 to encourage users to review low-quality translations. Warnings are shown for paragraphs of reasonable length that have not been edited enough. However, these warnings do not prevent users from publishing.

To facilitate the work of the community, those translations will be added to an "unreviewed translation" category.

A translation will be included in the category when any of its paragraphs meets the criteria for showing the unmodified text warning, even if the user marked all the warnings as "resolved". This category is intended to enable communities to review the translations that have not been completely reviewed.
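
The rule above can be sketched as follows. This is a minimal illustration, not the extension's actual code: the threshold values, the `Paragraph` fields, and the function names are all hypothetical (the real thresholds are defined in T190279 and are not stated in this task).

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical values for illustration only; the real thresholds live in T190279.
MIN_PARAGRAPH_WORDS = 10       # shorter paragraphs never trigger the warning
MAX_UNMODIFIED_RATIO = 0.85    # share of the text left untouched by the translator


@dataclass
class Paragraph:
    word_count: int
    unmodified_ratio: float         # 0.0 = fully rewritten, 1.0 = untouched
    warning_resolved: bool = False  # the user marked the warning as "resolved"


def triggers_warning(paragraph: Paragraph) -> bool:
    """A paragraph qualifies when it is long enough and was barely edited."""
    return (
        paragraph.word_count >= MIN_PARAGRAPH_WORDS
        and paragraph.unmodified_ratio > MAX_UNMODIFIED_RATIO
    )


def needs_unreviewed_category(paragraphs: Iterable[Paragraph]) -> bool:
    """True when any paragraph meets the warning criteria.

    warning_resolved is deliberately ignored: dismissing the warning does not
    keep the translation out of the category.
    """
    return any(triggers_warning(p) for p in paragraphs)
```

Under these assumed values, a single long, untouched paragraph is enough to categorize the page, even when its warning was marked as resolved.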


Previously we considered using a #unreviewed translation edit tag, but a category was preferred since it better fits the workflow of finding and fixing issues.

As per @Nikerabbit's comment, we can consider using a category instead of an edit tag. The goal of the ticket is to give editors a tool to easily find the content that is more likely to be problematic, and I'm ok with using whichever mechanism works best for that. Categories can be removed, which works better for marking issues on content that can be resolved. Although categories cannot be filtered in Recent Changes, there are plans to support it in the near future.

Based on this, I'm updating the ticket description to propose the use of a category instead. But feel free to share any thoughts on advantages or limitations of either option.

Pginer-WMF renamed this task from "CX2: Add edit tag when translation is published with too much unmodified text" to "CX2: Add to a maintenance category when translation is published with too much unmodified text". Jun 12 2018, 8:11 AM
Pginer-WMF updated the task description.
Framawiki added a subscriber: Framawiki.
Pginer-WMF raised the priority of this task from Normal to High. Aug 27 2018, 8:55 AM
Pginer-WMF updated the task description. Sep 6 2018, 11:11 AM

This ticket is related to the unmodified text warning (T190279): it keeps track of those translations that were not reviewed despite the warning. The error (T190283) targets clearer cases of vandalism (lack of minimal modification across the whole document), where we prevent the article from being created. I added more details to the description to avoid confusion between these different but related mechanisms.

Change 461628 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/extensions/ContentTranslation@master] Add aa maintenance category when translation is has unresolved issues

https://gerrit.wikimedia.org/r/461628

Based on the Gerrit discussion, some changes were made:

Add a tracking category when translation has some MT abuse

  • The category can be localized using 'cx-unreviewed-translation-category' message.
  • The category is defined as a tracking category. Admins can optionally hide the category from pages by adding __HIDDENCAT__ to the category page.
  • Use Special:TrackingCategories to see the description of category.

Thanks for your patch!

Two issues not directly related to this patch:

  • Unmodified source text also counts as MT abuse. This means the category gets added for that too, even though there is no MT involved.
  • We send the localised category names. This fails when $ContentTranslationTranslateInTarget is false and the target language differs from the content language. It would be better to always send the canonical category prefix, given that we now normalize it in the backend.
"Unmodified source text also counts as MT abuse. This means the category gets added for that too, even though there is no MT involved."

This is expected behaviour. If I'm translating an article from Japanese to French, leaving a paragraph in Japanese with no modification in the final article published on French Wikipedia seems like something to flag for review. Note that in the title we refer to "too much unmodified text", which should include both unmodified MT and unmodified source text.

Is there something you find strange about the proposed behaviour?

Change 461628 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Add a tracking category when translation has some MT abuse

https://gerrit.wikimedia.org/r/461628

@Pginer-WMF This is the current wording of the category name and its description:

	"cx-unreviewed-translation-category": "Pages with unreviewed machine translation",
	"cx-unreviewed-translation-category-desc": "Pages translated with the Content Translation tool that contain a high amount of unreviewed machine translation output"

Thanks for surfacing this, @Nikerabbit. I'd suggest updating these to the following:

	"cx-unreviewed-translation-category": "Pages with unreviewed translations",
	"cx-unreviewed-translation-category-desc": "Pages translated with the Content Translation tool that contain a high amount of unreviewed content."

I think these should be clear but still general enough to include all kinds of unreviewed content. For example, in the future we could also include ignored spellchecker errors, and the category and its description would still make sense. Please let me know if you think otherwise or if anyone has a different suggestion.

Another one I spotted: "Your translation cannot be published because it contains too much unmodified machine-translated text", shown when publishing is prevented.

I can fix these. My concern is that "unreviewed content" can be interpreted as unreviewed for factual accuracy.

Change 468278 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/extensions/ContentTranslation@master] Improve mt abuse strings

https://gerrit.wikimedia.org/r/468278

Is that the text used for T190283? If that's the case I'd recommend adjusting it to meet the spec too:

[Screenshots: after trying to publish; when checking the issue card]

The error shown when a translation contains only MT output needs some more work to support displaying it inside the issue card as well. All the work needed is captured in T203377, so let's not mix the two tasks.

@Pginer-WMF, you proposed the following messages:

"cx-unreviewed-translation-category": "Pages with unreviewed translations",
"cx-unreviewed-translation-category-desc": "Pages translated with the Content Translation tool that contain a high amount of unreviewed content."

@Nikerabbit raised a concern that "unreviewed content" can be interpreted as unreviewed for factual accuracy. I would like to hear your thoughts on using the "unreviewed content" part.

Thanks for pointing to the right ticket. I made it more visible in the description of the original to avoid similar confusion in the future.

I expect the user context to avoid the issue. We are talking about translations, so I'd expect the kind of review to be about translation quality. This could be problematic if our thresholds or algorithm generate too many false positives (then people may wonder what else may be wrong).

We plan to do user research focused on understanding how the different guidance mechanisms work and are understood by our users. That may help us understand whether further adjustments are needed.

@Pginer-WMF The primary context where these messages (cx-unreviewed-translation*) are shown is https://en.wikipedia.org/wiki/Special:TrackingCategories. In my mind, Wikipedia editors are at least as interested in ensuring that the content is factually correct as in ensuring that the language is natural. Hence I think there is a high risk of misunderstanding if the word "content" is used.

Even if some users read "content that comes from the translation and was not reviewed" more generally, I'm not sure what the problematic consequences would be. The content was flagged because the translator ignored the recommendation to review it further. If the article reads fine but reviewers catch other issues, such as missing references, that's ok too.

If anyone suggests a more precise description that does not expose specific CX jargon, I'd be happy to use it. But I'd recommend not treating this as a blocker, since we can adjust the message later if it causes confusion.

Change 468278 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Improve mt abuse strings

https://gerrit.wikimedia.org/r/468278

@Nikerabbit
(1) The page for Category:Pages with unreviewed machine translation is shown as a red link on https://en.wikipedia.org/wiki/Special:TrackingCategories, although one page was marked with the category; see https://en.wikipedia.org/w/index.php?title=Draft:UniMaker&action=history

(2) I did not see the category being added when a translation with a warning (too much unmodified content) was published. The categories were added as <nowiki> text.
See the example: https://en.wikipedia.org/wiki/User:Etonkovidova/Triangle_(Virginia) (it was translated from es to en and has one warning).

<nowiki>
[[Category:Census-designated places in Virginia]]
[[Category:Populated places in Prince William County, Virginia]]
[[Category:Pages with unreviewed machine translation]]</nowiki>

The same <nowiki> tag will be added in cx2-testing.

  1. Having it be a red link is fine. It is up to the communities to create the category description pages if they want.
  2. When publishing under user space, categories are always added inside <nowiki> to avoid polluting real categories with drafts.
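
The second point can be sketched as below. This is only an illustration of the described behaviour, not the extension's code: the function name is hypothetical, and detecting user space by a literal "User:" prefix is a simplifying assumption.

```python
from typing import Sequence

# Assumption: user space is recognised by a literal "User:" title prefix.
USER_NAMESPACE_PREFIX = "User:"


def render_categories(target_title: str, categories: Sequence[str]) -> str:
    """Build the category wikitext appended to a published translation."""
    links = "\n".join(f"[[Category:{name}]]" for name in categories)
    if not links:
        return ""
    if target_title.startswith(USER_NAMESPACE_PREFIX):
        # Drafts in user space must not show up in real categories, so the
        # links are kept as inert text inside <nowiki>.
        return f"<nowiki>\n{links}</nowiki>"
    return links
```

For a title such as User:Etonkovidova/Triangle_(Virginia), this produces the same shape as the <nowiki> block quoted above, while other titles get live category links.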

Etonkovidova closed this task as Resolved. Nov 5 2018, 3:34 PM

Thx, @Nikerabbit - closing as Resolved.

Trizek-WMF added a subscriber: Trizek-WMF.

That's a great improvement that communities would be happy to know about. I think it is worth a Tech News announcement.

Is this improvement targeting CX2 only? Would it be worth having it on CX1 too?
What is the category (I guess that's a link translated from translatewiki)?

The message key is cx-unreviewed-translation-category and, if I am not mistaken, this is CX2 only. CX1 is in maintenance mode.

Looks good. Thanks for the update.