
CX2: Improve the approach used for adding translations to the "unreviewed content" tracking category
Closed, Resolved, Public

Description

Translations that are published with paragraphs containing too much unmodified content (80% of Machine translation or 60% of source content) are published in a tracking category for the community to review (T211763, T190798). We want the tracking category to be useful: not too broad to include too many false positives, not too narrow or including holes for problematic articles to be skipped. Based on feedback and casual observations, further adjustments seem to be needed.

The problem

The tracking category on French Wikipedia seems to accumulate articles that were apparently published without issues.
For languages with no MT, many unreviewed articles in the category do not seem to contain significant portions of text in the source language (this may have been fixed by T215591). Feedback from an editor in the Arabic community (T211571#4996154) suggested further adjustments may be needed to make the category more relevant.

Proposed solutions

We want to try the following adjustments:

  • Consider a less strict threshold for paragraphs that users marked as resolved. For paragraphs where the unmodified content warning was shown but the user marked it as resolved, we can apply a less strict threshold (95% of Machine translation or 75% of source content). This will provide a way to accommodate cases where the automatic translation was exceptionally good, but still avoid potential abuse of the feature (i.e., not following blindly the user confirmation).
  • Consider more than one problematic paragraph for adding to the category. Currently a translation is added to the tracking category with just one problematic paragraph. We may want to consider more than one paragraph to make the approach a bit less sensitive to false positives.

(Additional adjustments can be considered to the thresholds or how they are applied based on further discussions and evaluation)
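
The first adjustment can be sketched as a per-paragraph check. This is only an illustration of the proposal, not the ContentTranslation implementation: the `Paragraph` record, its field names, and the choice of a strict "greater than" comparison are assumptions; only the four percentages come from the description above.

```python
from dataclasses import dataclass

# Thresholds quoted in the task description, as fractions of unmodified content.
DEFAULT_MT_LIMIT = 0.80
DEFAULT_SOURCE_LIMIT = 0.60
RESOLVED_MT_LIMIT = 0.95
RESOLVED_SOURCE_LIMIT = 0.75


@dataclass
class Paragraph:
    # Hypothetical record; these field names are not ContentTranslation's.
    unmodified_mt: float            # share of MT output left unmodified, 0.0-1.0
    unmodified_source: float        # share of source text left unmodified, 0.0-1.0
    marked_resolved: bool = False   # user dismissed the unmodified-content warning


def is_problematic(p: Paragraph) -> bool:
    """Return True when the paragraph exceeds the limits that apply to it."""
    if p.marked_resolved:
        mt_limit, source_limit = RESOLVED_MT_LIMIT, RESOLVED_SOURCE_LIMIT
    else:
        mt_limit, source_limit = DEFAULT_MT_LIMIT, DEFAULT_SOURCE_LIMIT
    # Assumption: a paragraph counts only when it is strictly above a limit.
    return p.unmodified_mt > mt_limit or p.unmodified_source > source_limit
```

With these numbers, a paragraph with 90% unmodified MT counts as problematic by default but not once the user has marked the warning as resolved, while one at 97% counts in both cases.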

Evaluation

We want to evaluate the improvements of each approach in a rigorous way. One possible approach could be to inspect articles in the tracking category to identify two groups: (a) false positives (articles that are perfectly fine but were included in the tracking category), and (b) truly problematic (articles that were not reviewed enough).
Once an approach (or a combination) is tested we can re-evaluate the articles from the defined groups to identify which is the approach that minimizes false positives while keeping the truly problematic translations in the tracking category.
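
A minimal sketch of that comparison, assuming each approach can be reduced to the set of article identifiers it would place in the tracking category. The function name and the result keys are made up for illustration.

```python
def evaluate(flagged, false_positives, truly_problematic):
    """Compare the articles an approach flags against the two hand-labelled groups.

    All three arguments are collections of article identifiers.
    """
    flagged = set(flagged)
    false_positives = set(false_positives)
    truly_problematic = set(truly_problematic)
    return {
        # Fine articles the approach still puts in the category (lower is better).
        "false_positives_still_flagged": len(flagged & false_positives),
        # Problematic articles the approach keeps in the category (higher is better).
        "problematic_still_flagged": len(flagged & truly_problematic),
        # Problematic articles the approach lets through (lower is better).
        "problematic_missed": len(truly_problematic - flagged),
    }
```

Running this once per candidate approach over the same two labelled groups gives directly comparable counts.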

When inspecting the articles we need to pay special attention to identifying which paragraphs exceeded the thresholds in Content translation. This will help identify new possible strategies to consider (e.g., excluding certain kinds of content). We may also need input from native speakers to make sure the article classification is accurate.


Related: T209868: Extend translations graph to show also published translations that need review

Event Timeline

Pginer-WMF triaged this task as Medium priority. Mar 5 2019, 1:30 PM

Change 503559 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Introduce less strict threshold for suppressed MT issues

https://gerrit.wikimedia.org/r/503559

Change 503559 merged by Petar.petkovic:
[mediawiki/extensions/ContentTranslation@master] Introduce less strict threshold for suppressed MT issues

https://gerrit.wikimedia.org/r/503559

@Pginer-WMF - please review the following testing results:

Two parts of this task were tested separately:

  • Lower thresholds for unmodified content
  • Investigate false positives among articles that are in Category:Pages with unreviewed translations (and similar categories in other wikis)

I did testing on the following:

Consider a less strict threshold for paragraphs that users marked as resolved. For paragraphs where the unmodified content warning was shown but the user marked it as resolved, we can apply a less strict threshold (95% of Machine translation or 75% of source content).

Testing cases (the behavior could be specific to cx2-testing) showed that
(1) text translated with the MT translation option, when marked as resolved, is allowed to be published with 99% of unmodified text
(2) text translated with the "Copy original content" option, when marked as resolved, is allowed to be published with 99% of unmodified text. There might be an additional warning

Screen Shot 2019-06-05 at 5.02.29 PM.png (519×1 px, 196 KB)

(3) there is a discrepancy between counting overall amount of unmodified text (an error) and paragraph-related warning of unmodified text reported as T225088: Discrepancy in percentage count of unmodified content between error and warning reports

discrepancy_percentage_unmodified_translation.gif (693×1 px, 652 KB)

Looking in the db for enwiki, the number of pages in the category Pages with unreviewed translations is small compared to the number of articles that have the tags contenttranslation, contenttranslation-needcheck, contenttranslation-v2 (even if we filter out contenttranslation-needcheck and take into account that two tags, contenttranslation-needcheck and contenttranslation-v2, are assigned together for the translations made with v2):

mysql:research@dbstore1003.eqiad.wmnet [enwiki]> select ct_tag_id, ctd_name, ctd_count  from change_tag join change_tag_def on ct_tag_id=ctd_id where ct_tag_id in (41,169, 522) group by ct_tag_id;
+-----------+------------------------------+-----------+
| ct_tag_id | ctd_name                     | ctd_count |
+-----------+------------------------------+-----------+
|        41 | contenttranslation           |     12378 |
|       169 | contenttranslation-needcheck |       573 |
|       522 | contenttranslation-v2        |      1628 |
+-----------+------------------------------+-----------+
3 rows in set (0.01 sec)

mysql:research@dbstore1003.eqiad.wmnet [enwiki]>  select count(*) from categorylinks where cl_to='Pages_with_unreviewed_translations';
+----------+
| count(*) |
+----------+
|      187 |
+----------+
1 row in set (0.01 sec)

Open questions

  • It might be that Pages with unreviewed translations category (and similar categories in other wikis) has many false positives. It's possible to test that in cx2-testing which does have the page Category:Pages with unreviewed translations. Should we come up with some testing scenarios to check that? What logic is behind putting a translated article into Pages with unreviewed translations category?
  • should we check the number of pages in Pages with unreviewed translations category in different wikis?
  • Also, ORES scores can be checked for the articles in Pages with unreviewed translations category
mysql:research@dbstore1003.eqiad.wmnet [enwiki]>  select count(*) from categorylinks where cl_to='Pages_with_unreviewed_translations';
+----------+
| count(*) |
+----------+
|      187 |
+----------+
1 row in set (0.01 sec)

I'm curious about this query. For English it returns 187 results, which matches the total number of articles displayed in the category itself and is consistent with a query I made to check the distribution by month.
However, when checking it for Spanish I see only one result with the query but 2482 in the category. Do you know what exactly is listed in the categorylinks table and why there is this inconsistency?

Open questions

  • It might be that Pages with unreviewed translations category (and similar categories in other wikis) has many false positives. It's possible to test that in cx2-testing which does have the page Category:Pages with unreviewed translations. Should we come up with some testing scenarios to check that?

I have the impression that there may still be a high number of false positives, but most of them are hard to test in an automatic way. There are the following kinds of problems:

  • The user does not modify a paragraph because the content is not expected to be edited at all, but the tool does not know about it (e.g., math formulas: T225118).
  • Machine translation worked so well for a paragraph that the user had to edit it much less than expected, and the tool thinks the content is not edited enough.
  • The user makes enough modifications to the content but our code to identify those is not counting them.

Of those, the last one seems the easiest to test automatically, but I don't expect it to represent most of the problematic cases.

What logic is behind putting a translated article into Pages with unreviewed translations category?

A quick summary would be that translations are added when they are published but they still have 2 to 9 paragraphs with unmodified contents (each with +80% of the original MT, or +60% of the original text). These values change based on whether the user confirmed the warnings or whether the user had previously deleted translations. This is described in the documentation about the limits.

  • should we check the number of pages in Pages with unreviewed translations category in different wikis?

Yes, especially given the inconsistencies I mentioned above. It would be useful to consider some of the top translated languages (French, Spanish, Catalan) as well as those not supporting MT (German and Italian).

  • Also, ORES scores can be checked for the articles in Pages with unreviewed translations category

That can be interesting. I have not thought much yet on how the ORES predictions can be applied, but I'm open for suggestions.

mysql:research@dbstore1003.eqiad.wmnet [enwiki]>  select count(*) from categorylinks where cl_to='Pages_with_unreviewed_translations';
+----------+
| count(*) |
+----------+
|      187 |
+----------+
1 row in set (0.01 sec)

I'm curious about this query. For English it returns 187 results, which matches the total number of articles displayed in the category itself and is consistent with a query I made to check the distribution by month.
However, when checking it for Spanish I see only one result with the query but 2482 in the category. Do you know what exactly is listed in the categorylinks table and why there is this inconsistency?

It turned out that the Páginas_con_traducciones_sin_revisar category in eswiki is, in fact, Wikipedia:Páginas_con_traducciones_sin_revisar. When the category name is corrected, the query returns a number of articles close to the number shown on the Páginas con traducciones sin revisar page.

mysql:research@dbstore1003.eqiad.wmnet [eswiki]> select year(cl_timestamp) as Year,MONTHNAME(cl_timestamp) as Month, count(*) as Unreviewed 
    -> from categorylinks 
    -> where cl_to='Wikipedia:Páginas_con_traducciones_sin_revisar'
    -> group by year(cl_timestamp),month(cl_timestamp)
    -> order by cl_timestamp desc;
+------+----------+------------+
| Year | Month    | Unreviewed |
+------+----------+------------+
| 2019 | June     |         65 |
| 2019 | May      |        303 |
| 2019 | April    |        515 |
| 2019 | March    |        757 |
| 2019 | February |        495 |
| 2019 | January  |        360 |
+------+----------+------------+
6 rows in set (0.01 sec)

Thanks, @Pginer-WMF for your comments. It seems that the mechanism of putting a translated article into "Pages with unreviewed translations" category (and equivalent categories in other wikis) is well defined and documented. If a clear logic for assigning translated articles is in place, then I don't have any questions regarding actual assignment to the category. Unless there are some doubts about whether that logic is correctly applied (that could be tested).

Testing the validity of the logic (i.e. how the quality of an article suffers with the actual thresholds for unmodified content) is, as you mentioned, not quite possible. It might be interesting to get stats on how ORES scores articles that have contenttranslation tags.

Another thought on false positives in the "Pages with unreviewed translations" category: what do we know about how editors remove such a category? It might be that the category was simply not removed after an article had been improved.

Thanks @Etonkovidova for sharing your thoughts and helping with the query.
Regarding ORES, I made a quick experiment with Recent Changes and there is not much diversity in scores: all published translations are classified as good and made in good faith. (cc @Halfak )

Regarding categories not being removed after the translation is reviewed, that is totally possible. For that, I guess that communication to make editors aware of the category can help.

@Etonkovidova, do you need any additional info to continue QA on this ticket?

The criteria for preventing a user from publishing or for adding a translation to the tracking category are documented in this mediawiki page. I suggest you cover all the cases laid out in T221359, while paying attention to the fact that MT limits vary if the user marks the MT abuse warning as resolved, which is the topic of this ticket.

Also, when checking MT percentage on paragraph level, keep in mind the numbers are not updated as you type, but after some delay, which is covered in T200683.

Etonkovidova closed this task as Resolved. Jun 14 2019, 8:45 AM

Thx, @Petar.petkovic.

Tested in cx2-testing:

Number of MT (unmodified) sections   Result of publishing
1                                    can publish - no tracking category
2-9                                  published with added category "Pages with unreviewed translations" (see Note)
10                                   cannot be published

Note: If the translation was assigned to "Pages with unreviewed translations" and the same article is then reworked in ContentTranslation, the category will be removed upon publishing. At least, false positives are not present when additional editing on an article in the category is done in the ContentTranslation tool. If an article is edited via editors, the category should be removed manually, of course.
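
The outcomes observed in the test above can be summarized in a small sketch. The function and the outcome labels are hypothetical, and two points are assumptions rather than test results: that zero problematic sections behaves like one, and that more than ten behaves like ten. The limits are the ones observed in cx2-testing here and may differ from the spec as it was later changed.

```python
def publishing_outcome(problematic_sections: int) -> str:
    """Map the number of unmodified (MT) sections to the observed publishing result."""
    if problematic_sections < 0:
        raise ValueError("section count cannot be negative")
    if problematic_sections >= 10:
        # Assumption: anything above 10 is blocked as well.
        return "blocked"
    if problematic_sections >= 2:
        return "published_with_tracking_category"
    # Assumption: 0 behaves like 1 (published, no tracking category).
    return "published"
```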

To run the tests for a user with previously deleted translations, I'll need user rights (in cx2-testing) to be able to delete articles. For now, I am closing this task as resolved and adding it to my list of follow-up tasks, so that testing for previously deleted translations gets done.

Looping in @Jpita.

@Etonkovidova I instructed you to cover cases laid out in T221359 and try having some sections with high MT warnings marked as resolved, because that is when MT limits change. In the end, that is what this ticket was about.

As for limits, T221359#5258405 changed the spec.

To check the tests for a user with previous deleted translations, I'll need user rights (in cx2-testing) to be able to delete articles.

@KartikMistry, please add necessary rights to @Etonkovidova's test accounts on cx2-testing for this.

@Etonkovidova I instructed you to cover cases laid out in T221359 and try having some sections with high MT warnings marked as resolved, because that is when MT limits change. In the end, that is what this ticket was about.

It's been part of the testing

  • to see what effect checking 'Resolve' or 'Publish anyway' (to publish with existing issues) has on actual publishing.
  • I varied the amount of MT (and unmodified content) - from 70% to 98% when I was checking the above.
  • checked the calculation of the MT content percentage (counting characters and substituting a specific amount with modified content); this was done previously
  • compared the paragraph to paragraph MT calculation to the progress bar calculation (also done in other ticket).

Notes for testing:

  • two users: Administrator and extendedconfirmed
  • the extendedconfirmed user publishes a couple of translations, which will be deleted by an Administrator
  • check whether the extendedconfirmed user is prevented from publishing according to the limitations in T221359.