Create easy-to-use ways to get information about how much machine translation was used in a given translated article
Closed, Resolved · Public

Description

This was requested at https://www.mediawiki.org/wiki/Topic:T3qwod3z26ouew2w .

It should be useful for community editors and patrollers, and for machine translation developers, too.

This may be possible with the cxpublishedtranslations and the parallel corpora APIs, but it could be easier (see also T135705 and T135706).

Event Timeline

Amire80 created this task. May 19 2016, 7:28 AM
Restricted Application added subscribers: Zppix, Aklapper. May 19 2016, 7:28 AM
Amire80 triaged this task as Medium priority. May 19 2016, 4:42 PM
Amire80 moved this task from Needs Triage to Bugs on the ContentTranslation board.
NickK awarded a token. May 31 2016, 3:30 PM
NickK added a subscriber: NickK. May 31 2016, 3:33 PM

Even the simplest version (used machine translation or did not use machine translation) would be very helpful. The main goal is being able to check machine-translated articles for common machine translation mistakes that a human cannot make.

Arrbee moved this task from Bugs to Enhancements on the ContentTranslation board. Jun 22 2018, 1:40 PM
Pginer-WMF closed this task as Resolved. May 27 2019, 4:57 PM
Pginer-WMF claimed this task.
Pginer-WMF added a subscriber: Pginer-WMF.

With the new version of Content translation, there are now several mechanisms that can help with this. They should be enough for most purposes (reviewing problematic translations, inspecting a particular translation, and getting a general overview), so I'm closing the ticket (feel free to reopen if that's not the case) and detailing each case below:

Review problematic translations: tracking categories.

For each language there is a tracking category that lists published translations that may not have been reviewed enough. You can check the documentation here. This lets reviewers focus on translations that complied with the limits required for publishing, but in which the initial machine translation of some paragraphs was not edited enough.

Inspecting a particular translation: Translation debugger.

The translation debugger lets you retrieve information about a specific translation. Enter the codes for the source and target languages and the source article title to get the metadata about the translation. This includes the "progress" field:

progress {"any":0.9583333333333334,"human":0.7083333333333334,"mt":0.25,"mtSectionsCount":6,"translatedSectionsCount":23}

In this example, 25% of the overall translation is machine translation ("mt").
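As a rough illustration, the "progress" value shown above can be parsed with a few lines of Python to answer both the simple question (was machine translation used at all?) and the more detailed one (how much?). This is only a sketch: the function name summarize_progress and the output keys are made up for this example, and treating "human" as the human-edited share is an assumption based on the field name, not something confirmed in this task.

```python
import json

# The "progress" value, copied from the translation debugger example above.
raw_progress = (
    '{"any":0.9583333333333334,"human":0.7083333333333334,"mt":0.25,'
    '"mtSectionsCount":6,"translatedSectionsCount":23}'
)


def summarize_progress(raw):
    """Turn a raw "progress" JSON string into a small summary (hypothetical helper)."""
    progress = json.loads(raw)
    mt = progress["mt"]
    return {
        # Simplest signal: was any machine translation left in the result?
        "used_mt": mt > 0,
        # Fractions converted to percentages for readability.
        "mt_percent": round(mt * 100, 1),
        # Assumption: "human" is the share attributed to human editing.
        "human_percent": round(progress["human"] * 100, 1),
        "mt_sections": progress["mtSectionsCount"],
        "translated_sections": progress["translatedSectionsCount"],
    }


summary = summarize_progress(raw_progress)
print(summary)
```

For the example value, the summary reports that MT was used, with an "mt" share of 25% and 6 of the 23 translated sections counted as MT sections.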

If you click on "Fetch translation", you can get information on the translation service used for each paragraph (note that users may use MT for a paragraph and not use it for another).

Getting a general overview: APIs and stats.

As mentioned earlier, our data APIs and dumps about published translations let you retrieve this information and use it in other ways. This graph provides an overview of the number of articles in which each translation service is used (including those cases where no MT is used).
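For anyone who wants to start from the API side, the sketch below builds a request URL for the cxpublishedtranslations list mentioned in the description. It rests on assumptions that should be checked against the API documentation: the parameter names ("from", "to", "limit", "offset"), the default host, and the helper name published_translations_url are not taken from this task. The sketch only constructs the URL and does not send any request.

```python
from urllib.parse import urlencode


def published_translations_url(source_lang, target_lang,
                               wiki_host="en.wikipedia.org",
                               limit=500, offset=0):
    """Build a query URL for published translations (hypothetical helper).

    Assumption: the cxpublishedtranslations list module accepts
    "from", "to", "limit" and "offset" parameters.
    """
    params = {
        "action": "query",
        "list": "cxpublishedtranslations",
        "from": source_lang,   # source language code, e.g. "en"
        "to": target_lang,     # target language code, e.g. "es"
        "limit": limit,
        "offset": offset,
        "format": "json",
    }
    return f"https://{wiki_host}/w/api.php?" + urlencode(params)


url = published_translations_url("en", "es")
print(url)
```

The returned URL can be opened in a browser or fetched with any HTTP client; paging through results would mean repeating the call with a larger offset.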