
Expose user modifications of Machine Translations to help the MT service providers to improve them
Closed, Resolved | Public | 1 Estimated Story Points

Description

Translation services such as Apertium give users a good starting point for their translations. Since users correct the machine output, it would be great if those corrections were easily accessible, so that the teams developing the translation services could improve them based on that feedback.

As part of this task, we need to discuss which information to capture and how to expose it. Some fields that may be helpful:

  • Source text
  • MT translation proposed
  • Final translation by the user
  • MT provider
  • Edit distance (so that we can filter big/small differences)?
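
To make the discussion concrete, the fields listed above could be captured in a record like the following. This is only a sketch: the class name `MTCorrection`, the field names, and the JSON serialization are assumptions made for illustration, not an existing ContentTranslation schema, and the sample values are invented. Edit distance is left out because it can be derived from the two translation fields.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MTCorrection:
    """One user correction of a machine translation (hypothetical schema)."""
    source_text: str   # original text in the source language
    mt_text: str       # translation proposed by the MT service
    final_text: str    # translation as published by the user
    mt_provider: str   # e.g. "Apertium"

    def to_json(self) -> str:
        # ensure_ascii=False keeps non-Latin scripts readable in the output
        return json.dumps(asdict(self), ensure_ascii=False)


# Invented example values, for illustration only.
record = MTCorrection(
    source_text="Hello world",
    mt_text="Hola mundo",
    final_text="Hola, mundo",
    mt_provider="Apertium",
)
print(record.to_json())
```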

Event Timeline

Pginer-WMF raised the priority of this task from to Needs Triage.
Pginer-WMF updated the task description.
Pginer-WMF added a subscriber: Pginer-WMF.
Pginer-WMF renamed this task from Expose user modifications of Machine Translations to help their providers to improve them to Expose user modifications of Machine Translations to help the MT service providers to improve them. Jan 22 2015, 2:27 AM
Pginer-WMF set Security to None.

I believe the most important fields are the proposed MT and the final translation by the user. I think we would do well to store a (source, proposed, final) triplet in plain text, annotated with the MT provider and the languages. Sentence alignment would be great if we can get it, but we should align at least by section. I imagine people will be looking for:

  1. missing vocabulary
  2. better vocabulary in a certain context
  3. missing or better grammar rules.

Edit distance and other algorithms can be used for evaluation, but those are easy to calculate afterwards if we need them. Btw, this data can also be put in a translation memory, mainly for languages that don't have MT providers.
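
To illustrate how edit distance could be computed after the fact from stored triplets, here is a minimal sketch. It uses plain character-level Levenshtein distance as one possible metric; the `Triplet` type, the `heavily_edited` helper, and the sample strings are hypothetical and only serve the example.

```python
from typing import List, NamedTuple


class Triplet(NamedTuple):
    """A stored (source, proposed, final) triplet; hypothetical layout."""
    source: str
    proposed: str
    final: str


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (unit cost per edit)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep)
            ))
        previous = current
    return previous[-1]


def heavily_edited(triplets: List[Triplet], min_distance: int) -> List[Triplet]:
    """Keep triplets whose final text differs from the MT proposal by at least min_distance edits."""
    return [t for t in triplets
            if edit_distance(t.proposed, t.final) >= min_distance]


# Invented sample data, for illustration only.
samples = [
    Triplet("Hello world", "Hola mundo", "Hola, mundo"),
    Triplet("abc", "abc", "xyz"),
]
print([edit_distance(t.proposed, t.final) for t in samples])
```

Because the metric is derived purely from the stored texts, the threshold or even the metric itself can be changed later without recapturing any data.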

Arrbee triaged this task as High priority. Feb 2 2015, 6:58 AM
Arrbee moved this task from Needs Triage to Long term on the ContentTranslation board.
Arrbee raised the priority of this task from High to Needs Triage. Feb 2 2015, 9:35 AM
Arrbee added a subscriber: Arrbee.
Amire80 triaged this task as High priority. Feb 4 2015, 12:02 AM

Change 191860 had a related patch set uploaded (by Santhosh):
Expose published translation with source-target URL pairs

https://gerrit.wikimedia.org/r/191860

Patch-For-Review

https://gerrit.wikimedia.org/r/191860 is a quick attempt to provide minimal information. To give more useful information, we need to enhance our infrastructure to capture and store user modifications or section-level mappings. We have to work towards that.

Change 191860 merged by jenkins-bot:
Expose published translation with source-target URL pairs

https://gerrit.wikimedia.org/r/191860

Arrbee moved this task from In Progress to Done on the LE-Sprint-82 board.