Learn from user corrections to avoid editing the same term again and again
Open, Medium, Public

Authored: Apr 15 2015, 5:18 PM


When translating, users have to correct some terms that the Machine Translation (MT) service does not translate properly. For example, when translating the John Carpenter article, the director's surname may be translated into the local-language word for the profession of "carpenter". Since an article is about a specific topic, the same mistranslated terms tend to recur, and users end up fixing them again and again.

In our user testing sessions, we observed that while users found it reasonable to fix an error the first time, they were negatively surprised that the system did not learn from the correction for the next time.

While improving MT services is probably out of scope for this project, it may be worth thinking about ways CX can save users time in the correction process. Some of these mechanisms could also be useful when no MT is available at all, acting as a very basic (perhaps word-level) MT-like system based on what the user has already translated.

Proposed solution

  • Keep track of user corrections to MT that happen repeatedly. We need to decide how many times a change must occur, and how many words or characters it may span, for it to count as a correction.
  • When a paragraph is added, re-apply previous corrections wherever the originally corrected term is found.
  • Provide a way for users to switch among the alternatives (which include the MT proposed term and the one used in previous corrections).
  • Learn from the use of the alternatives to decide whether to apply corrections automatically or just suggest them.
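The first bullet leaves the thresholds open. The following is a minimal sketch of how corrections could be detected, assuming a word-level diff between the MT output and the user's edited paragraph. It uses the standard library's `difflib.SequenceMatcher`; the thresholds `MAX_WORDS` and `MIN_OCCURRENCES` and the class `CorrectionMemory` are hypothetical names, not values or components decided in this task or taken from CX.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical thresholds: the task has not decided these values yet.
MAX_WORDS = 3        # longer rewrites are treated as rephrasing, not term corrections
MIN_OCCURRENCES = 1  # times a change must be seen before it is reused


def extract_replacements(mt_text, user_text, max_words=MAX_WORDS):
    """Return (mt_phrase, user_phrase) pairs where the user replaced MT words."""
    mt_words, user_words = mt_text.split(), user_text.split()
    matcher = SequenceMatcher(a=mt_words, b=user_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only substitutions count; pure insertions and deletions are ignored.
        if tag != "replace":
            continue
        if i2 - i1 > max_words or j2 - j1 > max_words:
            continue
        pairs.append((" ".join(mt_words[i1:i2]), " ".join(user_words[j1:j2])))
    return pairs


class CorrectionMemory:
    """Counts how often each MT phrase was corrected, and to what."""

    def __init__(self, min_occurrences=MIN_OCCURRENCES):
        self.min_occurrences = min_occurrences
        self.counts = Counter()

    def observe(self, mt_text, user_text):
        for pair in extract_replacements(mt_text, user_text):
            self.counts[pair] += 1

    def corrections(self):
        """Map each MT phrase to its most frequent correction above the threshold."""
        result = {}
        for (mt_phrase, user_phrase), n in self.counts.most_common():
            if n >= self.min_occurrences:
                result.setdefault(mt_phrase, user_phrase)
        return result
```

For example, after `memory.observe("The Angels is a city in California.", "Los Angeles is a city in California.")`, `memory.corrections()` maps `"The Angels"` to `"Los Angeles"`. Raising `min_occurrences` trades recall for fewer accidental "corrections".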

We'll illustrate the idea with the example of translating the Los Angeles article from Spanish to English. Since "angeles" means "angels" in Spanish, we'll assume that the MT service translates the name of the city too literally, and that the user corrects those occurrences in the first paragraph:

Initial translation with errors: CX-corrections-initial.png
User corrects the first paragraph: CX-corrections-initial-add.png

When the second paragraph is added (where the name of the city appears again), the system automatically replaces the MT-proposed text, because the user corrected the same term in previous paragraphs. The replaced text is also highlighted to let the user know that a correction was applied automatically, so that they can undo it.

CX-corrections-applied.png
CX-corrections-options.png
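A minimal sketch of the automatic replacement step, assuming corrections are stored as a plain dict from MT phrase to the user's preferred wording. The function returns the corrected paragraph plus the character spans to highlight, and keeps the original MT wording so each change can be undone. `apply_corrections`, `undo` and `AppliedCorrection` are hypothetical names, and plain string replacement ignores inflection, which real corrections may need to handle.

```python
import re
from dataclasses import dataclass


@dataclass
class AppliedCorrection:
    start: int        # offset of the replacement in the corrected paragraph
    end: int
    mt_text: str      # original MT wording, kept so the change can be undone
    replacement: str


def apply_corrections(mt_paragraph, corrections):
    """Replace known MT phrases; return the new text and the spans to highlight."""
    if not corrections:
        return mt_paragraph, []
    # Longest phrases first, so "The Angels" is preferred over a shorter key like "Angels".
    keys = sorted(corrections, key=len, reverse=True)
    # (?<!\w) and (?!\w) stop matches inside longer words.
    pattern = re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, keys)) + r")(?!\w)")
    out, applied, pos, length = [], [], 0, 0
    for match in pattern.finditer(mt_paragraph):
        before = mt_paragraph[pos:match.start()]
        replacement = corrections[match.group(0)]
        out.append(before)
        length += len(before)
        applied.append(AppliedCorrection(length, length + len(replacement),
                                         match.group(0), replacement))
        out.append(replacement)
        length += len(replacement)
        pos = match.end()
    out.append(mt_paragraph[pos:])
    return "".join(out), applied


def undo(text, correction):
    """Restore the MT wording for one highlighted span.

    Offsets of corrections located after this one shift by the length difference."""
    return text[:correction.start] + correction.mt_text + text[correction.end:]
```

Keeping `mt_text` alongside each span is what makes the highlight actionable: the options menu can offer both the MT term and the learned correction.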

Whether to apply the correction automatically or let the user apply it will depend on the user's previous decisions for that term. If the user undoes a correction that was applied to a paragraph, we could stop applying the replacement in subsequent paragraphs and instead just highlight the MT version, letting the user know they can pick an alternative manually. In the example below, the correction is not applied automatically, only suggested for the user to apply:

CX-corrections-not-applied.png
CX-corrections-not-applied-options.png
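The apply-or-suggest decision could start as a simple per-term counter. The sketch below assumes one hypothetical rule: a correction is applied automatically unless the user has undone it more often than they have kept or manually chosen it. The names `CorrectionPolicy` and `Mode` are illustrative only; the task does not specify the rule.

```python
from collections import defaultdict
from enum import Enum


class Mode(Enum):
    APPLY = "apply"      # replace automatically and highlight the replacement
    SUGGEST = "suggest"  # keep the MT text, highlight it and offer the alternative


class CorrectionPolicy:
    """Hypothetical rule: apply a correction automatically unless the user has
    undone it more often than they have kept or chosen it."""

    def __init__(self):
        self.kept = defaultdict(int)
        self.undone = defaultdict(int)

    def record(self, mt_term, undone):
        if undone:
            self.undone[mt_term] += 1
        else:
            self.kept[mt_term] += 1

    def mode(self, mt_term):
        return Mode.SUGGEST if self.undone[mt_term] > self.kept[mt_term] else Mode.APPLY
```

With this rule a single undo switches the term to suggestion mode, and picking the alternative manually later switches it back, so the system keeps learning in both directions.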

Additional considerations

Based on input from T339907, it may be relevant to consider:

  • User-defined corrections: manually add new corrections or edit existing ones.
  • Support for applying corrections in a broader scope: users may want to keep some corrections for subsequent translations, or propose them as general rules for the community.

A first step in this direction is providing alternatives for link labels based on the target article title (T197662).
This is also related to the notion of Translation Memory. From experience using the tool in class, we got feedback that it may be useful to share corrections across articles in the same category or topic area, or across a group of users in the same program, campaign, or event.
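Sharing corrections at several levels raises the question of which one wins when they disagree. A minimal sketch, assuming a hypothetical ordered list of scopes where the most specific matching scope takes precedence; the scope names and the `ScopedCorrections` class are illustrative and not part of CX.

```python
# Hypothetical scopes, from most to least specific; none of these names come from CX.
SCOPES = ("translation", "user", "event", "topic", "community")


class ScopedCorrections:
    def __init__(self):
        self._store = {}  # (scope, key) -> {mt_term: correction}

    def add(self, scope, key, mt_term, correction):
        if scope not in SCOPES:
            raise ValueError(f"unknown scope: {scope}")
        self._store.setdefault((scope, key), {})[mt_term] = correction

    def lookup(self, mt_term, context):
        """context maps scope -> key, e.g. {"user": "Alice", "topic": "Cities"}.

        The most specific scope with a matching correction wins."""
        for scope in SCOPES:
            key = context.get(scope)
            if key is None:
                continue
            correction = self._store.get((scope, key), {}).get(mt_term)
            if correction is not None:
                return correction
        return None
```

Ordering scopes from narrow to broad lets a user override a community-wide rule for one translation without changing it for everyone else.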


Event Timeline

Pginer-WMF raised the priority of this task from to Needs Triage.
Pginer-WMF updated the task description.
Pginer-WMF subscribed.

This is something that I'd really love to have in some way, as a collaboration with dictionary and MT builders.

Somewhat related issues: T91748, T95886, T92243.

Amire80 raised the priority of this task from Low to Medium (Jul 2 2015, 5:08 PM).
Amire80 moved this task from Needs Triage to Bugs on the ContentTranslation board.
Amire80 set Security to None.
Amire80 lowered the priority of this task from Medium to Low (Oct 15 2015, 10:14 AM).
Amire80 added a project: OKR-Work.

A related case of quickly correcting MT proposals is link translation (T145009). When automatic translation fails for links, the linked article's title may be the intended information (or a good-enough approximation). Suggesting it as a quick correction could be helpful.

The issue of repeatedly fixing "errors, typos and mistakes" of MT was mentioned in this comment.

I added mockups and illustrated an example to show how the feature could work.

Pginer-WMF raised the priority of this task from Low to Medium.
Pginer-WMF removed a project: Design.
Pginer-WMF unsubscribed.
Pginer-WMF subscribed.

Would the proposed solution work with accented letters (e.g. the case listed at T152905, which is very common in math articles)?

The initial idea is to treat any modification made to the text as a correction, which should work with accented letters and other symbols. However, we may want to define certain thresholds as we start working on this ticket. For example, if the user changes one character from lowercase to uppercase, is that a correction we want to apply automatically the next time, or is it likely to fail when applied in a different grammatical context?
In any case, since the approach makes it easy to undo changes, I think it is fine to start with a basic approach and learn from the situations we observe in different languages.
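One way to support such thresholds is to classify each change before deciding how to treat it. The sketch below is an assumption, not the planned design: it labels a change as case-only, accent-only, or other, after normalizing both strings to Unicode NFC so that an accented letter stored precomposed (é, U+00E9) and one stored as a base letter plus a combining accent (e + U+0301) compare as equal. `classify_change` is a hypothetical helper built on the standard `unicodedata` module.

```python
import unicodedata


def _strip_accents(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def classify_change(mt_term, user_term):
    """Label a correction so that thresholds can treat kinds of change differently."""
    mt = unicodedata.normalize("NFC", mt_term)
    user = unicodedata.normalize("NFC", user_term)
    if mt == user:
        return "none"    # only the Unicode encoding of the text differed
    if mt.casefold() == user.casefold():
        return "case"    # e.g. "angeles" -> "Angeles"
    if _strip_accents(mt).casefold() == _strip_accents(user).casefold():
        return "accent"  # e.g. "Angeles" -> "Ángeles"
    return "other"
```

A policy could then, for instance, only suggest "case" changes, since capitalization often depends on position in the sentence, while applying "accent" and "other" changes automatically.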

A user recently reported this kind of issue when translating the article about the emperor Basil I, translated by MT as the herb with the same name in each instance in the article,