
Translate suggests Apertium translations also from non-source languages the translator may not know
Closed, Resolved (Public)

Description

Hi all

Translating the English text "Search for objects around coordinates of this article"
shows this French suggestion: "Chercher des objets autour des coordenadas de cet article"
and obviously "coordenadas" is not a French word.

Question:
Since I have run into several similar cases, I wonder why the suggested text can contain words that do not even belong to the target language. Is there a way to detect this so that the suggestions are more coherent?

Thanks.


URL: https://translatewiki.net/wiki/Wikimedia:Wikinity-enter-name-of-article/fr

Event Timeline

Seems like an issue with Apertium or the automatic translating feature. I'd love to see it removed for Spanish as it sometimes "translates" to nonsense.

@Raymond @Nikerabbit Could you please have a look at this one?

Thanks.

I can add that, as far as I remember, when this happens the foreign words come from the Spanish dictionary, which may be a lead. Thanks.

Since there is no en->fr available, it (Translate extension) is doing es->fr instead, and that can expose words it cannot translate from Spanish.
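
The fallback described in this comment can be illustrated with a toy sketch. Everything in it is hypothetical: the pair list, the `pick_source` and `translate` helpers, and the tiny dictionary are assumptions made for illustration, not the Translate extension's or Apertium's actual code or data.

```python
# Hypothetical illustration only; not the Translate extension's real logic.

# Assumed set of machine-translation pairs: note there is no ("en", "fr").
SUPPORTED_PAIRS = {("en", "es"), ("es", "fr")}

# Toy es->fr dictionary that happens to lack an entry for "coordenadas".
ES_FR = {
    "buscar": "chercher",
    "objetos": "objets",
}


def pick_source(target, available):
    """Return the first available language that has a supported pair to target."""
    for lang in available:
        if (lang, target) in SUPPORTED_PAIRS:
            return lang
    return None


def translate(words, dictionary):
    """Word-by-word lookup; words missing from the dictionary pass through unchanged."""
    return [dictionary.get(word, word) for word in words]


# Languages in which the message already exists (assumed for the example).
available = ["en", "es"]
source = pick_source("fr", available)
suggestion = translate(["buscar", "objetos", "coordenadas"], ES_FR)
print(source, suggestion)
```

In this sketch the Spanish text is chosen as the source because no en->fr pair exists, and the word missing from the toy dictionary reaches the French suggestion untouched, which matches the symptom reported above.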

Aklapper renamed this task from "[[Wikimedia:Wikinity-enter-name-of-article/fr]] i18n issue Suggested words out of language dictionary" to "Translate suggestions sometimes include words which are not part of that language". Oct 4 2017, 7:32 PM

Hence "not a bug" / declined?

I don't understand this behaviour. When you say "there's no en -> fr available", what exactly do you mean? I'm confused. Thanks.

Apertium does not do machine translation from English to French.

Thank you. Do we do Apertium work from Phabricator, or is it something that is handled elsewhere? If that's out of our control, then yes, I'd agree with @Aklapper on closing, not because it's not a bug, but because we can't fix it ourselves.

Do we do Apertium work from Phabricator, or is it something that is handled elsewhere?

https://www.mediawiki.org/wiki/Upstream_projects

If you oppose the behavior of falling back to another language, you can file a task against Translate (or convert this one).

This is not a bug in Apertium. If it were, I would be happy to report it to upstream.

Also found today on MediaWiki FAQ translation:
https://www.mediawiki.org/w/index.php?title=Special:Translations&message=Translations%3AManual%3AFAQ%2F508%2Ffr

Les plantillas importées depuis autres wikis (comme Wikipédia) ne me fonctionnent pas

Templates imported from other wikis (such as Wikipedia) don't work for me

indeed suggested by Apertium
... better to keep the English word everyone understands than to get exotic suggestions such as "plantillas"

  • Another completely meaningless case (Spanish + subject):

MediaWiki:
Original English text -> include standard header (category hierarchy path & notice)
Apertium suggestion -> Comprendre cabecera standard (route d'hiérarchie de catégories et son avis)

Apertium says:
"We do not have eng->fra, but we do have eng->spa and spa->fra, so our middleware will perform the eng->fra translation via interlingua eng->spa->fra, which results in some words coming through as Spanish."

The consequence is that errors multiply if both the en-es and the es-fr translations are wrong. Moreover, you need one expert who understands English and Spanish and another who understands Spanish and French to make the corrections, where a single en-fr translator would be sufficient.

For my part, I generally pick one of the most-used suggested texts to stay consistent with previous cases (which have surely been reviewed). When no such suggestion exists, I use Google Translate, which I find best: the right words are proposed, at the right place in the sentence, and the idea of the text is preserved.

Apertium lacks words and sometimes does not respect French grammar.

Switching to Google would be an advantage (is Google open source? It may not be possible),
but with Apertium it is not reliable to suggest erroneous text to translators. Moreover, Apertium proposals have not been approved beforehand (unlike the suggestions shown with a usage %), and some translators take this first proposal at face value, so reviewers must redo the translation themselves.

Well, to move forward, should we look into correcting the Apertium .xml data files?

As a first step, if we follow their explanation, the number of words in the en dictionary should be the same as in the es and fr ones.

Will the updated files remain local to the translatewiki.net servers, or must they be delivered to SourceForge?


I have no idea which files you are talking about.

There is no Apertium middleware in use that performs translation via a middle language. There is logic in the Translate extension to use some non-English language as a source language. And those translations are done by translators, so the number of machine translation errors is not doubled! Of course there is still the possibility of semantic drift.

I referred to https://sourceforge.net/p/apertium/tickets/128/ which has the answer Tino Didriksen posted 1 day ago.

Nemo_bis renamed this task from "Translate suggestions sometimes include words which are not part of that language" to "Translate suggests Apertium translations also from non-source languages the translator may not know". Oct 15 2017, 7:14 PM
Nemo_bis triaged this task as Medium priority.
Nemo_bis removed a project: I18n.

It is still unclear to me which of the following actions I should take:

  1. Keep the status quo (as has been for years, the suggestions can be ignored if they are not useful)
  2. Remove the feature (and lose potentially useful suggestions)
  3. Add further restrictions (only pick a source language the translator knows as indicated in the babel box)
  4. Add further restrictions (don't show suggestions that include untranslated words)
Aklapper lowered the priority of this task from Medium to Low. Nov 20 2017, 9:10 PM

@Wladek92: Any thoughts on Nikerabbit's last comment?

  3. Add further restrictions (only pick a source language the translator knows as indicated in the babel box)

This would surely lead to more confusion (it is much easier to recognize a “bad” word when it comes from a totally foreign language). When they notice an issue in an automatic translation, translators should refer to the source text.
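
For reference, the restriction being discussed (option 3) could look roughly like the sketch below. It assumes the babel box has already been reduced to a plain list of language codes; the function name and data shapes are hypothetical and not taken from the Translate extension.

```python
# Hypothetical sketch of option 3; names and data shapes are assumptions.

def usable_sources(candidate_sources, babel_languages):
    """Keep only candidate source languages the translator declares knowing."""
    known = set(babel_languages)
    return [lang for lang in candidate_sources if lang in known]


# A translator who lists French and Spanish would still get es-based suggestions.
print(usable_sources(["es", "it"], ["fr", "es"]))
# A translator who lists only English and French would get none.
print(usable_sources(["es", "it"], ["en", "fr"]))
```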

  4. Add further restrictions (don't show suggestions that include untranslated words)

Would that filtering be technically easy to implement? I don’t think it is worthwhile.
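
As a rough idea of what such filtering might involve, here is a minimal sketch. It relies on one assumption: that unknown words arrive marked with a leading asterisk (Apertium can mark unknown words this way, but whether the markers are still present at the point where suggestions are shown is not established in this task). The helper names are hypothetical.

```python
import re

# Assumption: unknown words arrive prefixed with "*", e.g. "*coordenadas".
# The lookbehind requires the asterisk to start a token.
UNKNOWN_WORD = re.compile(r"(?<!\S)\*\w+")


def has_untranslated_words(suggestion):
    """Return True if the suggestion contains at least one marked unknown word."""
    return UNKNOWN_WORD.search(suggestion) is not None


def filter_suggestions(suggestions):
    """Drop every suggestion that contains a marked unknown word."""
    return [s for s in suggestions if not has_untranslated_words(s)]


candidates = [
    "Chercher des objets autour des *coordenadas de cet article",
    "Chercher des objets",
]
print(filter_suggestions(candidates))
```

Without such markers, detecting untranslated words would need a dictionary or language-detection step, which is considerably more work.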

Based on the feedback, I suggest we stop using Apertium when the target language is Spanish, French or Dutch.
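
The proposal amounts to a target-language exclusion list. The sketch below only illustrates that idea; the change made in the Translate extension may be structured quite differently, and the constant and function names here are hypothetical.

```python
# Hypothetical sketch of a target-language exclusion; not the actual patch.

# Language codes for Spanish, French and Dutch.
EXCLUDED_TARGETS = frozenset({"es", "fr", "nl"})


def apertium_allowed(target_language):
    """Return False for target languages where Apertium suggestions are switched off."""
    return target_language.lower() not in EXCLUDED_TARGETS


print(apertium_allowed("fr"))  # French is excluded
print(apertium_allowed("ca"))  # Catalan is not on the list
```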

Change 828517 had a related patch set uploaded (by Nikerabbit; author: Nikerabbit):

[mediawiki/extensions/Translate@master] Do not use Apertium for certain target languages

https://gerrit.wikimedia.org/r/828517

Change 828517 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] Do not use Apertium for certain target languages

https://gerrit.wikimedia.org/r/828517

Confirmed on translatewiki.net