ContentTranslation does not translate accented words correctly
Closed, Invalid (Public)

Description

The tool is always messing up translations containing accented letters. The most recent example is this:

  • Expected result: 'polynomial' -> 'polinômio'
  • Actual result: 'polynomial' -> 'polinˆ omio'

This happened using Firefox 50, on https://pt.wikipedia.org/wiki/Special:ContentTranslation?page=Rational+root+theorem&from=en&to=pt&targettitle=Teorema+das+ra%C3%ADzes+racionais
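To make the difference between the two strings above easier to see, here is a minimal Python sketch that lists any free-standing accent characters in a translated string. It assumes the stray character is U+02C6 (MODIFIER LETTER CIRCUMFLEX ACCENT), as it appears when copied from this report; the actual API response may contain a different character. The function name `find_spacing_accents` is hypothetical and is not part of ContentTranslation or the Yandex API.

```python
import unicodedata

def find_spacing_accents(text):
    """Return (index, code point, Unicode name) for each free-standing accent.

    Characters in general category Lm (modifier letter) or Sk (modifier
    symbol) are treated as stray accents. This is a heuristic, not a rule
    used by any MT provider.
    """
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) in ("Lm", "Sk"):
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits

expected = "polin\u00f4mio"    # 'polinômio' with a precomposed o-circumflex
actual = "polin\u02c6 omio"    # assumed: string as copied from this report

print(find_spacing_accents(expected))
print(find_spacing_accents(actual))
```

In the expected string the circumflex is part of the letter "ô" itself, so nothing is reported; in the actual string the accent is a separate character followed by a space.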

Event Timeline

Which machine translation provider does this?

The Yandex web interface seems to translate this correctly:

image.png (193×1 px, 12 KB)

Please note, this may be due to the Yandex translation API version that is used in Content Translation. We are currently using the officially released version of the API, while Yandex's web interface uses an improved technology that is being beta-tested. This could be the reason why the correct translation is shown on the web interface. Thanks.

I used the sentence from the source article mentioned in the bug report, Rational root theorem, and confirmed that the wrong result comes from the Yandex API.

image.png (204×1 px, 18 KB)

So this is not a bug in the CX codebase but in the Yandex MT API. Perhaps this fragment has a wrong translation in Yandex's MT training data.