Machine translation models such as [those used by MinT](https://www.mediawiki.org/wiki/MinT#About_MinT) may not produce the best translations for individual words or short sentences due to the lack of context. For example, the English expression "Hello!" is translated by MinT using NLLB-200 as "- ¡Hola, qué haces?" in Spanish (where "¡Hola!" would be expected instead).
{F41523111, position=float}
This ticket proposes to explore complementing model-based translation with community-provided translations from [Tatoeba](https://tatoeba.org/) (or a similar community) when there is an exact match. That is, given a source sentence to translate (e.g., "Hello!"), before requesting a translation from a model such as NLLB-200, the sentences from Tatoeba will be searched for an exact match. If such a match exists for the language pair, the translation from Tatoeba will be used. If not, the machine translation model will be used instead.
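As a rough illustration of the lookup-then-fallback flow, a minimal sketch follows. The in-memory lookup table, the `(source_lang, target_lang, sentence)` key, and the `model_translate` callback are all illustrative assumptions for this ticket, not existing MinT or Tatoeba APIs:

```python
# Hypothetical sketch of the proposed exact-match lookup with model fallback.
# None of these names correspond to real MinT or Tatoeba interfaces.

def build_lookup(pairs):
    """Index community translations by (source_lang, target_lang, sentence)."""
    table = {}
    for src_lang, tgt_lang, src_text, tgt_text in pairs:
        # Keep the first community translation seen for each exact source.
        table.setdefault((src_lang, tgt_lang, src_text), tgt_text)
    return table

def translate(text, src_lang, tgt_lang, lookup, model_translate):
    """Prefer an exact community match; fall back to the model otherwise."""
    match = lookup.get((src_lang, tgt_lang, text))
    if match is not None:
        return match
    return model_translate(text, src_lang, tgt_lang)

# Example usage with a stand-in model function:
pairs = [("en", "es", "Hello!", "¡Hola!")]
lookup = build_lookup(pairs)

def fake_model(text, src_lang, tgt_lang):
    return f"<model translation of {text!r}>"

print(translate("Hello!", "en", "es", lookup, fake_model))        # exact match from Tatoeba
print(translate("Good morning!", "en", "es", lookup, fake_model)) # falls back to the model
```

In practice the match could also be normalized (whitespace, casing) before lookup, but whether and how to normalize is an open design question for this ticket.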
This approach is expected to provide two key benefits to MinT:
- **Better translations.** Tatoeba is more likely to include short sentences and expressions, which is where translation models tend to struggle, so the two approaches could complement each other well.
- **A more direct way for users to improve translations.** When users encounter a wrong translation, they can easily contribute a better one to Tatoeba. Since the exact-match approach does not require complex machine learning training, updated translations from Tatoeba can be incorporated much more quickly. As a result, a user contributing an improved translation to Tatoeba is more likely to see the fix reflected in a shorter period of time.
This pre-search approach can be provided as an option in the MinT API, making it still possible to test the models directly when needed.