Create a library/service that serves matches from Tatoeba
Open, High, Public

Description

Tatoeba can be a useful resource to provide community-verified translations that can be used as a translation memory and as a way to correct issues from machine translation models (T351748).

This ticket proposes creating a service or library for searching the translations available in Tatoeba. The expected usage is to look up an exact match for a sentence in a given source language and obtain its translation(s) in the requested target language, based on the Tatoeba data.

Tatoeba data can be obtained from their downloads page.
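
As a rough illustration of the exact-match lookup described above, here is a minimal Python sketch. It assumes the tab-separated layout of the Tatoeba sentence export (`id<TAB>lang<TAB>text`) and link export (`sentence_id<TAB>translation_id`), with ISO 639-3 language codes. The function names `load_tatoeba` and `find_translations` are hypothetical and only serve the example; they are not an agreed API for this task, and a real service would likely need a persistent index rather than in-memory dictionaries.

```python
import io
from collections import defaultdict


def load_tatoeba(sentences_file, links_file):
    """Build in-memory indexes from Tatoeba-style TSV exports (assumed layout)."""
    sentences = {}               # sentence id -> (lang, text)
    by_text = defaultdict(list)  # (lang, text) -> [sentence ids]
    for line in sentences_file:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) < 3:
            continue  # skip malformed rows
        sid, lang, text = int(parts[0]), parts[1], parts[2]
        sentences[sid] = (lang, text)
        by_text[(lang, text)].append(sid)

    links = defaultdict(set)     # sentence id -> {translation ids}
    for line in links_file:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue
        links[int(parts[0])].add(int(parts[1]))
    return sentences, by_text, links


def find_translations(index, text, source_lang, target_lang):
    """Return target-language translations for an exact source sentence match."""
    sentences, by_text, links = index
    results = []
    for sid in by_text.get((source_lang, text), []):
        for tid in sorted(links.get(sid, ())):
            entry = sentences.get(tid)
            if entry and entry[0] == target_lang and entry[1] not in results:
                results.append(entry[1])
    return results


# Made-up sample rows standing in for the downloaded files.
sample_sentences = io.StringIO("1\teng\tHello.\n2\tspa\tHola.\n3\tfra\tBonjour.\n")
sample_links = io.StringIO("1\t2\n2\t1\n1\t3\n3\t1\n")
index = load_tatoeba(sample_sentences, sample_links)
print(find_translations(index, "Hello.", "eng", "spa"))  # prints ['Hola.']
```

The match here is strictly exact (case and punctuation included); whether any normalization should be applied before lookup is left open.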

Following an approach similar to that of other translation memory systems used in the Localization Infrastructure would help both development and reuse in existing products. A separate ticket will cover the integration with MinT.