Explore the integration of MarianMT
Open, Normal, Public

Description

MarianMT is an open-source neural machine translation framework, and the OPUS project is building language models for MarianMT based on its own corpora. The OPUS project compiles a parallel corpus of translation examples, including those created using Content translation.

Integrating this project into Content translation (and other Wikimedia projects) would provide new opportunities to expand the use of machine translation to new languages and new use cases. This would be the first approach that is both open source and based on neural machine translation, which sets it apart from the existing options. This combination makes it possible, for example, to feed user corrections made with Content translation back into the system to improve translation quality.

This ticket proposes exploring the feasibility of such an integration by defining the initial steps to follow, including the technical aspects to evaluate, among other considerations.