The OpusMT project supports many languages. Some relevant languages for Wikimedia may not have [pretrained models](https://opus.nlpl.eu/Opus-MT/) or those may be outdated. For example, several outdated models could not be integrated in MinT (T333969).
This ticket is intended to compile a list of relevant languages where translation models based on the Opus project would be the most useful. This is based on multiple criteria.
# Languages provided only by one translation service
**Supported only by Google Translate**
- Corsican (co)
- Divehi (dv)
- Western Frisian (fy)
- Hawaiian (haw)
- Gan Chinese (gan) !!Currently supported by using Traditional Chinese with Google Translate T258919!!
**Supported only by Yandex:**
- Chuvash (cv)
- Eastern Mari (mhr)
- Western Mari (mrj)
- Yakut (sah)
- Udmurt (udm)
# Languages where MT is available but not using it often:
Based on [the MT usage report](https://nbviewer.org/github/wikimedia-research/machine-translation-service-analysis-2022/blob/main/mt_service_comparison_Sept2022_update.ipynb) these languages have a significant percentage of translations created without using machine translation which may be a sign of low quality:
- Indonesian (id)
- Māori (mi)
# Languages only supported with OpusMT lacking updated pre-trained models
More details in {T333969}
# Languages supported only by models that provide an unexpected variant of the language
- Cantonese (yue) !!Other models only support the wrong variant as per T304865#8684437 !!
- Moroccan Arabic (ary) !!Other models only support the wrong variant as per T339926#9071464 !!
- Gun (guw)
- Sranan Tongo (srn)
- Venda (ve)
- Tahitian (ty)
- Bislama (bi)
- Manx (gv)
- Walloon (wa)
- Breton (br)
- Finnish (fi)
# Languages with translation activity not supported by any MT
There are Wikipedias in 152 languages without machine translation and some of these communities have been using Content Translation despite the lack of MT. The list below shows those languages where Content Translation has been used the most despite the lack of MT. This can be useful to prioritize languages for which enabling MT can make the most impact.
# Cornish (kw)
- Inuktitut (iu)
- Saraiki (skr)
- Kara-Kalpak (kaa)
- Cantonese (yue) !!Uses `zh-yue` code on Wikipedia!!
- N’Ko (nqo)
- Mazanderani (mzn)
- Breton (br)
- Tuvinian (tyv)
- Cree (cr)
- Aromanian (rup) !!Uses `roa-rup` code on Wikipedia!!
- Ossetic (os)
- Min Nan Chinese (nan) !!Uses `zh-min-nan` code on Wikipedia!!
- Scots (sco)
- Võro (vro) !!Uses `fiu-vro` code on Wikipedia!!
- Serbo-Croatian (sh)
- Ladin (lld)
- Nāhuatl (nah)
- Aragonese (an)
- Inari Sami (smn)
- Tachelhit (shi)
- Western Armenian (hyw)
This list has been compiled based on [a query for the last 2 years of activity](https://w.wiki/7BQp), screenshot with the number of translations published is shown below:
{F37161915}
{P50064}