The OpusMT project supports many languages. However, some languages relevant to Wikimedia may lack pretrained models, or the existing models may be outdated. For example, several outdated models could not be integrated into MinT (T333969).
This ticket compiles a list of languages for which translation models based on the Opus project would be most useful. The list is organized by the criteria below.
Languages provided only by one translation service
Supported only by Google Translate:
- Corsican (co)
- Divehi (dv)
- Western Frisian (fy)
- Hawaiian (haw)
- Gan Chinese (gan) Currently supported by using Traditional Chinese with Google Translate (T258919)
Supported only by Yandex:
- Chuvash (cv)
- Eastern Mari (mhr)
- Western Mari (mrj)
- Yakut (sah)
- Udmurt (udm)
Languages where MT has signs of low quality
MT available but not often used
According to the MT usage report, a significant percentage of translations in these languages are created without machine translation, which may be a sign of low MT quality:
- Indonesian (id)
- Māori (mi)
Community-reported low quality
- Goan Konkani (gom) View discussion
Languages supported only by OpusMT that lack updated pretrained models
More details in T333969: Enable Opus models for languages lacking other Machine Translation options
- Cantonese (yue) Other models only support the wrong variant as per T304865#8684437
- Moroccan Arabic (ary) Other models only support the wrong variant as per T339926#9071464
- Gun (guw)
- Sranan Tongo (srn)
- Venda (ve)
- Tahitian (ty)
- Bislama (bi)
- Manx (gv)
- Walloon (wa)
- Breton (br)
- Finnish (fi)
Languages with translation activity not supported by any MT
Wikipedias in 152 languages have no machine translation, and some of these communities have been using Content Translation anyway. The list below shows the languages where Content Translation has been used the most despite the lack of MT. It can help prioritize the languages for which enabling MT would have the most impact.
- Cornish (kw)
- Inuktitut (iu)
- Saraiki (skr)
- Kara-Kalpak (kaa)
- Cantonese (yue) Uses zh-yue code on Wikipedia
- N’Ko (nqo)
- Mazanderani (mzn)
- Breton (br)
- Tuvinian (tyv)
- Cree (cr)
- Aromanian (rup) Uses roa-rup code on Wikipedia
- Ossetic (os)
- Min Nan Chinese (nan) Uses zh-min-nan code on Wikipedia
- Scots (sco)
- Võro (vro) Uses fiu-vro code on Wikipedia
- Serbo-Croatian (sh)
- Ladin (lld)
- Nāhuatl (nah)
- Aragonese (an)
- Inari Sami (smn)
- Tachelhit (shi)
- Western Armenian (hyw)
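Several entries above use a code on Wikipedia that differs from their language code. A minimal sketch of that mapping follows, assuming a hypothetical helper named `wiki_code` and a hypothetical table `WIKI_CODE_OVERRIDES`. It contains only the four pairs noted in the list and does not reflect the actual configuration of MinT or Content Translation.

```python
# Language codes from the list above whose Wikipedia code differs.
# Only the pairs noted in this ticket are included (illustrative, not exhaustive).
WIKI_CODE_OVERRIDES = {
    "yue": "zh-yue",
    "rup": "roa-rup",
    "nan": "zh-min-nan",
    "vro": "fiu-vro",
}


def wiki_code(language_code: str) -> str:
    """Return the Wikipedia code for a language code (hypothetical helper).

    Codes without an override are returned unchanged.
    """
    return WIKI_CODE_OVERRIDES.get(language_code, language_code)


if __name__ == "__main__":
    for code in ("yue", "kw", "vro"):
        print(code, "->", wiki_code(code))
```

Codes without an entry, such as `kw`, pass through unchanged, so only the exceptions need to be maintained.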
The language list above was compiled from a query covering the last 2 years of activity. A screenshot with the number of translations published is shown below: