The MADLAD-400 open-source translation model supports many languages. Initial testing suggests that quality is not always high, but the model can still be useful for languages not supported by other services. Community input can help identify when it is useful.
This ticket proposes enabling the MADLAD-400 model in the MinT test instance so that communities can try it. We selected Wikipedia languages that are not supported by any other translation service.
These are the languages selected:
(Languages in bold have Content and Section Translation enablement planned in T353510.)
- Arpitan (frp)
- Kabardian (kbd)
- Moksha (mdf)
- Gorontalo (gor)
- Avar (av)
- Komi-Permyak (koi)
- Chechen (ce)
- Erzya (myv)
- Adyghe (ady)
- Newari (new)
- Kalmyk (xal)
- Jamaican Creole English (jam)
- Mon (mnw)
- Fiji Hindi (hif)
- Komi (kv)
- Tulu (tcy)
- Pampanga (pam)
- Tetum (tet)
- Karachay-Balkar (krc)
- Chamorro (ch)
- Gagauz (gag)
- Old English (ang)
- Aragonese (an): Apertium supports some pairs; MADLAD can provide support for other source languages.
- Bavarian (bar)
- Bislama (bi): OpusMT supports translations from English; MADLAD can provide support for other source languages.
- Cree (cr): cr_Latn code used by MADLAD-400.
- Manx (gv)
- Inuktitut (iu)
- Mirandese (mwl)
- Nan (nan): zh-min-nan code used in Wikipedia; nan_Latn_TW code used by MADLAD-400.
- Low German (nds)
- Low Saxon (nds-nl): nds_NL code used by MADLAD-400.
- Ossetic (os)
- Saraiki (skr)
- Sranan Tongo (srn)
- Tuvinian (tyv)
- Venda (ve): OpusMT supports translations from English; MADLAD can provide support for other source languages.
- Wu Chinese (wuu)
- Moroccan Arabic (ary): Community objected to MinT using MADLAD-400. MADLAD-400 is not providing the right variant according to T339926.
- Breton (br): Community objected to MinT using MADLAD-400. Apertium supports some pairs; MADLAD can provide support for other source languages.
- Ido (io): Community objected to MinT using MADLAD-400.
- Kara-Kalpak (kaa): Community objected to MinT using MADLAD-400.
- Cornish (kw): Community objected to MinT using MADLAD-400.
- Madurese (mad): Community objected to MinT using MADLAD-400.
- Nias (nia): Community objected to MinT using MADLAD-400.
- Serbo-Croatian (sh): Community objected to MinT using MADLAD-400.
- Simple English (simple): Community objected to MinT using MADLAD-400.
- Talysh (tly): Community objected to MinT using MADLAD-400. tly_IR code used by MADLAD-400, but Wikipedia seems to use the Latin script instead.
- Walloon (wa): Community objected to MinT using MADLAD-400.
- Cantonese (yue): zh-yue code used in Wikipedia. As per T354666#9593836, MADLAD-400 has the same issues as other models not providing the right variant for the language (T333835).
- Romansh (rm): Community objected to MinT using MADLAD-400.
- Saterland Frisian (stq): Community objected to MinT using MADLAD-400.
- Kalaallisut (kl): Community objected to MinT using MADLAD-400.
- Southern Altai (alt): Community objected to MinT using MADLAD-400.
- Northern Sami (se): Community objected to MinT using MADLAD-400.
- Navajo (nv): Community objected to MinT using MADLAD-400.
- Zhuang (za): Not supported by MADLAD-400.
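Several entries in the list above note that the Wikipedia language code differs from the code MADLAD-400 uses. A minimal sketch of how such a lookup could be expressed is shown below. The names `WIKI_TO_MADLAD` and `madlad_code` are hypothetical and are not taken from the MinT codebase; only the code pairs themselves come from this ticket, and the fallback to the unchanged Wikipedia code is an assumption.

```python
# Hypothetical lookup table: Wikipedia language code -> MADLAD-400 code.
# Only the pairs explicitly noted in this ticket are listed here.
WIKI_TO_MADLAD = {
    "cr": "cr_Latn",             # Cree
    "zh-min-nan": "nan_Latn_TW",  # Nan
    "nds-nl": "nds_NL",          # Low Saxon
}


def madlad_code(wiki_code: str) -> str:
    """Return the MADLAD-400 code for a Wikipedia language code.

    Assumption: codes without an explicit entry are passed through unchanged.
    """
    return WIKI_TO_MADLAD.get(wiki_code, wiki_code)
```

For example, `madlad_code("nds-nl")` gives `"nds_NL"`, while a code with no noted difference such as `"gv"` is returned as-is.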
Steps:
- Enable all selected languages in the MinT test instance (not for Content/Section translation)
- Communicate with the communities, inviting them to try the MT quality and asking whether it is good enough to be available by default, as an option, or not at all.
- For languages where Content and Section Translation are not yet enabled by default, the communication can be combined with the plans to enable them (T353510). That is, inform communities about both the enablement of Content Translation and the possibility of having MinT if the quality is good.
- Ido Wikipedia (io)
- Enable: Content and Section Translation in Ido.
- Don't enable MADLAD-400 MT support. A member of the community tested it and reported that the quality is poor and that the translation goes out of context (it adds made-up words or phrases that are not in the source article).
- Low German Wikipedia (nds)
- Enable the Content and Section translation and MADLAD-400 in this Wiki; there was no response or objection to enabling it.
- Low Saxon Wikipedia (nds-nl)
- Enable the Content and Section translation and MADLAD-400 in this Wiki; there was no response or objection to enabling it.
- Mirandese Wikipedia (mwl)
- Enable the Content and Section translation and MADLAD-400 in this Wiki; there was no response or objection to enabling it.
- Simple English Wikipedia (simple)
- The community objected to enabling CX, SX, and machine translation: given the content structure permitted in Simple English Wikipedia, enabling them would derail the project.
- Chinese (Min Nan) Wikipedia
- Enable MADLAD-400 model in this Wiki; there was no response or objection to enabling it.
- Aragonese Wikipedia (an)
- Enable MADLAD-400 model in this Wiki; there was no response or objection to enabling it.
- Tuvinian Wikipedia (tyv)
- Enable MADLAD-400 model in this Wiki; there was no response or objection to enabling it.
- Cornish Wikipedia (kw)
- The community feedback is that the translation quality is poor. One member rated the Cornish grammar and spelling 3 on a scale of 1 to 10. The model also adds made-up phrases that are not in the source content. Therefore, the model should not be enabled in their Wikipedia.
- Kara-Kalpak Wikipedia (kaa)
- Do not enable the MADLAD-400 MT. A member of the community indicated that the translation is not in the Kara-Kalpak language; instead, the output is Uzbek.
- Cree Wikipedia (cr)
- Enable MADLAD-400 model in this Wiki; there was no response or objection to enabling it.
- Inuktitut Wikipedia (iu)
- Enable MADLAD-400 model in this Wiki; there was no response or objection to enabling it.
- Saraiki Wikipedia (skr)
- Walloon Wikipedia (wa)
- Do not enable the MADLAD-400 MT. A member of the community objected to machine translation because the model is imperfect and would create more work for admins who are already stretched.
- Manx Wikipedia (gv)
- Enable MADLAD-400 model in this Wiki; there was no response or objection to enabling it.
- Ossetian/Ossetic Wikipedia (os)
- Enable MADLAD-400 model in this Wiki; there was no response or objection to enabling it.
- Serbo-Croatian Wikipedia (sh)
- Do not enable it. A community member's feedback is that the translation quality is poor; it also adds made-up phrases that are not in the source content.
- Wu Chinese Wikipedia (wuu)
- Enable MADLAD-400 model in this Wiki; there was no response or objection to enabling it.
- Sranan Tongo Wikipedia (srn)
- Enable MADLAD-400 model in this Wiki; there was no response or objection to enabling it.
- Madurese Wikipedia (mad)
- Do not enable it. A community member said that the translation is not accurate and that the output is in Indonesian.
- Breton (br)
- Do not enable it. A community member's feedback is that the translation quality is poor and not suitable as an aid.
- Talysh (tly)
- Do not enable it. A community member's feedback is that the translation quality is poor and that the output uses Arabic script instead of Latin.
- Bavarian (bar)
- Enable MADLAD-400 model in this Wiki; there was no response or objection to enabling it.
- Bislama (bi)
- Enable MADLAD-400 model in this Wiki; there was no response or objection to enabling it.
- Venda (ve)
- Enable MADLAD-400 model in this Wiki; there was no response or objection to enabling it.
- Nias (nia)
- Do not enable it. A community member's feedback is that the translation quality is poor and that the model adds made-up words to the translation.
MADLAD-400 also supports a set of additional languages for which there is no Wikipedia yet; see T354675: Consider enabling MADLAD-400 in MinT for languages with no Wikipedia yet.