
Enable in Content Translation the new languages Google Translate supports in June 2024
Closed, Resolved | Public | 8 Estimated Story Points

Description

A set of new languages is now available in Google Translate. As with past enablements, it may take some time until they are available in the external APIs. Once they are available, we may want to enable the Google support in Content Translation. This ticket compiles the languages to enable. Below you can find them grouped by their current support on Wikipedia:

A) Languages with Wikipedia and MT support already. We can enable the new support from Google as a non-default to provide them with another option, with no need for specific coordination:

  • Acehnese (ace)
  • Avar/Avaric (av)
  • Awadhi (awa)
  • Balinese (ban)
  • Bambara (bm)
  • Bashkir (ba)
  • Betawi (bew)
  • Breton (br)
  • Chamorro (ch)
  • Chechen (ce)
  • Chuvash (cv)
  • Dinka (din)
  • Dzongkha (dz)
  • Faroese (fo)
  • Fijian (fj)
  • Fon (fon)
  • Friulian (fur)
  • Iloko/Ilocano (ilo)
  • Jamaican Patois/Jamaican Creole English (jam)
  • Kapampangan (pam)
  • Komi (kv)
  • Konkani (gom)
  • Latgalian (ltg)
  • Ligurian (lij)
  • Limburgish (li)
  • Lombard (lmo)
  • Manx (gv)
  • Meadow/Eastern Mari (mhr)
  • Meiteilon/Manipuri (mni)
  • Minang/Minangkabau (min)
  • Nepalbhasa/Newari (new)
  • Sepedi/Northern Sotho (nso)
  • Occitan (oc)
  • Odia (or)
  • Ossetian (os)
  • Pangasinan (pag)
  • Papiamento (pap)
  • Rundi (rn)
  • Sango (sg)
  • Shan (shn)
  • Sicilian (scn)
  • Silesian (szl)
  • Swati (ss)
  • Tahitian (ty)
  • Tetum (tet)
  • Tibetan (bo)
  • Tok Pisin (tpi)
  • Tongan (to)
  • Tswana (tn)
  • Tulu (tcy)
  • Tumbuka (tum)
  • Tuvan/Tuvinian (tyv)
  • Udmurt (udm)
  • Venda (ve)
  • Venetian (vec)
  • Wolof (wo)
  • Yakut (sah)
  • Waray (war)
  • Southern Ndebele (nr)
  • Iban (iba)
  • Western Punjabi (pnb). Google Translate supports Punjabi using the Shahmukhi script with the code pa-Arab.

B) Languages with a Wikipedia but some open questions. We want to check with communities whether the MT support is useful (in bold those getting machine translation for the first time), or some other questions about the specific variant used:

  • Abkhaz/Abkhazian (ab)
  • Batak Toba (bbc)
  • Cantonese (zh-yue)
    • Enable for Cantonese, @H78c67c confirmed here that Google Translate is a helpful support for their Wikipedia.
  • Kalaallisut (kl)
  • Madurese (mad)
  • NKo (nqo)
    • No support available for Google MT for nqo.
  • Northern Sami (se)
    • Do not enable for Northern Sami. A community member stated that the quality is poor and won't be useful for their work.
  • Bikol. Google uses code bik. Wikipedia uses bcl for Central Bikol, but it is unclear whether that is the variant supported by Google.
    • Enable for Central Bikol. A contributor indicated that the MT will be useful in their Wikipedia.
  • Crimean Tatar (crh). Google Translate provides translations in Cyrillic script, while Crimean Tatar Wikipedia uses both Latin and Cyrillic scripts through a converter. We may want to check whether Google support is useful for the community.
  • Fulani/Fula (ff). This language has several varieties with several language codes, so we may need to check with the community whether the variant provided by Google Translate is useful.
  • Kikongo (kg). We need to check with the community whether the variant provided by Google Translate is useful. In particular, we may want to check whether they find it useful to use the translations Google provides for Kongo (kg), the ones provided for Kituba (ktu), or neither.
  • Nahuatl (nah). Google uses code nhe. We need to check with the community whether the variant provided by Google Translate is useful.
  • Romani (rom). The Vlax Romani Wikipedia uses the rmy code. We need to check with the community whether the variant provided by Google Translate is useful.
  • Tamazight. Google uses code ber and supports both Tifinagh and Latin scripts. Wikipedia uses zgh for Standard Moroccan Tamazight (using the Tifinagh script), but it is unclear whether that is the variant supported by Google.
    • Do not enable for Tamazight. A member of the community indicated that Google's variant (Kabyle) is not the same as the Amazigh language with code zgh, which is officially recognized in Morocco and used in the wiki.
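
Several entries above note that the code used by Wikipedia differs from the code Google Translate uses. A minimal sketch of how such pairs could be kept in one lookup table follows. The names `WIKI_TO_GOOGLE`, `to_google_code` and `to_wiki_code` are hypothetical and are not taken from cxserver; the table only records code pairs mentioned in this task and does not imply a decision to enable any of them.

```python
# Hypothetical lookup table: Wikipedia language code -> code the task says
# Google Translate uses. Illustrative only; this is not cxserver's real config.
WIKI_TO_GOOGLE = {
    "pnb": "pa-Arab",  # Western Punjabi, Shahmukhi script
    "bcl": "bik",      # Central Bikol
    "nah": "nhe",      # Nahuatl
    "rmy": "rom",      # Vlax Romani
}

# Reverse table, built once so lookups in both directions stay consistent.
GOOGLE_TO_WIKI = {google: wiki for wiki, google in WIKI_TO_GOOGLE.items()}


def to_google_code(wiki_code: str) -> str:
    """Return the provider-side code, falling back to the wiki code itself."""
    return WIKI_TO_GOOGLE.get(wiki_code, wiki_code)


def to_wiki_code(google_code: str) -> str:
    """Return the wiki-side code, falling back to the provider code itself."""
    return GOOGLE_TO_WIKI.get(google_code, google_code)
```

Codes that are identical on both sides (for example `br`) need no entry, because both helpers fall back to the input.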

C) Languages with no Wikipedia yet:

  • Acholi (ach)
  • Afar (aa) In Incubator
  • Alur (alz)
  • Baoulé (bci) In Incubator
  • Batak Karo (btx)
  • Batak Simalungun (bts)
  • Bemba (bem)
  • Chuukese (chk)
  • Dogri (doi in Google, dgo in Wikimedia) In Incubator
  • Dombe (ndq)
  • Dyula (dyu)
  • Ga (gaa) In Incubator
  • Hakha Chin (cnh) In Incubator
  • Hiligaynon (hil) In Incubator
  • Hunsrik (hrx) In Incubator
  • Jingpo (kac)
  • Kanuri (kr) In the incubator with code knc
  • Khasi (kha)
  • Kiga (cgg)
  • Kituba (ktu)
  • Kokborok (trp)
  • Krio (kri) In Incubator
  • Luo (luo) In Incubator
  • Makassar (mak)
  • Mam (mam)
  • Marshallese (mh) In Incubator
  • Marwadi (mwr in Google, rwr in Wikimedia) In Incubator
  • Mauritian Creole (mfe) In Incubator
  • Mizo (lus) In Incubator
  • Ndau (ndc)
  • Nuer (nus) In Incubator
  • Qʼeqchiʼ (kek)
  • Seychellois Creole (crs)
  • Susu (sus)
  • Tiv (tiv)
  • Yucatec Maya (yua) In Incubator
  • Zapotec (zap) In Incubator

The following languages are not enabled due to ambiguous language codes; we will enable them later:

  • Baluchi (bal) In incubator with three projects for codes bgp, bgn, and bcc
  • Buryat (bua)
  • Dari (prs) Google uses fa-AF code

D) Languages not to enable:

  • Santali (sat) Google Translate uses Latin script, and Santali Wikipedia uses Ol Chiki script.

Related: T308248: Newly supported languages in Google Translate

Event Timeline

PWaigi-WMF changed the task status from Open to In Progress.Sep 2 2024, 8:20 AM
UOzurumba updated the task description.
UOzurumba added a subscriber: H78c67c.

The ticket captures different groups of wikis:

  • Group A was deployed. Checkmarks signal that the enablement was verified; final checking is pending for some.
  • Group B had questions to resolve with the community, so its deployment is still pending (the conversations with the community to determine whether the language support is useful should be closed by now).
  • For group C we need to decide whether it is better to enable in the config despite the lack of a wiki (so that when a wiki is created, or graduates from Incubator, it has the MT support already available). I'd be in favor of adding these; otherwise we have to remember to check MT support every time a new Wikipedia is created, which can be more error-prone.
Nikerabbit updated Other Assignee, added: KartikMistry.
Nikerabbit set the point value for this task to 8.Nov 11 2024, 9:35 AM
Nikerabbit updated Other Assignee, added: UOzurumba; removed: KartikMistry.
Nikerabbit added a subscriber: UOzurumba.

I tested Google Translate with cxserver in a local setup; all Group B languages except nqo seem to be working. For nqo, the request fails with:

MT processing error for: en > nqo. Error: Translation with Google en > nqo failed: Translation with Google failed. Error: 400 for en > nqo
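
A check like the one described in this comment could be scripted roughly as below. This is only a sketch under stated assumptions: `MTError`, `probe_targets` and `fake_translate` are hypothetical names, and `fake_translate` is a stand-in that mimics the 400 response for nqo instead of calling cxserver or Google.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class MTError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed MT request."""

    def __init__(self, status: int, source: str, target: str) -> None:
        super().__init__(f"Error: {status} for {source} > {target}")
        self.status = status


def probe_targets(
    translate: Callable[[str, str, str], str],
    source: str,
    targets: Iterable[str],
    sample: str = "Hello",
) -> Tuple[List[str], Dict[str, int]]:
    """Try one sample translation per target; return (working, failed-with-status)."""
    working: List[str] = []
    failed: Dict[str, int] = {}
    for target in targets:
        try:
            translate(sample, source, target)
        except MTError as err:
            failed[target] = err.status
        else:
            working.append(target)
    return working, failed


def fake_translate(text: str, source: str, target: str) -> str:
    """Stand-in for a real MT client; rejects nqo with status 400 (assumption)."""
    if target == "nqo":
        raise MTError(400, source, target)
    return f"[{target}] {text}"


working, failed = probe_targets(fake_translate, "en", ["ab", "nqo", "kl"])
```

Replacing `fake_translate` with a real client would make the same loop report which pairs the provider actually rejects.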

Change #1099408 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] Enable support for Google MT for new languages (June 2024)

https://gerrit.wikimedia.org/r/1099408

@KartikMistry, MT is not showing for Cantonese. For example, in Section Translation:

zh-yue.m.wikipedia.org_w_index.php_title=Special_ContentTranslation&filter-type=automatic&filter-id=previous-edits&active-list=suggestions&from=en&to=yue&page=List%20of%20cheese%20dishes(Wiki Mobile).png (568×320 px, 26 KB)

MT was disabled for Cantonese in the past (T333835) but now that the Google support is confirmed to be useful (T304865#9933812), we can enable it. Please check the adjustments that were made in the previous disablement, since enabling it again may need additional changes. Thanks!

Based on the above, could you enable Google MT on Content/Section Translation for Cantonese? Currently

Change #1099408 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Enable support for Google MT for 11 Wikipedias (June 2024)

https://gerrit.wikimedia.org/r/1099408

Change #1102278 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2024-12-10-132417-production

https://gerrit.wikimedia.org/r/1102278

Change #1102278 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2024-12-10-132417-production

https://gerrit.wikimedia.org/r/1102278

Mentioned in SAL (#wikimedia-operations) [2024-12-11T13:04:23Z] <kart_> Updated cxserver to 2024-12-10-132417-production (T369815)

Change #1112450 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] Enable Google Translate for languages with no Wikipedias

https://gerrit.wikimedia.org/r/1112450

@KartikMistry, the new support for Cantonese from Google is not surfaced in Content Translation due to some potential config issues:
T383863: Adjust Google Configuration to expose Cantonese MT instead of Chinese

hmn (Hmong) is also supported by Google Translate but not cxserver and not listed in this task.

Also, the list above includes Dombe (ndq, aka Ndombe), but Google Translate supports Dombe (dov, aka Tonga/Chitonga/Zambezi). These are two languages; the former is a Bantu R language and the latter is a Bantu M language.

> hmn (Hmong) is also supported by Google Translate but not cxserver and not listed in this task.

Content Translation can only work in languages in which there is a full-fledged Wikipedia. We don't have one in any Hmong language.

We do have an Incubator using code mww, and we also support it on translatewiki. The language with the code mww is probably equivalent to hmn for practical purposes, although it would be nice if someone who actually knows Hmong languages confirmed it. And if that is true, then it would be nice to enable it for translatewiki, for what it's worth.

> Also, the list above includes Dombe (ndq, aka Ndombe), but Google Translate supports Dombe (dov, aka Tonga/Chitonga/Zambezi). These are two languages; the former is a Bantu R language and the latter is a Bantu M language.

As far as I can tell, we don't support those languages anywhere. Of course, I'll be very happy if someone who knows them asks to enable support for them.

> Content Translation can only work in languages in which there is a full-fledged Wikipedia.

This task is about cxserver. See prior art at T336683: Enable MinT support for languages with no Wikipedia yet

Note languages should also be added to language-data.

> Note languages should also be added to language-data.

I'm quite sure that all the necessary ones are in language-data. Am I missing anything?

We are planning to have another task to review the list of enabled languages to avoid increasing the scope of this task.

Change #1112450 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Enable Google Translate for languages with no Wikipedias

https://gerrit.wikimedia.org/r/1112450

Change #1118264 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2025-02-10-050623-production

https://gerrit.wikimedia.org/r/1118264

Change #1118264 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2025-02-10-050623-production

https://gerrit.wikimedia.org/r/1118264

>> Note languages should also be added to language-data.

> I'm quite sure that all the necessary ones are in language-data. Am I missing anything?

The following languages are not yet at language-data: alz, cgg, crs, dov, hmn, mam, kek, kha, tiv, sus, zap, ndc, chk
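
A gap like this can be found with a simple set difference between the codes enabled for MT and the codes known to language-data. The sketch below assumes both lists are already loaded into Python; `missing_from_language_data` is a hypothetical helper, and the two sample collections are small illustrative excerpts, not the real cxserver or language-data contents.

```python
from typing import Iterable, List


def missing_from_language_data(
    enabled_codes: Iterable[str], language_data_codes: Iterable[str]
) -> List[str]:
    """Return, sorted, the enabled codes that language-data does not know."""
    return sorted(set(enabled_codes) - set(language_data_codes))


# Illustrative excerpt of codes enabled for Google MT (taken from Group C).
enabled_sample = ["ach", "alz", "bem", "cgg", "kha"]

# Assumed snapshot of language-data for this example only.
language_data_sample = {"ach", "bem"}

missing = missing_from_language_data(enabled_sample, language_data_sample)
print(missing)
```

Running such a check whenever the MT configuration changes would surface missing codes before deployment rather than after.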

Change #1121305 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] Enable Google MT for some more languages

https://gerrit.wikimedia.org/r/1121305

Change #1121305 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Enable Google MT for some more languages

https://gerrit.wikimedia.org/r/1121305

Change #1123940 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2025-03-03-041049-production

https://gerrit.wikimedia.org/r/1123940

Change #1123940 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2025-03-03-041049-production

https://gerrit.wikimedia.org/r/1123940

Mentioned in SAL (#wikimedia-operations) [2025-03-04T05:53:43Z] <kart_> Updated cxserver to 2025-03-03-041049-production (T369815, T387037)

Change #1125963 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] config: Add Google MT support for nqo

https://gerrit.wikimedia.org/r/1125963

Change #1125963 abandoned by KartikMistry:

[mediawiki/services/cxserver@master] config: Add Google MT support for nqo

Reason:

As per https://translation.googleapis.com/, there is no support for this language.

https://gerrit.wikimedia.org/r/1125963

Screen.jpg (749×1 px, 126 KB)

Last year Chechen was added to translatewiki.net, and after some time it was removed. A screenshot is attached.

In "Content Translation" on Wikipedia, the translator is still working.