
Enable in content Translation the new languages Google Translate supports in June 2024
Open, In Progress, Medium · Public · 8 Estimated Story Points

Description

A set of new languages is now available in Google Translate. As with past enablements, it may take some time until they are available in the external APIs. Once they are, we may want to enable Google support for them in Content Translation. This ticket compiles the languages to enable, grouped below by their current support on Wikipedia:

A) Languages with a Wikipedia and MT support already. We can enable the new support from Google as a non-default to provide them another option, with no need for specific coordination:

  1. ✅ Acehnese (ace)
  2. ✅ Avar/Avaric (av)
  3. ✅ Awadhi (awa)
  4. ✅ Balinese (ban)
  5. ✅ Bambara (bm)
  6. ✅ Bashkir (ba)
  7. ✅ Betawi (bew)
  8. ✅ Breton (br)
  9. ✅ Chamorro (ch)
  10. ✅ Chechen (ce)
  11. ✅ Chuvash (cv)
  12. ✅ Dinka (din)
  13. ✅ Dzongkha (dz)
  14. ✅ Faroese (fo)
  15. ✅ Fijian (fj)
  16. ✅ Fon (fon)
  17. ✅ Friulian (fur)
  18. ✅ Iloko/Ilocano (ilo)
  19. ✅ Jamaican Patois/Jamaican Creole English (jam)
  20. ✅ Kapampangan (pam)
  21. ✅ Komi (kv)
  22. ✅ Konkani (gom)
  23. ✅ Latgalian (ltg)
  24. ✅ Ligurian (lij)
  25. ✅ Limburgish (li)
  26. Lombard (lmo)
  27. Manx (gv)
  28. Meadow/Eastern Mari (mhr)
  29. Meiteilon/Manipuri (mni)
  30. Minang/Minangkabau (min)
  31. Nepalbhasa/Newari (new)
  32. Sepedi/Northern Sotho (nso)
  33. Occitan (oc)
  34. Odia (or)
  35. Ossetian (os)
  36. Pangasinan (pag)
  37. Papiamento (pap)
  38. Rundi (rn)
  39. Sango (sg)
  40. Shan (shn)
  41. Sicilian (scn)
  42. Silesian (szl)
  43. Swati (ss)
  44. Tahitian (ty)
  45. Tetum (tet)
  46. Tibetan (bo)
  47. Tok Pisin (tpi)
  48. Tongan (to)
  49. Tswana (tn)
  50. Tulu (tcy)
  51. Tumbuka (tum)
  52. Tuvan/Tuvinian (tyv)
  53. Udmurt (udm)
  54. Venda (ve)
  55. Venetian (vec)
  56. Western Punjabi (pnb). Google Translate supports Punjabi in Shahmukhi script under the code pa-Arab.
  57. Wolof (wo)
  58. Yakut (sah)
  59. Waray (war)

Communication in progress
B) Languages with a Wikipedia but some open questions. We want to check with communities whether the MT support is useful (in bold those getting machine translation for the first time), or some other questions about the specific variant used:

  1. Abkhaz/Abkhazian (ab)
  2. Batak Toba (bbc)
  3. Cantonese (zh-yue)
    • Enable for Cantonese, @H78c67c confirmed here that the Google translate is a helpful support for their Wikipedia.
  4. Kalaallisut (kl)
  5. Madurese (mad)
  6. NKo (nqo)
  7. Northern Sami (se)
    • Do not enable for Northern Sami, a member of the community stated that the quality is poor and won't be useful for their work.
  8. Bikol. Google uses code bik. Wikipedia uses bcl for Central Bikol, but it is unclear whether that is the variant supported by Google.
    • Enable for Central Bikol. A contributor indicated that the MT will be useful in their Wikipedia.
  9. Crimean Tatar (crh). Google Translate provides translations in Cyrillic script. Crimean Tatar Wikipedia uses both Latin and Cyrillic scripts through a converter, so we may want to check whether the Google support is useful for the community.
  10. Fulani/Fula (ff). This language has several varieties with several language codes; we may need to check with the community whether the variant provided by Google Translate is useful.
  11. Kikongo (kg). We need to check with the community whether the variant provided by Google Translate is useful. In particular, we may want to check whether they find it useful to use the translations Google provides for Kongo (kg), the ones provided for Kituba (ktu), or neither.
  12. Nahuatl (nah). Google uses code nhe. We need to check with the community whether the variant provided by Google Translate is useful.
  13. Romani (rom). The Vlax Romani Wikipedia uses the rmy code. We need to check with the community whether the variant provided by Google Translate is useful.
  14. Tamazight. Google uses code ber and supports both Tifinagh and Latin scripts. Wikipedia uses zgh for Standard Moroccan Tamazight (using the Tifinagh script), but it is unclear whether that is the variant supported by Google.
    • Do not enable for Tamazight. A member of the community indicated that Google's variant (Kabyle) is not the same as the Amazigh language with code zgh, which is officially recognised in Morocco and used in the wiki.

C) Languages with no Wikipedia yet:

  1. Acholi (ach)
  2. Afar (aa) In Incubator
  3. Alur (alz)
  4. Baluchi (bal) In incubator with three projects for codes bgp, bgn, and bcc
  5. Baoulé (bci) In Incubator
  6. Batak Karo (btx)
  7. Batak Simalungun (bts)
  8. Bemba (bem)
  9. Buryat (bua)
  10. Chuukese (chk)
  11. Dari (prs) Google uses fa-AF code
  12. Dogri (doi in Google, dgo in Wikimedia) In Incubator
  13. Dombe (ndq)
  14. Dyula (dyu)
  15. Ga (gaa) In Incubator
  16. Hakha Chin (cnh) In Incubator
  17. Hiligaynon (hil) In Incubator
  18. Hunsrik (hrx) In Incubator
  19. Iban (iba) In Incubator
  20. Jingpo (kac)
  21. Kanuri (kr) In incubator with code knc
  22. Khasi (kha)
  23. Kiga (cgg)
  24. Kituba (ktu)
  25. Kokborok (trp)
  26. Krio (kri) In Incubator
  27. Luo (luo) In Incubator
  28. Makassar (mak)
  29. Mam (mam)
  30. Marshallese (mh) In Incubator
  31. Marwadi (mwr in Google, rwr in Wikimedia) In Incubator
  32. Mauritian Creole (mfe) In Incubator
  33. Mizo (lus) In Incubator
  34. Ndau (ndc)
  35. Nuer (nus) In Incubator
  36. Qʼeqchiʼ (kek)
  37. Seychellois Creole (crs)
  38. Southern Ndebele (nr) In Incubator
  39. Susu (sus)
  40. Tiv (tiv)
  41. Yucatec Maya (yua) In Incubator
  42. Zapotec (zap) In Incubator

D) Languages not to enable:

  • Santali (sat). Google Translate uses Latin script, while Santali Wikipedia uses Ol Chiki script.
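
Several entries above note that Google and Wikimedia use different codes for the same language. A minimal sketch of how such a mapping could be expressed follows. The table name `WIKI_TO_GOOGLE` and the helper `google_code` are hypothetical and do not reflect cxserver's actual configuration format; the pairs themselves are taken from the notes in this description.

```python
# Hypothetical lookup table; NOT cxserver's actual configuration format.
# Each pair comes from a note in the task description.
WIKI_TO_GOOGLE = {
    "pnb": "pa-Arab",  # Western Punjabi, Shahmukhi script
    "bcl": "bik",      # Central Bikol; Google lists Bikol as bik
    "prs": "fa-AF",    # Dari
    "dgo": "doi",      # Dogri
    "rwr": "mwr",      # Marwadi
}


def google_code(wiki_code: str) -> str:
    """Return the code to send to Google for a wiki language code.

    Codes without an entry are assumed to be identical on both sides.
    """
    return WIKI_TO_GOOGLE.get(wiki_code, wiki_code)


print(google_code("pnb"))  # mapped code
print(google_code("br"))   # no entry, passes through unchanged
```

Falling back to the input code keeps the table limited to the exceptions, which are the entries most likely to need review when a variant question is resolved.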

Related: T308248: Newly supported languages in Google Translate

Event Timeline

Pginer-WMF renamed this task from Enable new languages Google Translate enabled in June 2024 to Enable in content Translation the new languages Google Translate enabled in June 2024. Jul 11 2024, 12:56 PM
Pginer-WMF renamed this task from Enable in content Translation the new languages Google Translate enabled in June 2024 to Enable in content Translation the new languages Google Translate supports in June 2024.
Pginer-WMF triaged this task as Medium priority.
Pginer-WMF updated the task description.
Pginer-WMF updated the task description.

Change #1062484 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/cxserver@master] Google: Add support for 59 more languages

https://gerrit.wikimedia.org/r/1062484

The https://translation.googleapis.com/language/translate/v2/languages API, which lists supported languages, shows all the new languages. However, actual translation fails for them:

{
  "error": {
    "code": 400,
    "message": "Bad language pair: en|to",
    "errors": [
      {
        "message": "Bad language pair: en|to",
        "domain": "global",
        "reason": "badRequest"
      }
    ],
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
          {
            "field": "target",
            "description": "Target language: to"
          }
        ]
      }
    ]
  }
}
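
As an illustration, the rejected field can be pulled out of an error body shaped like the one above. This is a minimal sketch: the helper name `rejected_field` is hypothetical, and the embedded sample is a trimmed, well-formed copy of the response rather than live API output.

```python
import json
from typing import Optional, Tuple

# Trimmed, well-formed copy of the error body reported in this task.
SAMPLE_ERROR = """
{
  "error": {
    "code": 400,
    "message": "Bad language pair: en|to",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
          {"field": "target", "description": "Target language: to"}
        ]
      }
    ]
  }
}
"""


def rejected_field(error_body: str) -> Optional[Tuple[str, str]]:
    """Return (field, description) of the first field violation, if any."""
    payload = json.loads(error_body)
    for detail in payload.get("error", {}).get("details", []):
        for violation in detail.get("fieldViolations", []):
            return violation.get("field"), violation.get("description")
    return None


result = rejected_field(SAMPLE_ERROR)
print(result)
```

Here the violation points at the `target` field, which matches the observation that the language is listed but not yet accepted as a translation target.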

Note: Buryat (bua) is a macrolanguage. Google Translate uses the Russian Cyrillic variant, for which we have a Wikipedia at bxr.wikipedia.org.

A member of the community indicated that Google's variant (Kabyle) is not the same as the Amazigh language with code zgh, which is officially recognised in Morocco and used in the wiki.

There is a Wikipedia in Kabyle (kab.wikipedia.org).

The issue reported earlier (the https://translation.googleapis.com/language/translate/v2/languages API listing all the new languages while actual translation failed for them) has been resolved. The API is now working as expected.

A) Languages with a Wikipedia and MT support already. We can enable the new support from Google as a non-default to provide them another option, with no need for specific coordination:

Of the 59 languages in this list, some are already supported by Google (they are already present in the cxserver Google configuration):

  1. Odia (or)
  2. Iloko/Ilocano (ilo)
  3. Konkani (gom)
  4. Meiteilon/Manipuri (mni)
  5. Sepedi/Northern Sotho (nso)
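
The arithmetic can be checked with a set difference. The sketch below takes the 59 group A codes from the task description and removes the five listed above, leaving 54, which is consistent with the count in the merged patch title. The variable names are illustrative only.

```python
# The 59 group A codes, in the order listed in the task description.
GROUP_A = """
ace av awa ban bm ba bew br ch ce cv din dz fo fj fon fur ilo jam pam
kv gom ltg lij li lmo gv mhr mni min new nso oc or os pag pap rn sg shn
scn szl ss ty tet bo tpi to tn tcy tum tyv udm ve vec pnb wo sah war
""".split()

# Codes reported above as already present in the cxserver Google configuration.
ALREADY_CONFIGURED = {"or", "ilo", "gom", "mni", "nso"}

# Keep the original order while dropping the already-configured codes.
newly_added = [code for code in GROUP_A if code not in ALREADY_CONFIGURED]

print(len(GROUP_A), len(newly_added))
```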

Change #1062484 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Google: Add support for 54 more languages

https://gerrit.wikimedia.org/r/1062484

Change #1067221 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2024-08-27-045705-production

https://gerrit.wikimedia.org/r/1067221

Change #1067221 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2024-08-27-045705-production

https://gerrit.wikimedia.org/r/1067221

Mentioned in SAL (#wikimedia-operations) [2024-08-27T11:51:53Z] <kart_> Updated cxserver to 2024-08-27-045705-production (T369815)

PWaigi-WMF changed the task status from Open to In Progress. Sep 2 2024, 8:18 AM
PWaigi-WMF assigned this task to santhosh.
UOzurumba updated the task description.
UOzurumba added a subscriber: H78c67c.

The ticket captures different groups of wikis:

  • Group A was deployed. Checkmarks signal that the enablement was verified; final checking is still pending for some.
  • Group B had questions to resolve with the community, so its deployment is still pending (the conversations with the community to determine whether the language support is useful should be closed by now).
  • For group C we need to decide whether it is better to enable the languages in the config despite the lack of a wiki (so that when a wiki is created, or graduates from Incubator, it has the MT support already available). I'd be in favor of adding these; otherwise we have to remember to check MT support every time a new Wikipedia is created, which can be more error-prone.
Nikerabbit updated Other Assignee, added: KartikMistry.
Nikerabbit set the point value for this task to 8. Nov 11 2024, 9:35 AM
Nikerabbit updated Other Assignee, added: UOzurumba; removed: KartikMistry.
Nikerabbit added a subscriber: UOzurumba.

I tested Google Translate with cxserver in a local setup. Except for nqo, all languages from Group B seem to be working.

"MT processing error for: en > nqo. Error: Translation with Google en > nqo failed: Translation with Google failed. Error: 400 for en > nqo
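
A check like this can be expressed as a small probe loop. The sketch below is an assumption-laden illustration, not the procedure actually used: `probe_targets` is a hypothetical helper, and `fake_translate` is a stand-in for a real MT call that simply mimics the nqo failure reported above.

```python
from typing import Callable, Dict, Iterable


def probe_targets(
    translate: Callable[[str, str, str], str],
    source: str,
    targets: Iterable[str],
    sample: str = "Hello",
) -> Dict[str, str]:
    """Try one short translation per target and collect failure messages."""
    failures: Dict[str, str] = {}
    for target in targets:
        try:
            translate(sample, source, target)
        except Exception as exc:  # record any failure and keep probing
            failures[target] = str(exc)
    return failures


def fake_translate(text: str, source: str, target: str) -> str:
    """Stand-in for a real MT client; fails for nqo only."""
    if target == "nqo":
        raise RuntimeError(f"400 for {source} > {target}")
    return f"[{target}] {text}"


failures = probe_targets(fake_translate, "en", ["ab", "kl", "nqo"])
print(failures)
```

Passing the translator in as a callable keeps the loop independent of any particular client, so the same probe could wrap a local cxserver instance or a direct API call.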

Change #1099408 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] Enable support for Google MT for new languages (June 2024)

https://gerrit.wikimedia.org/r/1099408

@KartikMistry, MT is not showing for Cantonese. For example, in Section Translation:

[Screenshot: Special:ContentTranslation on zh-yue.m.wikipedia.org (mobile), from=en, to=yue, page "List of cheese dishes"]

MT was disabled for Cantonese in the past (T333835) but now that the Google support is confirmed to be useful (T304865#9933812), we can enable it. Please check the adjustments that were made in the previous disablement, since enabling it again may need additional changes. Thanks!

Based on the above, could you enable Google MT on Content/Section Translation for Cantonese? Currently

Change #1099408 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Enable support for Google MT for 11 Wikipedias (June 2024)

https://gerrit.wikimedia.org/r/1099408

Change #1102278 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2024-12-10-132417-production

https://gerrit.wikimedia.org/r/1102278

Change #1102278 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2024-12-10-132417-production

https://gerrit.wikimedia.org/r/1102278

Mentioned in SAL (#wikimedia-operations) [2024-12-11T13:04:23Z] <kart_> Updated cxserver to 2024-12-10-132417-production (T369815)