Deploy Flores Machine Translation in a new set of Languages
Open, In Progress, Medium, Public, 2 Estimated Story Points

Description

FLORES MT has been deployed to the Igbo, Zulu, Icelandic, Luganda, and Occitan Wikipedias. This ticket proposes enabling FLORES MT for a few more languages:

  • bak/ba (Bashkir)
  • hau/ha (Hausa)
  • ast (Asturian)
  • tso/ts (Tsonga)
  • asm/as (Assamese)
  • ckb (Sorani / Central Kurdish)
  • ilo (Iloko)
  • kon/kg (Kongo)
  • lin/ln (Lingala)
  • nso (Northern Sotho)
  • ssw/ss (Swati)
  • tsn/tn (Tswana)
  • yue/zh-yue (Cantonese)
  • orm/om (Oromo)
  • tir/ti (Tigrinya)
  • wol/wo (Wolof)
  • ayr (Central Aymara, for Aymara Wikipedia: ay)
  • isl/is (Icelandic) (new pairs)
  • zho_Hans/zh (Chinese) (new pairs)
  • ibo/ig (Igbo) (new pairs)
  • zul/zu (Zulu) (new pairs)
  • lug/lg (Luganda) (new pairs)
  • oci/oc (Occitan) (new pairs)

The specific language pairs to enable are:

  • en → {asm, ast, ayr, bak, ckb, hau, ilo, kon, lin, nso, orm, ssw, tsn, tso, yue, wol}
  • fr → {asm, ast, ayr, bak, ckb, hau, ibo, ilo, isl, kon, lin, lug, nso, oci, orm, ssw, tir, tsn, tso, wol, yue, zho_Hans, zul}
  • es → {asm, ast, ayr, bak, ckb, hau, ibo, ilo, isl, kon, lin, lug, nso, oci, orm, ssw, tir, tsn, tso, yue, zho_Hans, zul}
  • {cat, por} → oci
  • zho_Hans → yue
  • rus → bak
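
As a minimal sketch, the pair specification above can be expanded into an explicit list of (source, target) pairs, which makes it easier to check the total and spot omissions. The names `PAIR_SPEC` and `expand_pairs` are hypothetical, and this is not the cxserver configuration format; it only restates the list in this ticket.

```python
# Hypothetical restatement of the pair list from this ticket.
# This is NOT the cxserver configuration format.
PAIR_SPEC = [
    (["en"], "asm ast ayr bak ckb hau ilo kon lin nso orm ssw tsn tso yue wol".split()),
    (["fr"], ("asm ast ayr bak ckb hau ibo ilo isl kon lin lug nso oci orm "
              "ssw tir tsn tso wol yue zho_Hans zul").split()),
    (["es"], ("asm ast ayr bak ckb hau ibo ilo isl kon lin lug nso oci orm "
              "ssw tir tsn tso yue zho_Hans zul").split()),
    (["cat", "por"], ["oci"]),
    (["zho_Hans"], ["yue"]),
    (["rus"], ["bak"]),
]


def expand_pairs(spec):
    """Return every (source, target) combination as a flat list."""
    return [
        (source, target)
        for sources, targets in spec
        for source in sources
        for target in targets
    ]


PAIRS = expand_pairs(PAIR_SPEC)
print(len(PAIRS))  # 65
```

With the list as written this yields 65 pairs. It also shows that `tir` is a target for fr and es but not for en, which may be worth confirming.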

Note that:

  • The languages listed above give both the 3-letter and the 2-letter ISO codes where the corresponding Wikipedia uses the 2-letter one. The language pairs are listed using the 3-letter codes, but we need to determine how to capture them in the configuration to avoid a mismatch between the translation service codes and the Wikipedia codes.
  • For Bashkir and Hausa we want to keep the current default services, so Flores will not be the default for those initially.
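
One way to avoid the code mismatch described above is an explicit lookup table from the 3-letter codes used in the pair list to the Wikipedia codes given in the language list. The sketch below is only an illustration: `WIKI_CODE`, `to_wiki_code`, `KEEP_CURRENT_DEFAULT` and `flores_may_be_default` are hypothetical names, and how cxserver should represent this is still to be determined. Codes that are not in the table (for example `ast`, `ckb`, `ilo`, `nso`) are assumed to be identical on both sides.

```python
# Hypothetical mapping built from the language list in this ticket:
# 3-letter code used by the translation service -> Wikipedia language code.
WIKI_CODE = {
    "bak": "ba", "hau": "ha", "tso": "ts", "asm": "as", "kon": "kg",
    "lin": "ln", "ssw": "ss", "tsn": "tn", "yue": "zh-yue", "orm": "om",
    "tir": "ti", "wol": "wo", "ayr": "ay", "isl": "is", "zho_Hans": "zh",
    "ibo": "ig", "zul": "zu", "lug": "lg", "oci": "oc",
}

# Bashkir and Hausa keep their current default service for now.
KEEP_CURRENT_DEFAULT = {"bak", "hau"}


def to_wiki_code(service_code):
    """Map a service code to a Wikipedia code; unlisted codes pass through unchanged."""
    return WIKI_CODE.get(service_code, service_code)


def flores_may_be_default(service_code):
    """False only for the languages this ticket excludes from a Flores default."""
    return service_code not in KEEP_CURRENT_DEFAULT


print(to_wiki_code("yue"), to_wiki_code("ckb"), flores_may_be_default("hau"))
```

Source-side codes such as `cat`, `por` and `rus` are not covered by the language list in this ticket and would need their own entries before a table like this could be used for full pairs.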

Steps:

  • Enable in test instance (T309173)
  • Integration work
  • Deployment

Once this process is completed, we can consider making Content/Section Translation more visible (T309384)

Event Timeline

Pginer-WMF renamed this task from Deploy Flores Machine Translation in 16 Languages to Deploy Flores Machine Translation in a new set of Languages. May 26 2022, 10:54 AM
Pginer-WMF updated the task description.
KartikMistry changed the task status from Open to In Progress. Wed, Jun 15, 2:04 PM
KartikMistry claimed this task.
KartikMistry updated Other Assignee, added: UOzurumba.

Change 805825 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] config: Deploy Flores Machine Translation in a new set of Languages

https://gerrit.wikimedia.org/r/805825

KartikMistry set the point value for this task to 2. Thu, Jun 16, 11:18 AM

@UOzurumba Can you update the status of the community communication in the above checkboxes in the task description?

Change 805825 merged by jenkins-bot:

[mediawiki/services/cxserver@master] config: Deploy Flores Machine Translation in a new set of Languages

https://gerrit.wikimedia.org/r/805825

Change 806970 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2022-06-21-035954-production

https://gerrit.wikimedia.org/r/806970

Hello @KartikMistry, I have concluded communications with the Sorani / Central Kurdish Wikipedia; they did not object to having the FLORES MT. Please add the ckbWiki to the deployment, or the next one if you have already done the needful. Thanks!

Change 806970 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2022-06-21-035954-production

https://gerrit.wikimedia.org/r/806970

Mentioned in SAL (#wikimedia-operations) [2022-06-21T10:51:56Z] <kart_> Updated cxserver to 2022-06-21-035954-production (T307970)

Thanks for the follow-up! Note that Flores is currently the default for ckb. If the community wants to change this to 'source', i.e. no default MT, we can adjust it.

Thanks!