Enable newly supported languages by Flores in the test instance
Closed, Resolved (Public)

Description

Flores now supports a set of new languages, as well as new language pairs for some previously supported languages. We want to enable these in the test instance of Content Translation to check how they work.

The languages supported are the following:

  • bak/ba (Bashkir)
  • hau/ha (Hausa)
  • ast (Asturian)
  • tso/ts (Tsonga)
  • asm/as (Assamese)
  • ckb (Sorani / Central Kurdish)
  • ilo (Iloko)
  • kon/kg (Kongo)
  • lin/ln (Lingala)
  • nso (Northern Sotho)
  • ssw/ss (Swati)
  • tsn/tn (Tswana)
  • yue/zh-yue (Cantonese)
  • orm/om (Oromo)
  • tir/ti (Tigrinya)
  • wol/wo (Wolof)
  • ayr (Central Aymara, for Aymara Wikipedia: ay)
  • isl/is (Icelandic) (new pairs)
  • zho_Hans/zh (Chinese) (new pairs)
  • ibo/ig (Igbo) (new pairs)
  • zul/zu (Zulu) (new pairs)
  • lug/lg (Luganda) (new pairs)
  • oci/oc (Occitan) (new pairs)

The specific language pairs to enable are:

  • en → {asm, ast, ayr, bak, ckb, hau, ilo, kon, lin, nso, orm, ssw, tsn, tso, yue, wol}
  • fr → {asm, ast, ayr, bak, ckb, hau, ibo, ilo, isl, kon, lin, lug, nso, oci, orm, ssw, tir, tsn, tso, wol, yue, zho_Hans, zul}
  • es → {asm, ast, ayr, bak, ckb, hau, ibo, ilo, isl, kon, lin, lug, nso, oci, orm, ssw, tir, tsn, tso, yue, zho_Hans, zul}
  • {cat, por} → oci
  • zho_Hans → yue
  • rus → bak

Note that:

  • The languages listed above give both the 3-letter and 2-letter ISO codes when the corresponding Wikipedia uses the 2-letter one. The language pairs are listed using the 3-letter codes. We still need to determine how to capture them in the configuration to avoid a mismatch between the translation service codes and the Wikipedia codes.
  • For Bashkir and Hausa we want to keep the current default services, so Flores will not be the default for those initially.
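
As a rough illustration of the code mismatch described in the first note, the following minimal sketch converts the 3-letter codes used in the pair list into the codes used by the corresponding Wikipedias. The table `FLORES_TO_WIKI` and the helpers `to_wiki_code` and `expand_pairs` are hypothetical names introduced here, not part of cxserver; the mappings themselves are taken from the language list in this task. Cantonese is listed as yue/zh-yue, and since the form the configuration should use is still open, it is left unmapped in this sketch.

```python
# Hypothetical lookup table built from the language list in this task.
# It is an illustration only, not cxserver's actual configuration.
FLORES_TO_WIKI = {
    "asm": "as", "ayr": "ay", "bak": "ba", "hau": "ha", "ibo": "ig",
    "isl": "is", "kon": "kg", "lin": "ln", "lug": "lg", "oci": "oc",
    "orm": "om", "ssw": "ss", "tir": "ti", "tsn": "tn", "tso": "ts",
    "wol": "wo", "zho_Hans": "zh", "zul": "zu",
    # Source languages that appear with 3-letter codes in the pair list.
    "cat": "ca", "por": "pt", "rus": "ru",
}


def to_wiki_code(code: str) -> str:
    """Return the Wikipedia code for a service code.

    Codes with no entry in the table (ast, ckb, ilo, nso, yue, and the
    2-letter sources en, fr, es) pass through unchanged.
    """
    return FLORES_TO_WIKI.get(code, code)


def expand_pairs(sources, targets):
    """Expand a '{sources} -> {targets}' entry into (source, target) tuples."""
    return [(to_wiki_code(s), to_wiki_code(t)) for s in sources for t in targets]


# The '{cat, por} -> oci' entry from the pair list above.
print(expand_pairs(["cat", "por"], ["oci"]))
```

Keeping the mapping in a single table means the pair list can stay in the service's codes while the configuration is generated in Wikipedia codes.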

Event Timeline

The test instances (language-cx: https://language-cx.wmcloud.org/, language-cx-ofb: https://language-cx-ofb.wmcloud.org/, and sx: https://sx.wmflabs.org/) now have the following Flores pairs enabled for testing.

The list can be verified at: https://cxserver.wmflabs.org/v2?doc#/Tools/get_v1_list__tool_

"Flores": {
  "en": [
    "as",
    "ast",
    "ayr",
    "ba",
    "ckb",
    "ha",
    "ig",
    "ilo",
    "is",
    "kg",
    "ln",
    "lg",
    "nso",
    "oc",
    "om",
    "ss",
    "tn",
    "wo",
    "yue",
    "zh",
    "zu"
  ],
  "ca": [
    "oc"
  ],
  "es": [
    "as",
    "ast",
    "ayr",
    "ba",
    "ckb",
    "ha",
    "ig",
    "ilo",
    "is",
    "kg",
    "ln",
    "lg",
    "nso",
    "oc",
    "om",
    "ss",
    "ti",
    "tn",
    "yue",
    "zh",
    "zu"
  ],
  "fr": [
    "as",
    "ast",
    "ayr",
    "ba",
    "ckb",
    "ha",
    "ig",
    "ilo",
    "is",
    "kg",
    "ln",
    "lg",
    "nso",
    "oc",
    "om",
    "ss",
    "ti",
    "tn",
    "wo",
    "yue",
    "zh",
    "zu"
  ],
  "pt": [
    "oc"
  ],
  "ru": [
    "ba"
  ],
  "zh": [
    "yue"
  ]
},
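
One way to cross-check a list like this against the requested pairs is sketched below. The function name `missing_pairs` and the sample data are assumptions made for illustration; the excerpt above ends with a trailing comma, so it would need to be wrapped into a complete JSON document before it could be parsed this way.

```python
import json


def missing_pairs(requested, enabled):
    """Return (source, target) pairs that were requested but are not enabled.

    Both arguments map a source language code to a list of target codes,
    using the same code scheme (here: Wikipedia codes).
    """
    missing = []
    for source, targets in requested.items():
        available = set(enabled.get(source, []))
        for target in targets:
            if target not in available:
                missing.append((source, target))
    return missing


# Abbreviated, hypothetical sample data in the same shape as the list above.
enabled = json.loads('{"en": ["as", "ast", "ba"], "ru": ["ba"]}')
requested = {"en": ["as", "ts"], "ru": ["ba"]}

print(missing_pairs(requested, enabled))  # prints [('en', 'ts')]
```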

Thanks Kartik. I tried this in the test environment and it seems to work well in most cases.
Some aspects to check and adjust:

  • Support for Aymara. Wikipedia is available in Aymara (ay), but Flores supports Central Aymara (ayr). With the current configuration, translating to Aymara Wikipedia shows no MT service listed. We may need to adjust the configuration so that the "ay" Wikipedia is supported with the "ayr" MT from Flores. This may require set-up on our side similar to the way we exposed MT of closely related languages in T258919.
  • Support for Tsonga. When translating to Tsonga, no MT service is shown:
    • Flores should be shown as default for {en, fr, es} → ts
    • Opus should be listed as optional for en → ts.
  • Keep existing defaults. Flores appears as the default for some languages where it should be exposed as optional, to avoid suddenly changing the experience of translators. This may just be an effect of the test server (where Google and Yandex are not functional), but I am listing these to double-check:
    • Bashkir should have Yandex as default: {en, fr, es, ru} → ba
    • Hausa should have Google as default: {en, fr, es} → ha
    • Chinese should have Google as default: {fr, es} → zh

Thanks for the extensive testing, Pau!

  • Support for Aymara. Wikipedia is available in Aymara (ay), but Flores supports Central Aymara (ayr). With the current configuration, translating to Aymara Wikipedia shows no MT service listed. We may need to adjust the configuration so that the "ay" Wikipedia is supported with the "ayr" MT from Flores. This may require set-up on our side similar to the way we exposed MT of closely related languages in T258919.

ay is now fixed. We need to use the 'ay' code, and Flores seems to support it as well.

  • Support for Tsonga. When translating to Tsonga, no MT service is shown:
    • Flores should be shown as default for {en, fr, es} → ts
    • Opus should be listed as optional for en → ts.

Fixed.

  • Keep existing defaults. Flores appears as the default for some languages where it should be exposed as optional, to avoid suddenly changing the experience of translators. This may just be an effect of the test server (where Google and Yandex are not functional), but I am listing these to double-check:
    • Bashkir should have Yandex as default: {en, fr, es, ru} → ba
    • Hausa should have Google as default: {en, fr, es} → ha
    • Chinese should have Google as default: {fr, es} → zh

For all default settings in production, we use config/mt-defaults.wikimedia.yaml. As a next step, I'll submit the above changes (where they are missing) to cxserver.
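
To illustrate why Flores can show up as the default on the test server even when another default is configured, here is a minimal sketch of default selection with fallback. The `DEFAULTS` table and the `order_services` function are hypothetical and do not reflect the actual format of config/mt-defaults.wikimedia.yaml or cxserver's real selection logic; the fallback behaviour is an assumption.

```python
# Hypothetical defaults keyed by (source, target). This is NOT the real
# layout of config/mt-defaults.wikimedia.yaml; it only mirrors the
# defaults requested in the comment above for a few example pairs.
DEFAULTS = {
    ("en", "ba"): "Yandex",
    ("en", "ha"): "Google",
    ("fr", "zh"): "Google",
}


def order_services(source, target, available):
    """Return the available services with the configured default first.

    If the configured default is not among the available services
    (for example because it is not functional on a test server),
    the available services are returned in their original order.
    """
    preferred = DEFAULTS.get((source, target))
    if preferred in available:
        return [preferred] + [s for s in available if s != preferred]
    return list(available)


# With Yandex available, it stays the default and Flores is optional.
print(order_services("en", "ba", ["Flores", "Yandex"]))
# Without Yandex, the first available service (Flores) leads the list.
print(order_services("en", "ba", ["Flores"]))
```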

Change 801371 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] Update default MT for some language pairs

https://gerrit.wikimedia.org/r/801371

Change 801371 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Update default MT for some language pairs

https://gerrit.wikimedia.org/r/801371

Change 801663 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2022-05-31-111430-production

https://gerrit.wikimedia.org/r/801663

Change 801663 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2022-05-31-123738-production

https://gerrit.wikimedia.org/r/801663

Change 803869 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2022-05-31-045829-production

https://gerrit.wikimedia.org/r/803869

Change 803869 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2022-05-31-045829-production

https://gerrit.wikimedia.org/r/803869