
Enable Softcatalà models for more language pairs in MinT test instance
Open, Medium, Public

Description

The MinT test instance now supports selecting which model to use when multiple models are available (T338608). Enabling more of the language pairs supported by the Softcatalà models (T284905) will allow communities to compare whether these models work better than the current alternatives.

This ticket proposes enabling Softcatalà models for the following supported language pairs (as non-default for now):

  1. German (de) ↔ Catalan (ca)
  2. French (fr) ↔ Catalan (ca)
  3. Galician (gl) ↔ Catalan (ca)
  4. Italian (it) ↔ Catalan (ca)
  5. Japanese (ja) ↔ Catalan (ca)
  6. Dutch (nl) ↔ Catalan (ca)
  7. Occitan (oc) ↔ Catalan (ca)
  8. Portuguese (pt) ↔ Catalan (ca)
  9. Spanish (es) ↔ Catalan (ca)
  10. English (en) ↔ Catalan (ca) (already enabled and default)
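One way to picture "enabled as non-default" is a per-direction list of models whose first entry is the default, with later entries available on request. The sketch below is hypothetical: the registry shape, the `enable_non_default` and `select_model` helpers, and model IDs such as `nllb200` and `softcatala-de-ca` are illustrative assumptions, not MinT's actual configuration or API.

```python
# Hypothetical sketch: each translation direction maps to an ordered list of model IDs.
# The first entry is the default; later entries are alternatives a user can pick.
# The registry shape and model IDs are illustrative, not MinT's real configuration.
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (source language code, target language code)

REGISTRY: Dict[Pair, List[str]] = {
    ("en", "ca"): ["softcatala-en-ca", "nllb200"],  # Softcatalà is already the default
    ("de", "ca"): ["nllb200", "softcatala-de-ca"],  # Softcatalà offered as non-default
    ("ca", "de"): ["nllb200", "softcatala-ca-de"],
}


def enable_non_default(registry: Dict[Pair, List[str]], pair: Pair, model_id: str) -> None:
    """Add model_id as an alternative for a pair that already has a default."""
    models = registry.get(pair)
    if not models:
        raise KeyError(f"{pair[0]}-{pair[1]} has no default model to keep")
    if model_id not in models:
        models.append(model_id)  # appending never displaces the default at index 0


def select_model(registry: Dict[Pair, List[str]], pair: Pair,
                 requested: Optional[str] = None) -> str:
    """Return the requested model if it is registered for the pair, else the default."""
    models = registry.get(pair)
    if not models:
        raise KeyError(f"no model registered for {pair[0]}-{pair[1]}")
    if requested is not None and requested in models:
        return requested
    return models[0]
```

With this shape, `select_model(REGISTRY, ("de", "ca"))` returns the default `nllb200`, while passing `"softcatala-de-ca"` as `requested` returns the Softcatalà model. Falling back to the default for an unknown model ID is a design choice; a stricter service might reject the request instead.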

Event Timeline

Pginer-WMF created this task.

Change #1014431 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/machinetranslation@master] Enable Softcatalà models for more language pairs

https://gerrit.wikimedia.org/r/1014431

Change #1014431 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] Enable Softcatalà models for more language pairs

https://gerrit.wikimedia.org/r/1014431

Change #1014729 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update MinT to 2024-03-26-120044-production

https://gerrit.wikimedia.org/r/1014729

Change #1014729 merged by jenkins-bot:

[operations/deployment-charts@master] Update MinT to 2024-03-26-120044-production

https://gerrit.wikimedia.org/r/1014729

Mentioned in SAL (#wikimedia-operations) [2024-03-27T07:11:01Z] <kart_> Updated MinT to 2024-03-26-120044-production (T347930, T355304, T349487)

Pginer-WMF added a subscriber: KartikMistry.

@KartikMistry, I reviewed the enabled pairs. Their status is shown in the table below, with issues for Italian, Occitan, Spanish and, possibly, Japanese. The test instance went down before I could finish testing a couple of combinations (see the notes, and please double-check them once the other issues are fixed):

| Language pair | Screenshot | Notes |
| --- | --- | --- |
| German (de) ↔ Catalan (ca) | translate.wmcloud.org_(Wiki Tablet) (18).png | |
| French (fr) ↔ Catalan (ca) | translate.wmcloud.org_(Wiki Tablet) (10).png | |
| Galician (gl) ↔ Catalan (ca) | translate.wmcloud.org_(Wiki Tablet) (11).png | |
| Italian (it) → Catalan (ca) | translate.wmcloud.org_(Wiki Tablet) (13).png | Translation attempts result in an error. The model is listed as softcatala-ti-ca, which uses ti where the correct code for Italian is it, so there may be a typo somewhere. The reverse direction (Catalan (ca) → Italian (it)) works as expected. |
| Japanese (ja) ↔ Catalan (ca) | translate.wmcloud.org_(Wiki Tablet) (12).png | The model provides a translation, but it often has issues with markup (notice the "amp;apos;" in the translated content). We may want to check whether this is noise introduced by the model or a result of some processing step, and decide whether a different model such as NLLB-200 would be a better default for this pair. |
| Dutch (nl) ↔ Catalan (ca) | translate.wmcloud.org_(Wiki Tablet) (14).png | |
| Occitan (oc) → Catalan (ca) | translate.wmcloud.org_(Wiki Tablet) (15).png | The Softcatalà model is not available; only NLLB-200 is listed. The reverse direction (Catalan (ca) → Occitan (oc)) works as expected. |
| Portuguese (pt) → Catalan (ca) | translate.wmcloud.org_(Wiki Tablet) (16).png | The reverse direction was not tested because the test instance went down. |
| Spanish (es) → Catalan (ca) | translate.wmcloud.org_(Wiki Tablet) (17).png | The Softcatalà model is not available; only NLLB-200 is listed. The reverse direction was not tested because the test instance went down. |
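Two of the issues in the review notes could be checked mechanically: a model ID whose language code does not match the pair it serves, and an apostrophe that appears escaped twice. The sketch below is hypothetical: the registry shape, the `softcatala-<src>-<tgt>` naming pattern (inferred only from the `softcatala-ti-ca` ID shown in the UI), and double escaping as the source of "amp;apos;" are assumptions, not confirmed MinT behavior. It shows how escaping `'` as `&apos;` and then escaping the result again yields `&amp;apos;`.

```python
# Hypothetical diagnostics for two issues noted in the review.
# The "softcatala-<src>-<tgt>" naming pattern is an assumption inferred from the UI.
import html
import re
from typing import Dict, List, Tuple
from xml.sax.saxutils import escape

Pair = Tuple[str, str]  # (source language code, target language code)

# Matches an entity whose leading "&" was itself escaped, e.g. "&amp;apos;".
_DOUBLE_ESCAPED = re.compile(r"&amp;(?:[A-Za-z]+|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def mismatched_ids(registry: Dict[Pair, List[str]], prefix: str = "softcatala-") -> List[str]:
    """Report prefixed model IDs whose language codes differ from the pair they serve."""
    problems = []
    for (src, tgt), models in registry.items():
        expected = f"{prefix}{src}-{tgt}"
        for model_id in models:
            if model_id.startswith(prefix) and model_id != expected:
                problems.append(f"{src}-{tgt}: found {model_id}, expected {expected}")
    return problems


def escape_xml(text: str) -> str:
    """Escape &, < and >, and write the apostrophe as the XML entity &apos;."""
    return escape(text, {"'": "&apos;"})


def looks_double_escaped(text: str) -> bool:
    """True if the text contains an entity whose '&' was escaped a second time."""
    return bool(_DOUBLE_ESCAPED.search(text))


def unescape_fully(text: str, max_rounds: int = 3) -> str:
    """Decode entities repeatedly until the text stops changing (diagnostic use only)."""
    for _ in range(max_rounds):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text
```

For example, `escape_xml(escape_xml("l'any"))` gives `l&amp;apos;any`, which `looks_double_escaped` flags and `unescape_fully` turns back into `l'any`. Decoding repeatedly is only safe as a diagnostic: text that legitimately contains a literal `&amp;` would be over-decoded, so a real fix belongs wherever the second escaping step happens.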