Enable Content and Section translation on some Wikipedias with potential to be supported with MinT using MADLAD-400
Closed, Resolved (Public)

Description

All Wikipedias identified as part of the Boost initiative already have Section Translation available. This ticket proposes to enable the tool on other Wikipedias as well as enable Content Translation as a default tool to facilitate translation on both desktop and mobile devices.

This ticket proposes to enable translation tools in a set of languages that are supported by the MADLAD-400 open source translation model. We are exploring possible integration of this model in the future, and having the translation tools available in the supported wikis by default seems helpful to maximize the impact when machine translation capabilities are enabled.

As part of this process, we'll generate the template parameter alignments (T221211) for these languages, enable access to the tool in test wiki for editors to try, communicate with the different communities and only proceed with the enablement if there is no major concern.

These are the languages selected for this task:

  1. Arpitan (frp)
  2. Kabardian (kbd)
  3. Moksha (mdf)
  4. Gorontalo (gor)
  5. Avar (av)
  6. Komi-Permyak (koi)
  7. Chechen (ce)
  8. Erzya (myv)
  9. Adyghe (ady)
  10. Newari (new)
  11. Kalmyk (xal)
  12. Jamaican Creole English (jam)
  13. Mon (mnw)
  14. Fiji Hindi (hif)
  15. Komi (kv)
  16. Tulu (tcy)
  17. Pampanga (pam)
  18. Tetum (tet)
  19. Karachay-Balkar (krc)
  20. Chamorro (ch)
  21. Gagauz (gag)
  22. Old English (ang)
  23. Romansh (rm): objected to MinT. Enable only Content/Section translation.
  24. Saterland Frisian (stq): objected to MinT. Enable only Content/Section translation.
  25. Kalaallisut (kl): objected to MinT. Enable only Content/Section translation.
  26. Southern Altai (alt): objected to MinT. Enable only Content/Section translation.
  27. Northern Sami (se): objected to MinT. Enable only Content/Section translation.
  28. Zhuang (za): MADLAD-400 not available. Enable only Content/Section translation.
  29. Navajo (nv): objected to both MinT and Content/Section translation.

Steps:

  • Generate template parameter alignments (T221211) (See: T358645)
  • Enable selected wikis on Test Wikipedia
  • Enable all selected languages in the MinT test instance (not for Content/Section translation) as part of T354666
  • Communicate with the communities, inviting them to try the MT quality and asking whether it is good enough to be available by default, as an option, or not at all.
    • Arpitan (frp)
    • Romansh (rm)
      • The community objected to having MADLAD-400 because of the quality. However, they did not object to enabling the Content and Section translation tool by default. Only enable Content and Section translation by default in this wiki.
    • Kabardian (kbd)
    • Moksha (mdf)
    • Gorontalo (gor)
    • Avar (av)
    • Komi-Permyak (koi)
    • Saterland Frisian (stq)
      • The community objected to having MADLAD-400 because of the quality. However, they did not object to enabling the Content and Section translation tool by default. Only enable Content and Section translation by default in this wiki.
    • Chechen (ce)
    • Kalaallisut (kl)
      • The community objected to having MADLAD-400 because of the quality. However, they did not object to enabling the Content and Section translation tool by default. Only enable Content and Section translation by default in this wiki.
    • Erzya (myv)
    • Adyghe (ady)
    • Newari (new)
    • Southern Altai (alt)
      • The community objected to having MADLAD-400 because of the quality. However, they did not object to enabling the Content and Section translation tool by default. Only enable Content and Section translation by default in this wiki.
    • Kalmyk (xal)
    • Jamaican Creole English (jam)
    • Mon (mnw)
    • Fiji Hindi (hif)
    • Komi (kv)
    • Tulu (tcy)
    • Pampanga (pam)
    • Tetum (tet)
    • Zhuang (za)
    • Karachay-Balkar (krc)
    • Chamorro (ch)
    • Gagauz (gag)
    • Old English (ang)
    • Northern Sami (se)
      • The community objected to having MADLAD-400 because of the quality. However, they did not object to enabling the Content and Section translation tool by default. Only enable Content and Section translation by default in this wiki.
    • Navajo (nv)
      • There was an objection to having the MADLAD-400 Machine translation model as well as having Content and Section translation enabled by default. See discussions for details.
  • Enable Content and Section translation in the corresponding wikis
  • Enable MinT in Content/Section translation based on the feedback.
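
The outcome of the steps above can be summarized as a small decision table. The following is only an illustrative sketch, not the actual deployment configuration: the names `Feedback`, `EXCEPTIONS`, `LANGUAGES`, and `plan` are hypothetical, and it assumes that every language listed without a note raised no objection.

```python
from enum import Enum


class Feedback(Enum):
    """Community feedback categories as recorded in the task description."""
    NO_OBJECTION = "no objection"
    OBJECTED_TO_MINT = "objected to MinT"
    MINT_UNAVAILABLE = "MADLAD-400 not available"
    OBJECTED_TO_ALL = "objected to MinT and Content/Section translation"


# Languages with a note in the task description.
# Assumption: any language not listed here raised no objection.
EXCEPTIONS = {
    "rm": Feedback.OBJECTED_TO_MINT,
    "stq": Feedback.OBJECTED_TO_MINT,
    "kl": Feedback.OBJECTED_TO_MINT,
    "alt": Feedback.OBJECTED_TO_MINT,
    "se": Feedback.OBJECTED_TO_MINT,
    "za": Feedback.MINT_UNAVAILABLE,
    "nv": Feedback.OBJECTED_TO_ALL,
}

# Language codes selected for this task, in the order of the description.
LANGUAGES = [
    "frp", "kbd", "mdf", "gor", "av", "koi", "ce", "myv", "ady", "new",
    "xal", "jam", "mnw", "hif", "kv", "tcy", "pam", "tet", "krc", "ch",
    "gag", "ang", "rm", "stq", "kl", "alt", "se", "za", "nv",
]


def plan(code):
    """Return (enable_translation_tools, enable_mint) for a language code."""
    feedback = EXCEPTIONS.get(code, Feedback.NO_OBJECTION)
    enable_tools = feedback is not Feedback.OBJECTED_TO_ALL
    enable_mint = feedback is Feedback.NO_OBJECTION
    return enable_tools, enable_mint


tools_enabled = [code for code in LANGUAGES if plan(code)[0]]
mint_enabled = [code for code in LANGUAGES if plan(code)[1]]
```

Under these assumptions, `tools_enabled` contains every selected language except `nv`, and `mint_enabled` contains only the languages listed without a note.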

Event Timeline

Change 988493 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/mediawiki-config@master] testwiki: Enable Section translation on WPs with potential to be supported with MinT using MADLAD-400

https://gerrit.wikimedia.org/r/988493

Change 988493 merged by jenkins-bot:

[operations/mediawiki-config@master] testwiki: Enable Section translation on WPs with potential to be supported with MinT using MADLAD-400

https://gerrit.wikimedia.org/r/988493

Mentioned in SAL (#wikimedia-operations) [2024-01-09T08:06:29Z] <kartik@deploy2002> Started scap: Backport for [[gerrit:988493|testwiki: Enable Section translation on WPs with potential to be supported with MinT using MADLAD-400 (T353510)]]

Mentioned in SAL (#wikimedia-operations) [2024-01-09T08:10:35Z] <kartik@deploy2002> kartik: Backport for [[gerrit:988493|testwiki: Enable Section translation on WPs with potential to be supported with MinT using MADLAD-400 (T353510)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-01-09T08:22:23Z] <kartik@deploy2002> Finished scap: Backport for [[gerrit:988493|testwiki: Enable Section translation on WPs with potential to be supported with MinT using MADLAD-400 (T353510)]] (duration: 15m 54s)

@UOzurumba We've enabled Section Translation for the above Wikipedias in testwiki. You can notify communities to test Section Translation and provide feedback.

Change 989127 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/machinetranslation@master] Introduce MADLAD-400 model

https://gerrit.wikimedia.org/r/989127

@KartikMistry based on the planning meeting discussions, I updated the description of the task:

  • More languages. I reviewed the supported languages and found 22 additional languages that can be supported. Since those have Content and Section Translation enabled, the enablement of MADLAD-400 in the MinT test instance for all languages not supported by other services is captured in T354666: Enable MADLAD-400 in MinT test instance for Wikipedia languages not supported by other services
  • MinT enablement. I added a step to enable MinT support in the test instance. In this way, the communication about the enablement of Content and Section Translation can be used to get input from the community about translation quality and decide whether to provide support for it in Content Translation.

@UOzurumba we may want to wait for MADLAD-400 to be available in the MinT test instance (T354666) before communicating with the communities. That way we can ping them once to both (a) inform them about the plans to enable Content and Section translation by default (unless they have major concerns), and (b) ask them to check MADLAD-400 quality for their language and tell us whether it is good enough as a default or optional translation service, or whether translation quality is so low that it is better to keep it disabled for their language.

@Pginer-WMF Thank you for the information. I will wait for MADLAD-400 before starting the communication.

Nikerabbit changed the task status from Open to In Progress. (Jan 15 2024, 12:27 PM)

Change 989127 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] Introduce MADLAD-400 model

https://gerrit.wikimedia.org/r/989127

Change 991578 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update MinT to 2024-01-18-051410-production

https://gerrit.wikimedia.org/r/991578

Change 991578 merged by jenkins-bot:

[operations/deployment-charts@master] Update MinT to 2024-01-22-053144-production

https://gerrit.wikimedia.org/r/991578

MADLAD-400 is already available at the test instance for the languages above. For these communities the idea is to:

  • Inform them that we plan to enable Content and Section translation by default, and check whether there is any major concern from their side.
  • Invite them to try the MT quality of MinT for their language using the new MADLAD-400 model, and ask whether the MT quality is good enough to be available (a) by default, (b) as an option, or (c) not at all.

One aspect to include in the communication is that MADLAD-400 may be quite slow in the test instance, but it is expected to be much faster when used inside Content Translation. So we encourage people to make their evaluation based on the translation quality rather than speed.

I will start the communication this week. Thanks.

Change 1010226 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/mediawiki-config@master] Enable Content/Section translation on some Wikipedias

https://gerrit.wikimedia.org/r/1010226

KartikMistry updated the task description.

Change 1010350 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/machinetranslation@master] config: Distable MADLAD-400 based on community feedback

https://gerrit.wikimedia.org/r/1010350

Change 1010350 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] config: Distable MADLAD-400 based on community feedback

https://gerrit.wikimedia.org/r/1010350

Change 1010879 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] config: Enable MinT for languages supported by MADLAD-400

https://gerrit.wikimedia.org/r/1010879

Change 1010879 merged by jenkins-bot:

[mediawiki/services/cxserver@master] config: Enable MinT for languages supported by MADLAD-400

https://gerrit.wikimedia.org/r/1010879

Change 1012364 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2024-03-18-111401-production

https://gerrit.wikimedia.org/r/1012364

Change 1012364 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2024-03-18-111401-production

https://gerrit.wikimedia.org/r/1012364

Mentioned in SAL (#wikimedia-operations) [2024-03-20T06:08:55Z] <kart_> Updated cxserver to 2024-03-18-111401-production (T353510)

Change 1010226 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable Content/Section translation on some Wikipedias

https://gerrit.wikimedia.org/r/1010226

Mentioned in SAL (#wikimedia-operations) [2024-03-20T08:04:44Z] <kartik@deploy2002> Started scap: Backport for [[gerrit:1010226|Enable Content/Section translation on some Wikipedias (T353510)]]

Mentioned in SAL (#wikimedia-operations) [2024-03-20T08:07:13Z] <kartik@deploy2002> kartik: Backport for [[gerrit:1010226|Enable Content/Section translation on some Wikipedias (T353510)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-03-20T08:21:51Z] <kartik@deploy2002> Finished scap: Backport for [[gerrit:1010226|Enable Content/Section translation on some Wikipedias (T353510)]] (duration: 17m 06s)

Thanks for all the updates. I reviewed the list and noticed a couple of cases to review:

  • Erzya (myv) does not seem to have Content translation enabled by default (it is still listed as a beta feature). Section Translation seems to be enabled (accessible when the beta feature is on).
  • Zhuang (za) shows the "MT unavailable" error, so there may be an issue with the MinT configuration for this language (screenshot below).

Screenshot 2024-03-20 at 14.53.35 2.png (636×939 px, 262 KB)

Change 1013084 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/mediawiki-config@master] Enable ContentTranslation by default for myvwiki

https://gerrit.wikimedia.org/r/1013084

@Pginer-WMF Thanks! myvwiki was a mistake, fixing with the next deployment. For za, it was removed because it isn't supported by MADLAD-400 (See: T354666)

Thanks! Regarding za: in that case, I'd expect Section Translation not to list MinT as an option, instead of showing the MinT option with an error. There may be something to adjust on the Content/Section translation side of the configuration.

Yes. I'll fix that too.

Change 1013271 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] config: Remove za (Zhuang), not supported by MinT

https://gerrit.wikimedia.org/r/1013271

Change 1013271 merged by jenkins-bot:

[mediawiki/services/cxserver@master] config: Remove za (Zhuang), not supported by MinT

https://gerrit.wikimedia.org/r/1013271

Change 1013273 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2024-03-21-114859-production

https://gerrit.wikimedia.org/r/1013273

Pginer-WMF updated the task description.

Change #1013084 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable ContentTranslation by default for myvwiki

https://gerrit.wikimedia.org/r/1013084

Mentioned in SAL (#wikimedia-operations) [2024-03-26T08:03:57Z] <kartik@deploy1002> Started scap: Backport for [[gerrit:1013084|Enable ContentTranslation by default for myvwiki (T353510)]]

Mentioned in SAL (#wikimedia-operations) [2024-03-26T08:06:33Z] <kartik@deploy1002> kartik: Backport for [[gerrit:1013084|Enable ContentTranslation by default for myvwiki (T353510)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-03-26T08:19:46Z] <kartik@deploy1002> Finished scap: Backport for [[gerrit:1013084|Enable ContentTranslation by default for myvwiki (T353510)]] (duration: 15m 48s)

Change #1013273 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2024-03-21-114859-production

https://gerrit.wikimedia.org/r/1013273

@Pginer-WMF Both issues have been fixed. Feel free to move this task.

Mentioned in SAL (#wikimedia-operations) [2024-03-26T08:41:30Z] <kart_> Updated cxserver to 2024-03-21-114859-production (T353510)

Great. Thanks!