Enable Opus models for languages lacking other Machine Translation options
Open, In Progress, Medium, Public

Description

The Opus project provides translation models for many languages. This task identifies languages that no other translation service supports, not even NLLB-200 (T326578). It also includes languages for which feedback suggests that Opus could significantly improve translation quality compared to the other available options.

This ticket proposes supporting the following languages (and specific pairs):

| Language | Pair | BLEU | Status | Notes |
|---|---|---|---|---|
| Central Bikol (bcl) | en – bcl | 31.9 | Enabled as part of T331836: Support multiple MT models in self hosted machine translation service. | |
| | tl – bcl | | Model not available | |
| Cantonese (zh-yue) | zh – zh-yue | | Model not available | NLLB-200 support is not valid for the language based on T333835: Disable machine translation for Cantonese. We may want to check whether Opus support is useful. |
| | en – zh-yue | | Model not available | |
| Moroccan Arabic (ary) | en – ary | | Model not available | NLLB-200 support is not valid for the language based on T339926: The NLLB-200 MT engine in MinT returns standard Arabic translation instead of Moroccan Darija in Moroccan Arabic Wikipedia. We may want to check whether Opus support is useful. |
| | ar – ary | | Model not available | |
| Gun (guw) | en – guw | 45.7 | | |
| Cherokee (chr) | en – chr | 44.6 | | |
| Sranan Tongo (srn) | en – srn | 34.6 | | |
| Venda (ve) | en – ve | 40.5 | | |
| Tahitian (ty) | en – ty | 46.8 | | |
| | fr – ty | 39.6 | | |
| Bislama (bi) | en – bi | 37.1 | | |
| | th – bi | | Model not available | |
| Tongan (to) | en – to | 59.1 | | |
| Manx (gv) | pt – gv | | Model not available | |
| | en – gv | 70.1 | | Poor quality |
| Walloon (wa) | en – wa | 33.4 | | Sentence segmenter uses an old file format; this model cannot be used as of now. |
| | fr – wa | | Model not available | |
| Western Frisian (fy) | nl – fy | | Model not available | |
| | en – fy | | Model not available | |
| Breton (br) | en – fr – br | | Available now | |
| | fr – br | | Model not available | |
| Finnish (fi) | en – fi | 25.7 | | Supported by other services already, but feedback suggests Opus may improve quality. Low BLEU score. |
| | sv – fi | 45.2 | | |
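
For readers who want to work with these figures programmatically, here is a minimal sketch that encodes the pairs with a reported BLEU score as plain Python data and ranks them. The `PAIRS` mapping and the `rank_by_bleu` helper are illustrative names only and are not part of MinT or cxserver; the scores are copied from the table above, and pairs without a score are left out.

```python
# BLEU scores copied from the task description. Pairs marked
# "Model not available" and the unscored en - fr - br entry are omitted.
PAIRS = {
    ("en", "bcl"): 31.9,
    ("en", "guw"): 45.7,
    ("en", "chr"): 44.6,
    ("en", "srn"): 34.6,
    ("en", "ve"): 40.5,
    ("en", "ty"): 46.8,
    ("fr", "ty"): 39.6,
    ("en", "bi"): 37.1,
    ("en", "to"): 59.1,
    ("en", "gv"): 70.1,
    ("en", "wa"): 33.4,
    ("en", "fi"): 25.7,
    ("sv", "fi"): 45.2,
}


def rank_by_bleu(pairs, minimum=0.0):
    """Return (source, target, bleu) tuples sorted by BLEU, highest first."""
    rows = [(src, tgt, bleu) for (src, tgt), bleu in pairs.items() if bleu >= minimum]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # List only the pairs scoring at least 40 BLEU.
    for source, target, bleu in rank_by_bleu(PAIRS, minimum=40.0):
        print(f"{source} -> {target}: {bleu}")
```

A ranking like this is only a starting point: en – gv has the highest score in the table yet is noted as poor quality, so BLEU alone should not decide which models to enable.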

Details

Other Assignee
santhosh
| Repo | Branch | Lines +/- |
|---|---|---|
| operations/deployment-charts | master | +1 -1 |
| operations/deployment-charts | master | +1 -1 |
| mediawiki/services/cxserver | master | +5 -0 |
| mediawiki/services/machinetranslation | master | +6 -1 |
| operations/deployment-charts | master | +1 -1 |
| operations/deployment-charts | master | +1 -1 |
| mediawiki/services/machinetranslation | master | +12 -2 |
| mediawiki/services/cxserver | master | +2 -2 |
| operations/deployment-charts | master | +1 -1 |
| mediawiki/services/cxserver | master | +3 -0 |
| operations/deployment-charts | master | +1 -1 |
| mediawiki/services/machinetranslation | master | +23 -10 |
| mediawiki/services/cxserver | master | +2 -0 |
| operations/deployment-charts | master | +1 -1 |
| mediawiki/services/machinetranslation | master | +6 -1 |
| mediawiki/services/machinetranslation | master | +6 -1 |
| operations/deployment-charts | master | +1 -1 |
| mediawiki/services/cxserver | master | +2 -0 |
| operations/deployment-charts | master | +1 -1 |
| mediawiki/services/machinetranslation | master | +13 -3 |

Related Objects

Mentioned In
T363263: Post-creation work for iglwiki
T360310: Post-creation work for bewwiki
T360303: Post-creation work for kuswiki
T354666: Enable MADLAD-400 in MinT test instance for Wikipedia languages not supported by other services
T352747: Google is not listed as an option for Norwegian
T355686: Configure mesh listeners to allow IPv6 localhost (::) as well as IPv4 (127.0.0.1)
T255568: Envoy should listen on ipv6 and ipv4
T350373: Post-creation work for bbcwiki
T350241: Post-creation work for zghwiki
T350229: Post-creation work for dgawiki
T349079: Test instance for MinT keeps loading forever in some translations
T340507: Create a language detection service in LiftWing
T349991: MinT: Exception on /api/translate/nn/ff [POST]
T336683: Enable MinT support for languages with no Wikipedia yet
T348097: Twi is listed as Akan in the MinT translation interface
T338602: Make MinT the default service for Zulu in Content Translation
T343340: Identify most useful languages for future OpusMT models
T343211: Enable Content and Section translation on 7 Wikipedias
T341335: MinT not working for Latvian in Content & Section Translation
T338606: Re-run the MT service usage report after MinT is made available to a broad set of languages
T341050: Analyze activity levels for communities supported only by MinT
T340997: Design exploration for the consumption of machine translated sections of Wikipedia articles
T340953: Enable MinT for all the remaining languages supported by NLLB-200
T335491: Provide better long-term storage for translation models
T331505: Self hosted machine translation service
T304865: Enable Content and Section Translation for Cantonese Wikipedia
Mentioned Here
T360303: Post-creation work for kuswiki
T360310: Post-creation work for bewwiki
T363263: Post-creation work for iglwiki
T354666: Enable MADLAD-400 in MinT test instance for Wikipedia languages not supported by other services
T255568: Envoy should listen on ipv6 and ipv4
T352747: Google is not listed as an option for Norwegian
T355686: Configure mesh listeners to allow IPv6 localhost (::) as well as IPv4 (127.0.0.1)
T350229: Post-creation work for dgawiki
T350241: Post-creation work for zghwiki
T350373: Post-creation work for bbcwiki
T340507: Create a language detection service in LiftWing
T349079: Test instance for MinT keeps loading forever in some translations
T349991: MinT: Exception on /api/translate/nn/ff [POST]
T336683: Enable MinT support for languages with no Wikipedia yet
T348097: Twi is listed as Akan in the MinT translation interface
T339926: The NLLB-200 MT engine in MinT returns standard Arabic translation instead of Moroccan Darija in Moroccan Arabic Wikipedia
T338602: Make MinT the default service for Zulu in Content Translation
T343211: Enable Content and Section translation on 7 Wikipedias
T341335: MinT not working for Latvian in Content & Section Translation
T331836: Support multiple MT models in self hosted machine translation service
T326578: Enable NLLB-200 for languages lacking other Machine Translation options
T333835: Disable machine translation for Cantonese

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 965934 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/machinetranslation@master] opusmt: Add en-guw model

https://gerrit.wikimedia.org/r/965934

Change 965934 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] opusmt: Add en-guw model

https://gerrit.wikimedia.org/r/965934

Change 966030 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/machinetranslation@master] opusmt: Add en-ty (Tahitian) model

https://gerrit.wikimedia.org/r/966030

Change 966030 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] opusmt: Add en-ty (Tahitian) model

https://gerrit.wikimedia.org/r/966030

Change 966170 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update MinT to 2023-10-16-101614-production

https://gerrit.wikimedia.org/r/966170

Change 966170 merged by jenkins-bot:

[operations/deployment-charts@master] Update MinT to 2023-10-16-101614-production

https://gerrit.wikimedia.org/r/966170

Change 966326 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] MinT: Add en->guw and en-ty pairs

https://gerrit.wikimedia.org/r/966326

Mentioned in SAL (#wikimedia-operations) [2023-10-17T05:59:44Z] <kart_> Update MinT to 2023-10-16-101614-production (T333969, T336683, T348097)

Change 966326 merged by jenkins-bot:

[mediawiki/services/cxserver@master] MinT: Add en->guw and en-ty pairs

https://gerrit.wikimedia.org/r/966326

We may also want to enable these pairs when translating from Simple English Wikipedia. Currently, guw and ty are not listed there.

Change 968687 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/machinetranslation@master] opusmt: Add en-ve (Venda) model

https://gerrit.wikimedia.org/r/968687

Change 969528 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] MinT: Add en->{bi, srn, ve} OpusMT pairs

https://gerrit.wikimedia.org/r/969528

Change 968687 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] opusmt: Add new models: en-bi, en-srn and en-ve

https://gerrit.wikimedia.org/r/968687

Change 968388 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update MinT to 2023-10-31-044726-production

https://gerrit.wikimedia.org/r/968388

Change 968388 merged by jenkins-bot:

[operations/deployment-charts@master] Update MinT to 2023-10-31-044726-production

https://gerrit.wikimedia.org/r/968388

Change 969528 merged by jenkins-bot:

[mediawiki/services/cxserver@master] MinT: Add en->{bi, srn, ve} OpusMT pairs

https://gerrit.wikimedia.org/r/969528

Change 971633 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2023-11-06-060744-production

https://gerrit.wikimedia.org/r/971633

Change 971633 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2023-11-06-060744-production

https://gerrit.wikimedia.org/r/971633

@KartikMistry Would the models listed in the pages below be a viable alternative for supporting these languages?

Change 1002968 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/machinetranslation@master] opusmt: Add new model: sv-fi

https://gerrit.wikimedia.org/r/1002968

Change 1003395 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/machinetranslation@master] opusmt: Add new model: French-Tahitian (fr-ty)

https://gerrit.wikimedia.org/r/1003395

I was checking whether the above language pairs are available in Content Translation, based on the info from this config file, and there seem to be some inconsistencies. While most of the above languages (bcl, guw, srn, ve, bi) seem to be properly set up, some may need adjustments:

  • chr is currently listed as supporting all language pair combinations. Instead, it should be listed for the specific en->chr pair (since it seems to be supported only by this OpusMT model)
  • to is currently listed as supporting all language pair combinations. Instead, it should be listed for the specific en->to pair (since it seems to be supported only by this OpusMT model)
  • ty is only listed for the en->ty pair. The recent fr->ty and ty->fr pairs should be added too.

Please double-check the above assumptions about language pair support before making config changes.
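
The distinction drawn above, between a language listed for every pair combination and one listed only for specific pairs, can be sketched as follows. The `ALL_PAIR_LANGUAGES` and `SPECIFIC_PAIRS` structures and the `is_pair_supported` helper are hypothetical and do not mirror the actual cxserver config format; the entries only illustrate the state proposed for chr, to, and ty, which still needs to be verified as noted.

```python
# Hypothetical model of a pair configuration; NOT the real cxserver format.
# Languages in this set are assumed translatable to and from each other.
ALL_PAIR_LANGUAGES = {"en", "fr", "es"}  # placeholder entries for illustration

# Languages covered only by directional models are listed per source language.
SPECIFIC_PAIRS = {
    "en": {"chr", "to", "ty"},  # en->chr, en->to, en->ty
    "fr": {"ty"},               # fr->ty
    "ty": {"fr"},               # ty->fr
}


def is_pair_supported(source, target):
    """Return True if source->target is covered under this hypothetical config."""
    if source == target:
        return False
    if source in ALL_PAIR_LANGUAGES and target in ALL_PAIR_LANGUAGES:
        return True
    return target in SPECIFIC_PAIRS.get(source, set())


if __name__ == "__main__":
    for pair in [("en", "chr"), ("fr", "chr"), ("fr", "ty"), ("ty", "fr")]:
        print(pair, is_pair_supported(*pair))
```

Keeping chr and to out of the all-pairs set is what prevents a pair such as fr->chr from being offered when only an en->chr model exists.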

Change 1003451 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] config: Fix OpusMT language codes

https://gerrit.wikimedia.org/r/1003451

.. I was checking whether the above language pairs are available in Content Translation based on the info from this config file and there seem to be some inconsistencies. While most of the above languages (bcl, guw, srn, ve, bi) seem to be properly set-up >
Please, double check the above assumptions about language pairs support before making config changes.

Thanks, Pau!

I've submitted a fix for chr and to. For fr->ty, we will add the pair once the patch is merged and deployed in MinT.

Perfect. Thanks @KartikMistry

Change 1003451 merged by jenkins-bot:

[mediawiki/services/cxserver@master] config: Fix OpusMT language codes

https://gerrit.wikimedia.org/r/1003451

Change 1003612 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2024-02-15-085232-production

https://gerrit.wikimedia.org/r/1003612

Change 1002968 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] opusmt: Add new model: sv-fi

https://gerrit.wikimedia.org/r/1002968

Change 1004349 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/cxserver@master] config: Add language pairs for Opus models

https://gerrit.wikimedia.org/r/1004349

Change 995170 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update MinT to 2024-02-20-062448-production

https://gerrit.wikimedia.org/r/995170

Change 1003612 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2024-02-15-085232-production

https://gerrit.wikimedia.org/r/1003612

Change 995170 merged by jenkins-bot:

[operations/deployment-charts@master] Update MinT to 2024-02-20-062448-production

https://gerrit.wikimedia.org/r/995170

Change #1003395 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] opusmt: Add new model: French-Tahitian (fr-ty)

https://gerrit.wikimedia.org/r/1003395

Change #1015258 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update MinT to 2024-03-28-061726-production

https://gerrit.wikimedia.org/r/1015258

Change #1004349 merged by jenkins-bot:

[mediawiki/services/cxserver@master] config: Add language pairs for Opus models

https://gerrit.wikimedia.org/r/1004349

Change #1016077 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2024-04-01-160720-production

https://gerrit.wikimedia.org/r/1016077

Change #1015258 merged by jenkins-bot:

[operations/deployment-charts@master] Update MinT to 2024-03-28-061726-production

https://gerrit.wikimedia.org/r/1015258

Mentioned in SAL (#wikimedia-operations) [2024-05-14T05:15:22Z] <kart_> Updated MinT to 2024-03-28-061726-production (T333969)

Change #1016077 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2024-04-23-221507-production

https://gerrit.wikimedia.org/r/1016077