
Evaluate the integration of the new IndicTrans model (IndicTrans2-M2M) into MinT
Closed, Resolved | Public

Description

A new version of the IndicTrans model is available. The original version integrated in MinT supported translation of 22 Indic languages to and from English; the new version supports all combinations, including direct Indic-to-Indic translation. This is particularly relevant for combinations not covered by other models such as NLLB-200, for example those involving Santali, Bodo and Dogri, which were not supported when translating from Hindi, Malayalam or other Indic languages.
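To make the difference in coverage concrete, the following Python sketch enumerates the translation directions available in each setup. It assumes a hypothetical, abbreviated list of language codes (Hindi, Malayalam, Kashmiri, Santali, Bodo and Dogri) instead of the full set of 22 languages, and the function names are illustrative only, not part of MinT.

```python
from itertools import permutations

# Hypothetical, abbreviated list: a small subset of the Indic languages
# discussed in this task, used for illustration only.
INDIC_LANGUAGES = ["hi", "ml", "ks", "sat", "brx", "doi"]


def english_centric_pairs(languages):
    """Directions covered by English-to/from-Indic models only."""
    pairs = set()
    for lang in languages:
        pairs.add(("en", lang))
        pairs.add((lang, "en"))
    return pairs


def all_pairs(languages):
    """Directions covered once Indic-to-Indic translation is added:
    every ordered combination of two distinct languages, English included."""
    return set(permutations(["en", *languages], 2))


old = english_centric_pairs(INDIC_LANGUAGES)
new = all_pairs(INDIC_LANGUAGES)

print(("hi", "sat") in old)  # False: Hindi to Santali needs the new model
print(("hi", "sat") in new)  # True
print(len(old), len(new))    # 12 42
```

With n Indic languages, the English-centric setup yields 2n directions, while all combinations yield (n + 1) × n, which is why adding the Indic-to-Indic model expands coverage so much.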

This task proposes to explore the new model and evaluate whether it can be incorporated into MinT to improve language support.

__

The initial integration of the English-to/from-Indic models into MinT was done in T337656: Explore using IndicTrans2 - better model supporting 22 Indic languages

Event Timeline

Pginer-WMF triaged this task as Medium priority. Dec 4 2023, 4:33 PM
Pginer-WMF updated the task description.

Change 980356 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/machinetranslation@master] IndicTrans2 indic-indic model integration

https://gerrit.wikimedia.org/r/980356

Change 980356 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] IndicTrans2 indic-indic model integration

https://gerrit.wikimedia.org/r/980356

Change 981709 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update MinT to 2023-12-08-151348-production

https://gerrit.wikimedia.org/r/981709

Change 981709 merged by jenkins-bot:

[operations/deployment-charts@master] Update MinT to 2023-12-08-151348-production

https://gerrit.wikimedia.org/r/981709

As can be seen below, IndicTrans2 is now used for language pairs such as Hindi-Kashmiri.
However, when inspecting the target language selector, you can notice that Santali (sat) is not listed.
We may need to update the MinT configuration to indicate that Santali is also supported in all combinations with the other IndicTrans2-supported languages.

[Screenshot: Screenshot 2023-12-11 at 11.47.41 2.png]

> However, when inspecting the target language selector you can notice that Santali (sat) is not listed.

Santali is present.

[Screenshot: image.png]

Nikerabbit changed the task status from Open to In Progress. Jan 8 2024, 7:54 AM