The https://github.com/AI4Bharat/IndicTrans2 project used a larger corpus to train machine translation models for Indian languages. From a quick reading of the code, it uses an architecture similar to NLLB's. In my testing on the demo site https://models.ai4bharat.org/#/nmt/v2, the results were better than NLLB's; in particular, the grammar of the translated sentences looks better.
Since MinT supports multiple backend models and IndicTrans2 looks like a compatible model, we should explore this opportunity.
The following languages are supported:
- Assamese (as)
- Bangla (bn)
- Boro (brx)
- Dogri (doi)
- English (en)
- Goan Konkani (gom)
- Gujarati (gu)
- Hindi (hi)
- Kannada (kn)
- Kashmiri (ks)
- Maithili (mai)
- Malayalam (ml)
- Manipuri (mni)
- Marathi (mr)
- Nepali (ne)
- Oriya (or)
- Panjabi (pa)
- Sanskrit (sa)
- Santali (sat)
- Sindhi (sd)
- Tamil (ta)
- Telugu (te)
- Urdu (ur)
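
As a starting point for the exploration, here is a minimal sketch of how a language pair could be checked against the list above before routing it to IndicTrans2. The names `INDICTRANS2_LANGUAGES` and `could_use_indictrans2` are hypothetical and are not part of MinT or IndicTrans2; the sketch also assumes that any pair of distinct languages from the list is translatable, which would need to be confirmed against the actual model checkpoints.

```python
# Language codes copied from the list above (23 entries).
INDICTRANS2_LANGUAGES = frozenset({
    "as", "bn", "brx", "doi", "en", "gom", "gu", "hi", "kn", "ks", "mai",
    "ml", "mni", "mr", "ne", "or", "pa", "sa", "sat", "sd", "ta", "te", "ur",
})


def could_use_indictrans2(source: str, target: str) -> bool:
    """Hypothetical helper: True when both codes are listed and differ.

    Assumption: every pair of distinct listed languages is supported.
    """
    return (
        source != target
        and source in INDICTRANS2_LANGUAGES
        and target in INDICTRANS2_LANGUAGES
    )


if __name__ == "__main__":
    # Example: decide per pair whether IndicTrans2 is a candidate backend.
    for pair in [("en", "ml"), ("hi", "ta"), ("en", "fr")]:
        print(pair, could_use_indictrans2(*pair))
```

A pair that fails this check would simply stay on whatever backend serves it today.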