Adjust Norwegian mapping for MinT configuration
Closed, Resolved, Public

Description

Language codes identify languages both on Wikipedia and in the translation models used by MinT. Because language-code standards differ (two- versus three-letter codes) and some cases are exceptional, the mappings needed some adjustments (T336525). This ticket covers the particular case of Norwegian.
The Norwegian Bokmål Wikipedia lives on no.wikipedia.org for historical reasons, but its content language is nb.

To reflect this in MinT, the following changes are needed:

  • Mapping from Wikipedia codes to NLLB-200 codes. In this file the "nb": "nob_Latn" line should be replaced by "no": "nob_Latn"
  • List of supported languages. In this file the - nb item should be replaced by - no.
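
As a rough illustration of the two changes above, here is a minimal Python sketch. The names WIKI_TO_NLLB, SUPPORTED_LANGUAGES and nllb_code are hypothetical and are not taken from the MinT code base; only the "no": "nob_Latn" entry comes from this ticket, and the English entry is added just to give the table a second row.

```python
# Hypothetical excerpt of a Wikipedia-code -> NLLB-200-code mapping.
# Only the Norwegian entry is taken from this ticket.
WIKI_TO_NLLB = {
    "en": "eng_Latn",
    "no": "nob_Latn",  # previously keyed as "nb"
}

# Hypothetical list of supported languages, keyed by Wikipedia code.
SUPPORTED_LANGUAGES = ["en", "no"]  # previously contained "nb"


def nllb_code(wiki_code: str) -> str:
    """Return the NLLB-200 code for a supported Wikipedia language code."""
    if wiki_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {wiki_code!r}")
    return WIKI_TO_NLLB[wiki_code]


if __name__ == "__main__":
    print(nllb_code("no"))
```

Under these assumptions a lookup keyed on the domain code no succeeds, while a lookup keyed on the content language nb is rejected as unsupported.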

Event Timeline

Change 925759 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[mediawiki/services/machinetranslation@master] Adjust Norwegian mapping for MinT configuration

https://gerrit.wikimedia.org/r/925759

Have you tested that this change works as intended? The domain name for the Norwegian Bokmål Wikipedia is no.wikipedia.org and the database name is nowiki, while the content language is set to nb. If the tool uses the domain name, this change would work, but if it uses the content language I fear it might break.

Change 925759 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] Adjust Norwegian mapping for MinT configuration

https://gerrit.wikimedia.org/r/925759

Change 927160 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update MinT to 2023-06-06-120533-production

https://gerrit.wikimedia.org/r/927160

I included Norwegian for one of the upcoming iterations of enablements (T338146) with special attention to verify things work as expected.

Excellent! Then I trust all is good.

Change 927160 merged by jenkins-bot:

[operations/deployment-charts@master] Update MinT to 2023-06-06-120533-production

https://gerrit.wikimedia.org/r/927160

Mentioned in SAL (#wikimedia-operations) [2023-06-07T05:02:51Z] <kart_> Updated MinT to 2023-06-06-120533-production (T337910, T337686, T337708)