
Add i18n dir=rtl support
Closed, ResolvedPublic

Description

PR

MR !88, commit 1ad1889a.


LLM-assisted content removed:

The diff is therefore dir=rtl support only.

LLM-assisted translation outreach rolled back:

Event Timeline

Yug updated the task description.
Tacsipacsi subscribed.
  • ✅ i18n expansion +40 locales via LLM translation

Please don’t push machine translation to translatewiki.net. translatewiki.net builds on human translation, and machine translation can cause friction and pollute the translation memory (which suggests existing same-language translations from any project when creating new translations). Please revert dee6e12d65d20bbdaf527c74f4a3783607a32b60 ASAP and have the already-imported pages removed from translatewiki.net to stop polluting the translation memory (you can ask them to be deleted at https://translatewiki.net/wiki/Support). I’m reopening the task because of this.

  • ✅ add dir=rtl support

This may be good for a start, but it’s not scalable: if someone starts to translate into Egyptian Arabic, Yiddish or N'Ko, the translations will appear left-to-right because these languages aren’t present on the list of known RTL languages – even though all three are right-to-left languages (Egyptian Arabic uses Arabic script, Yiddish uses Hebrew script, and N'Ko has its own script). This is not as urgent, and has no straightforward solution, so it doesn’t necessarily have to happen in this task; you may want to create a follow-up task for it.
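One scalable approach is to key the direction lookup on the language's script rather than on a hand-maintained list of RTL language codes. A minimal sketch in Python; the language-to-script table here is illustrative only (in practice this data would come from CLDR likely-subtags, not be hand-written):

```python
# Sketch (not project code): derive text direction from a language's script.
# The RTL script list follows CLDR script metadata; the language-to-script
# mapping below is a small illustrative excerpt, not a complete dataset.

RTL_SCRIPTS = {"Arab", "Hebr", "Nkoo", "Thaa", "Syrc", "Adlm"}

LANGUAGE_SCRIPT = {
    "ar": "Arab",   # Arabic
    "arz": "Arab",  # Egyptian Arabic (Arabic script)
    "fa": "Arab",   # Persian
    "he": "Hebr",   # Hebrew
    "yi": "Hebr",   # Yiddish (Hebrew script)
    "nqo": "Nkoo",  # N'Ko (its own script)
    "fr": "Latn",   # French
}

def text_direction(lang: str) -> str:
    """Return 'rtl' or 'ltr' for a language code, defaulting to 'ltr'."""
    return "rtl" if LANGUAGE_SCRIPT.get(lang) in RTL_SCRIPTS else "ltr"
```

With such a lookup, Egyptian Arabic, Yiddish and N'Ko come out right-to-left without anyone having to remember to list them.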

Hello @Tacsipacsi,

  • add dir=rtl support: Since the list of RTL languages was hard-coded, we had control over its expansion to other, lesser-known languages. I changed it so it now relies on /i18n/languages-i18n.json, which is also hand-coded by me each time we approve a language translation file (c60b82b7). For now, for testing, all translations are activated. In production, only fully human-reviewed translations from translatewiki.net will be activated.
  • 🔸 i18n expansion +40 locales via LLM translation:

For this, I would like to work with LLM-assisted (i.e. computer-assisted) translations, which is standard practice in translation work today. translatewiki.net does much the same with its MinT and Google Translate suggestions. While those MinT and Google Translate queries translate isolated sentences, my LLM usage: 1. recycles past human-reviewed translations from our previous version, lingua-libre/RecordWizard/i18n; 2. translates the whole document as a group, ensuring stronger alignment between sentences. The results I got for English-French were pretty good: I observed about 96% (~170) correct translations and only 4% (~7) erroneous sentences, which I promptly reviewed.

An LLM is a tool; we need to be aware of its shortcomings (as I am) and use it accordingly. I have therefore also tightened my validation criterion: translation files will face our end users only after full human review has been done on translatewiki.net. Here is the message I tested earlier today, before your message on Phabricator:

Hello @[[User:Amire80|Amire80]],
I hope this message finds you well. Letting you know that after 18 months of occasional open-source work on Lingua Libre Django[1] (a secret, unstable beta), our new version is getting close to production-ready and is now looking for reviewers for its translation files. We recycled the previous Lingua Libre's i18n files and used an LLM to create the new translation files.[2] While helpful, only human-reviewed translations will be accepted for production use. Could you help Lingua Libre by reviewing the Hebrew (<code>he</code>) translations[2]? It would give this community better access. ~~~~

Two years ago, when I went with such a proactive messaging approach, we were able to translate about 30 languages within 3 weeks. With this approach, I think we can do faster and better. Why not try it and learn from it?

Hello @Tacsipacsi,

  • add dir=rtl support: Since the list of RTL languages was hard-coded, we had control over its expansion to other, lesser-known languages. I changed it so it now relies on /i18n/languages-i18n.json, which is also hand-coded by me each time we approve a language translation file (c60b82b7).

As long as it’s manually maintained, it’s going to have the risk of unknown languages appearing. The current version of languages-i18n.json still contains only the three RTL languages Arabic, Persian and Hebrew, so the only improvement brought by your patch is that the same data is now maintained manually at a single place rather than two places. In the long term, you should get this piece of information automatically from somewhere – for example from CLDR, MediaWiki or Wikidata – so that new translations appear correctly without any manual work, even if you happen to leave WMFR and the tool remains without maintainer.
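Fetching this from MediaWiki could look like the sketch below, which uses MediaWiki's meta=languageinfo API (available since MW 1.34, with liprop=dir). The sample payload is hand-written here to mirror the API's response shape, not a real response:

```python
import json

# Sketch: the real call would be a GET request to e.g.
#   https://www.mediawiki.org/w/api.php?action=query&meta=languageinfo
#     &liprop=dir&licode=he|arz|fr&format=json&formatversion=2
# Hand-written sample mirroring that response shape:
SAMPLE_RESPONSE = """
{"query": {"languageinfo": {
    "he":  {"code": "he",  "dir": "rtl"},
    "arz": {"code": "arz", "dir": "rtl"},
    "fr":  {"code": "fr",  "dir": "ltr"}
}}}
"""

def directions(payload: str) -> dict:
    """Map each language code in a languageinfo response to 'rtl' or 'ltr'."""
    info = json.loads(payload)["query"]["languageinfo"]
    return {code: entry["dir"] for code, entry in info.items()}
```

The result could be cached at build time, so the tool stays correct even without a maintainer around.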

  • 🔸 i18n expansion +40 locales via LLM translation:

For this, I would like to work with LLM-assisted (i.e. computer-assisted) translations, which is standard practice in translation work today. translatewiki.net does much the same with its MinT and Google Translate suggestions. While those MinT and Google Translate queries translate isolated sentences, my LLM usage: 1. recycles past human-reviewed translations from our previous version, lingua-libre/RecordWizard/i18n; 2. translates the whole document as a group, ensuring stronger alignment between sentences.

The problem, and the difference to what translatewiki.net does, is that your machine translations are not flagged as such: not when people look at the list of translations of Lingua Libre Django, and even less when the translations are fed into the translation memory. I’d be open to using machine translations (whether Google, MinT, or LLM) if they would be flagged as such. Translate does not have support for such flagging right now, but maybe it could be developed. However, until such development happens, I ask you to refrain from feeding machine translations into translatewiki.net and to remove the ones you’ve already submitted.

The results I got for English-French were pretty good: I observed about 96% (~170) correct translations and only 4% (~7) erroneous sentences, which I promptly reviewed.

Please note that English and French are both European languages, with a lot of common history and a lot of human translations made throughout the centuries, so machine translation is pretty likely to perform better on this language pair than when translating from English to a non-European language (or even a non-Indo-European language of Europe, like Hungarian or Basque).

An LLM is a tool; we need to be aware of its shortcomings (as I am) and use it accordingly. I have therefore also tightened my validation criterion: translation files will face our end users only after full human review has been done on translatewiki.net. Here is the message I tested earlier today, before your message on Phabricator:

Hello @[[User:Amire80|Amire80]],
I hope this message finds you well. Letting you know that after 18 months of occasional open-source work on Lingua Libre Django[1] (a secret, unstable beta), our new version is getting close to production-ready and is now looking for reviewers for its translation files. We recycled the previous Lingua Libre's i18n files and used an LLM to create the new translation files.[2] While helpful, only human-reviewed translations will be accepted for production use. Could you help Lingua Libre by reviewing the Hebrew (<code>he</code>) translations[2]? It would give this community better access. ~~~~

How do you determine if human review has been done? And how do you differentiate between unreviewed LLM translations and human translations that haven’t been reviewed by other humans? (The latter are roughly equivalent to reviewed LLM translations: one human has reviewed them.) If you require independent human review even for human translations, languages with fewer translators are likely never to be published, because there won’t be a second person spending time on translation reviews. (Hungarian is a relatively large language, yet translations are rarely formally proofread on translatewiki.net, because we have better things to do.)

Review status can be determined on translatewiki.net, when the review column reaches 100%.

LLM-assisted content removed:

Yug renamed this task from "Add i18n dir=rtl support and +40 locales" to "Add i18n dir=rtl support". (Oct 11 2025, 6:25 PM)
Yug updated the task description.
Yug updated the task description.
Yug claimed this task.
Yug updated the task description.
Yug updated the task description.
Yug updated the task description.

LLM-assisted content removed:

Thanks! However, they also need to be removed on translatewiki.net, or else they will make their way back into the repo. I made a request.

Review status can be determined on translatewiki.net, when the review column reaches 100%.

[…]

…which is again something manual. I don’t understand you: you want to have automated translations so that human translators have less work, but at the same time you make a lot of manual work for yourself. Given that from now on, we are going to have only translations that are ready to go live, they should go live at the moment they are committed to the repo. Maybe I’ll try to put together some automation when I have time.

@Tacsipacsi, thanks for the deletion request on Phabricator!

i18n current update approach

For i18n, this is our current approach:

  • coded once: the components/LanguageSelector.vue component
  • constant expansion via crowd translation: translatewiki.net translations land in i18n/{iso}.json
  • regular human action: update i18n/languages-i18n.js, then redeploy the front-end on the server so new locales go live.
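That last manual step could itself be scripted. A hedged sketch, assuming the file layout described above (i18n/{iso}.json per locale, plus i18n/languages-i18n.json); treating the index as a plain JSON array of locale codes is my assumption for illustration, not the project's actual schema:

```python
import json
from pathlib import Path

def rebuild_locale_index(i18n_dir: str = "i18n") -> list:
    """Regenerate the locale index from the translation files present on disk.

    Assumes one i18n/{iso}.json file per locale and writes the list of
    locale codes to i18n/languages-i18n.json (illustrative schema).
    """
    locales = sorted(
        p.stem for p in Path(i18n_dir).glob("*.json")
        if p.name != "languages-i18n.json"  # skip the index itself
    )
    Path(i18n_dir, "languages-i18n.json").write_text(
        json.dumps(locales, ensure_ascii=False, indent=2)
    )
    return locales
```

Run from a deploy hook, this would remove one of the "regular human action" steps.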

With T407063: "i18n call for translation on translatewiki.net", we will get the top 12~20 languages, reaching about 6B people, within ~2 weeks. Another 20~30 translations will come in over the next 3 months. We can indeed redeploy the front-end regularly during this period. After that, we can redeploy on demand, or twice a year. This is a proven method.

i18n update automation?

Coding a built-in method to automatically add recently translated locales has pitfalls for a low ROI.
Pitfalls:

  • need to exclude files with partial translations
  • P1406 script directionality is not on language Wikidata items (ex: Q9168) but on script items (ex: Q1828555, Arabic script); needs a new SPARQL query
  • coding time, which I don't have
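For the P1406 pitfall, the "new SPARQL" would hop from the language item through its writing system to the script's directionality. A sketch only: P1406 is taken from the note above, while the use of P424 (Wikimedia language code) and P282 (writing system) is my assumption, untested against the live endpoint:

```python
# Hypothetical Wikidata query: language -> writing system -> directionality.
# Property choices are assumptions for illustration; verify before use.
SPARQL = """
SELECT ?langCode ?dirLabel WHERE {
  ?lang wdt:P424 ?langCode ;   # Wikimedia language code
        wdt:P282 ?script .     # writing system (script item)
  ?script wdt:P1406 ?dir .     # script directionality
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
```

Sent to the Wikidata Query Service, this would yield a code-to-direction table without hand-maintaining any list, at the cost of the coding time noted above.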

Context

For context, I'm a language teacher by training, the project's initiator and top facilitator for the past decade, and I learnt coding along the way. I have a one-month freelance contract to push this project toward production. I'm not WMFR, nor the main developer. The budget for the developer has dried up, so I'm taking over because I'm stubborn like a winter Canadian bison. 😆🚀 But I must refocus ASAP on my freelance work, which is project management, reporting, and communication, then do volunteer fundraising for the next coding cycle, so the young devs on this project get a decent income and the project continues its growth. 👍🏼 The technical choices I make are constrained by this frugal, mixed volunteer/freelance context.