
Create a parser function to get the direction of a language or script
Open, Needs Triage · Public · Feature

Description

Feature summary (what you would like to be able to do and where):

There should be a way for pages (typically templates) to easily get the direction for a language code or script code.

For example, there could be a parser function such as {{#dir:...}}:

  • {{#dir:en}} would produce "ltr".
  • {{#dir:ar}} would produce "rtl".
  • {{#dir:Arab}} would produce "rtl".
  • {{#dir:und-arab}} would produce "rtl".

If the input is a language code without a script code, it would return the direction MediaWiki has for that language.

If the input is a language code with a script code, or just a script code, it would return the direction for that script code.

MediaWiki does not currently have data about scripts, but it could get it from CLDR, which provides data about scripts generated from Unicode data (main repository, JSON repository).
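Here is a minimal sketch of how a list of rtl scripts could be derived from CLDR's JSON data. It assumes the script metadata has the shape of `scriptMetadata.json` in the cldr-core package: a top-level `"scriptMetadata"` object keyed by script code, where each entry has an `"rtl"` field set to `"YES"` or `"NO"`. The sample below is a short hand-written stand-in, not an excerpt of the real file, and `rtl_scripts_from_cldr` is a hypothetical helper name.

```python
import json

# Hand-written stand-in for CLDR's script metadata (assumed shape, heavily abbreviated).
SAMPLE_JSON = """
{
  "scriptMetadata": {
    "Arab": {"rtl": "YES"},
    "Hebr": {"rtl": "YES"},
    "Latn": {"rtl": "NO"},
    "Cyrl": {"rtl": "NO"}
  }
}
"""


def rtl_scripts_from_cldr(text: str) -> set[str]:
    """Return the script codes whose metadata marks them as right-to-left."""
    data = json.loads(text)
    return {
        code
        for code, meta in data["scriptMetadata"].items()
        if meta.get("rtl") == "YES"
    }


print(sorted(rtl_scripts_from_cldr(SAMPLE_JSON)))  # ['Arab', 'Hebr']
```

Generating the list from CLDR in this way means a newly encoded rtl script would show up once CLDR's data is updated, with no per-wiki edits.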

CLDR currently lists 35 scripts as rtl: Adlm Arab Armi Avst Chrs Cprt Elym Hatr Hebr Hung Khar Lydi Mand Mani Mend Merc Mero Narb Nbat Nkoo Orkh Ougr Palm Phli Phlp Phnx Prti Rohg Samr Sarb Sogd Sogo Syrc Thaa Yezi

There are also a few variants of those scripts that are not included in CLDR's data: Aran, Syre, Syrj, Syrn
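The lookup rules proposed for {{#dir:...}} can be sketched as follows, using the 35 CLDR scripts and the four variants listed above. This is only an illustration of the logic, not MediaWiki code. `LANGUAGE_DIRS` is a hypothetical stand-in for MediaWiki's per-language direction data, `get_dir` and `script_dir` are made-up names, and falling back to "ltr" for unknown languages is an assumption.

```python
# Scripts CLDR lists as right-to-left (copied from the task description).
CLDR_RTL_SCRIPTS = {
    "Adlm", "Arab", "Armi", "Avst", "Chrs", "Cprt", "Elym", "Hatr", "Hebr",
    "Hung", "Khar", "Lydi", "Mand", "Mani", "Mend", "Merc", "Mero", "Narb",
    "Nbat", "Nkoo", "Orkh", "Ougr", "Palm", "Phli", "Phlp", "Phnx", "Prti",
    "Rohg", "Samr", "Sarb", "Sogd", "Sogo", "Syrc", "Thaa", "Yezi",
}
# Variant script codes that CLDR's data does not include.
EXTRA_RTL_SCRIPTS = {"Aran", "Syre", "Syrj", "Syrn"}
# Lowercased so that "und-arab" and "und-Arab" behave the same.
RTL_SCRIPTS = {s.lower() for s in CLDR_RTL_SCRIPTS | EXTRA_RTL_SCRIPTS}

# Hypothetical stand-in for the direction MediaWiki already knows per language.
LANGUAGE_DIRS = {"en": "ltr", "ar": "rtl", "he": "rtl", "fa": "rtl"}


def is_script_subtag(subtag: str) -> bool:
    # BCP 47 script subtags are exactly four letters (ISO 15924 codes).
    return len(subtag) == 4 and subtag.isalpha()


def script_dir(script: str) -> str:
    return "rtl" if script.lower() in RTL_SCRIPTS else "ltr"


def get_dir(code: str) -> str:
    subtags = code.strip().replace("_", "-").split("-")
    # Input is just a script code, e.g. "Arab".
    if len(subtags) == 1 and is_script_subtag(subtags[0]):
        return script_dir(subtags[0])
    # Language code with a script code, e.g. "und-arab" or "pa-Arab".
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            break  # extension or private-use section ("u-...", "x-..."): no script here
        if is_script_subtag(subtag):
            return script_dir(subtag)
    # No script code: use the language's own direction (assumed default: ltr).
    return LANGUAGE_DIRS.get(subtags[0].lower(), "ltr")


for example in ["en", "ar", "Arab", "und-arab"]:
    print(example, get_dir(example))
```

Checking a script code is a single set-membership test against a few dozen entries, which is the inexpensive lookup described under Benefits. The per-language table is only needed when the input contains no script code.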

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

Wiki pages often want to include words in other languages. Multilingual wikis often have translatable elements on pages and need to make sure the direction is set correctly.

There are many wikis with templates like Template:Dir (https://www.wikidata.org/wiki/Q14412446), most of which hardcode an outdated list of language codes.

Lots of wikis have modules with lists of rtl scripts (global search) and functions to return the direction (global search).

mw.language:getDir() in Lua is often not suitable because it's not easily accessible from a template without first writing a module, it only supports the languages included in MediaWiki, and it's easy to run into the limit on how many times you can use it on a single page.

Benefits (why should this be implemented?):

It would reduce the amount of maintenance needed and improve consistency across wikis (wikis would not need their own lists of rtl scripts, and if a new rtl script is added to Unicode, it would only need to be added in one place for the data to become available to all wikis).

Supporting script codes and languages with script codes would improve support for languages not yet included in MediaWiki.

It might also be more efficient to fetch the direction from the script (when provided) by looking up scripts in a relatively short list of script codes.

Note that {{Dir}} is currently the most used template on Commons. See also: T343131: Commons database is growing way too fast

Event Timeline

mw.language:getDir() in Lua is often not suitable because […] it's easy to run into the limit on how many times you can use it on a single page.

Every language counts only once, so as long as the same language is queried again and again, the Lua solution won’t run into the limit either. On the other hand, the limit is there for a reason, so a parser function solution will probably also have a limit. This is not to say that there should be no parser function, but this particular argument isn’t very strong.

(By the way, since dd74abb853ba56aef99b7c9d09dd02bdcb88129b the limit on Wikimedia is 200, so it’s not really likely that one accidentally hits the limit, unless one wants to load all languages on a page.)
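To illustrate why repeated lookups of the same language do not use up the limit, here is a toy model of a per-page budget that counts distinct language codes instead of calls. It is a hypothetical sketch of the counting behaviour described above, not Scribunto's implementation. The class and exception names are made up, and 200 is the Wikimedia value mentioned above.

```python
class LanguageLimitExceeded(Exception):
    """Raised when a page would use more distinct languages than allowed."""


class PageLanguageBudget:
    """Toy model: only the first use of each distinct language code counts."""

    def __init__(self, limit: int = 200):
        self.limit = limit
        self.seen: set[str] = set()

    def use(self, code: str) -> None:
        if code in self.seen:
            return  # the same language queried again costs nothing
        if len(self.seen) >= self.limit:
            raise LanguageLimitExceeded(code)
        self.seen.add(code)


budget = PageLanguageBudget()
for _ in range(10_000):
    budget.use("ar")  # one language, queried again and again
print(len(budget.seen))  # 1
```

Under this model, the limit is reached only when a single page touches more than 200 different languages.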

mw.language:getDir() in Lua is often not suitable because […] it only supports the languages included in MediaWiki […].

What is the use case for getting directionality of languages not included in MediaWiki? Multilingual wikis’ contents are usually available only in languages included in MediaWiki.

mw.language:getDir() in Lua is often not suitable because it's not easily accessible from a template without first writing a module […].

This is true; using a module would only worsen T343131.

Multilingual wikis’ contents are usually available only in languages included in MediaWiki.

See also: T202794: Many more languages need to be added to Multilingual Wikisource (mul.ws)

Indeed, T202794 uses a different definition of “languages included in MediaWiki” than what I was thinking of:

  • Commons usually uses languages that can be selected in the preferences (have MediaWiki translations), since it displays the appropriate translation based on the language selected in the preferences.
  • Multilingual Wikisource also wants to use languages that are long extinct and thus don’t make much sense in the preferences (don’t have MediaWiki translations). However, they still need to be included in MediaWiki in one way or another: for example, to be able to display text in that language with the right directionality.

I looked at the source code, and Scribunto is actually extremely permissive about which languages it accepts: for example, mw.language.new('fklflmwlmfkmf'):isRTL() happily returns false without throwing any error. So if a language is included in MediaWiki by any definition (including the one used by mulwikisource), Scribunto will handle it and return its directionality. If it’s not included at all, a magic word won’t work either.