
Make Extension:ParserFunctions convert localized digits to arabic numerals in #(if)expr and #time
Open, Needs TriagePublic

Description

Problem

The ParserFunctions extension does not support the localized (non-ASCII) digits defined in $digitTransformTable (languages\messages\MessagesHi.php), such as the Devanagari digits used in the examples below.

Steps to Reproduce

  • Go to any wiki where digitTransformTable is enabled. In my case, I went to https://hi.wikibooks.org
  • Create your own user sandbox and try the following:
{{#expr: १ + २ }}
{{#time: Y m d H:i:s | १९५९ }}
{{#ifexpr: १२३४५ = १२३४५ | A | B}}
{{#ifexpr: १ > ० | yes }}

Solution

Fetch the numerals from $digitTransformTable (if defined and enabled) for the local wiki, and make the parser functions accept those localized digits as input.
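A minimal sketch of the idea, in Python rather than MediaWiki's PHP: invert a digit transform table and use it to normalize input before it reaches the expression parser. The table below is a hypothetical stand-in for the Hindi $digitTransformTable, and normalize_digits is an illustrative name, not an existing ParserFunctions function.

```python
# Hypothetical stand-in for the Hindi $digitTransformTable (ASCII -> Devanagari).
DIGIT_TRANSFORM_TABLE = {
    "0": "०", "1": "१", "2": "२", "3": "३", "4": "४",
    "5": "५", "6": "६", "7": "७", "8": "८", "9": "९",
}

# Reverse mapping (Devanagari -> ASCII), built once as a translation table.
_TO_ASCII = str.maketrans(
    {local: ascii_digit for ascii_digit, local in DIGIT_TRANSFORM_TABLE.items()}
)


def normalize_digits(text: str) -> str:
    """Replace localized digits with ASCII digits; leave every other character alone."""
    return text.translate(_TO_ASCII)


if __name__ == "__main__":
    print(normalize_digits("१ + २"))
    print(normalize_digits("१२३४५ = १२३४५"))
```

Only digits are touched, so operators, whitespace and separators reach the parser unchanged.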

Additional Note

  • @MarkAHershberger Sir, you already made a ParserFunctions.diff in T36193. Please use it as a starting point and bring it to a final state.
  • {{#expr: 1 + 2 }} returns 3, while {{#time: Y m d H:i:s | 1959 }} returns local numerals on hiwikibooks. See my sandbox.

Event Timeline

Restricted Application added a project: User-Jayprakash12345. · View Herald TranscriptDec 21 2017, 1:36 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Adding you all because you were involved in T36193.

MarkAHershberger added a comment.EditedDec 23 2017, 9:52 PM

I am not the target audience for this fix but I'm willing to help shepherd a fix.

I need, though, someone who:

  1. is affected by this,
  2. understands the technical issues involved,
  3. has the capability to understand or already understands MediaWiki core, and
  4. is willing to test and provide a technical solution.

We will also need on-wiki consensus from the affected wikis that this fix is desired.

Once we have all of the above, we will need to convince developers in MediaWiki core that this is the right thing to do, but without the above, any attempt to move this forward is pretty much moot.

If anyone can fulfill those first four criteria and is willing to work on getting consensus from at least one affected wiki, please have them contact me.

Bawolff added a subscriber: Bawolff.EditedDec 23 2017, 11:47 PM

#time and #(if)expr are different enough that I think they should be discussed separately. (struck because I'm unsure)

I am undecided if this is a good idea or not.

Pros

  • makes it easier to use for non-english speakers

Cons

  • makes the behaviour of the function vary significantly depending on (page? Or should it be content?) language.
  • there is an easy workaround with {{#formatnum:|r}}

The change is trivial. The issue is political: is this the right thing to do? Although this is a small change, perhaps an RFC is in order to clarify developer opinion on this matter.

Bawolff renamed this task from Make Extension:ParserFunctions Compatible with UTF-8 Digit to Make Extension:ParserFunctions convert localized digits to "english" digits in #(if)expr and #time.Dec 23 2017, 11:49 PM
Bawolff removed a subscriber: Bawolff-alt.

Thanks, @Bawolff. Good point.

I'll help write an RFC if anyone is interested in getting this done.

There is also the question of whether localized digit grouping would be converted as well. When we localize numbers we usually change 1000000 -> 1,000,000 (or the localized equivalent). I don't think we should support stripping thousands separators in #expr, as that would probably break stuff.

MarkAHershberger renamed this task from Make Extension:ParserFunctions convert localized digits to "english" digits in #(if)expr and #time to Make Extension:ParserFunctions convert localized digits to arabic numerals in #(if)expr and #time.Dec 23 2017, 11:55 PM

Thanks for creating this task and working to solve this problem. The Wikimedia Foundation respects all languages, and we create encyclopedias in all languages. We also respect all writing scripts. Hindi is the fourth most spoken language in the world. Sanskrit is the oldest language in the world. Both, and many other languages, are written in the Devanagari script; languages such as Gujarati also have their own script and numeral system. But currently only the Arabic numeral system is supported in #expr. The old, original numeral systems are going to die out.
We all contribute to the wikis because we love our languages. We are unable to use our own numeral system on our own language's wiki.
Many languages strongly need this solution.
Thanks.

Jayprakash12345 added a comment.EditedDec 24 2017, 6:41 AM

We will also need on-wiki consensus from the affected wikis that this fix is desired.

This is a paradox, because the two things depend on each other. The last time we discussed the numeral system, the users who were opposed gave us a logical argument: "ParserFunctions does not support local numerals, hence I am opposing."

At the Jaipur Technical Workshop, senior Hindi Wiki editors asked me whether it is possible to apply local numerals on Hindi Wiki, and I responded positively, because I knew that internationalization is fundamental in Wikimedia. 80% of users agree with applying the local numeral system. The Hindi Wiki conference has now been announced for 12-14 Jan, and applying local numerals could reach consensus there.

My main objective in creating this task is to prepare the ground for that. I think it could take 2-3 months to resolve. Please start working on it. If the community declines, you can close it.

We should not forget Wikimedia's internationalization. Apart from the Wikimedia projects, this can also benefit MediaWiki in general.

Thanks, @Bawolff. Good point.

I'll help write an RFC if anyone is interested in getting this done.

Sir, does this mean adding TechCom-RFC?

So, for the on-wiki consensus part, you could start with the Hindi wikis, as you discussed this with Hindi wiki editors.

Open a discussion summarizing the issues and the discussions you had at the Jaipur Technical Workshop, and get feedback about your proposal.

To get around the paradox, you can pose a question like "If a technically sound approach allows MediaWiki to support local numerals in the parser functions #ifexpr, #expr and #time, would you agree that we allow such use on this wiki?".

I don't really think on-wiki consensus is needed at this time - this sort of thing has been requested on and off for a long time. The contentious issue is developer consensus - should #expr be treated like a programming API where template authors are expected to normalize their input, or is it more like a formatting construct that should be as easy as possible for users to use?

@Dereckson Sir, ok

Can someone help me write the RfC?

Anamdas added a subscriber: Anamdas.EditedDec 25 2017, 5:22 PM

So, for the on-wiki consensus part, you could start with the Hindi wikis, as you discussed this with Hindi wiki editors.

Open a discussion summarizing the issues and the discussions you had at the Jaipur Technical Workshop, and get feedback about your proposal.

To get around the paradox, you can pose a question like "If a technically sound approach allows MediaWiki to support local numerals in the parser functions #ifexpr, #expr and #time, would you agree that we allow such use on this wiki?".

I don't really think on-wiki consensus is needed at this time - this sort of thing has been requested on and off for a long time. The contentious issue is developer consensus - should #expr be treated like a programming API where template authors are expected to normalize their input, or is it more like a formatting construct that should be as easy as possible for users to use?

I am not a technically proficient user, so I am writing this comment as a layman. The very fact that wiki projects are available in all languages provides a sufficient logical basis for using numerals written in the different styles of different languages. We could have had only one Wikipedia and only one style of numerals. Whatever logic was used to support multi-language Wikipedias also holds for using numerals in local formats. I am a Hindi Wikipedian, and in the last 5 years I have seen this discussion many times within the community; every time it has been lengthy but inconclusive. I even found myself changing sides, from supporting the international format earlier to supporting the local format now - after learning that it is possible to carry out calculations and use infoboxes, templates etc. with local numeral formats. As far as consensus is concerned, we are never going to have it unless we get to experience both options. It is the duty of the developers to make both options available and then leave it to time and the community to decide. Until we have seen something, how can we decide about it, how can we judge it?

In one of the comments above, it was said that this is a political issue. I beg to differ and would call it a sentimental issue. Wikipedia is a place where only sentimental people come. People with brains go after money; people with hearts come to Wikipedia. Having complete functionality with local numerals is a sentimental issue, as it will lead to the consummation of the love for the language that brings us here. Seen logically or sentimentally, we need complete functionality for local numerals first; whether to keep it or discontinue it can always be decided in the future, at any point in time. Therefore, I earnestly request you to please provide this facility. Regards. --~~~~

Amire80 moved this task from Untriaged to Digit localization on the I18n board.Feb 4 2018, 10:44 AM
Restricted Application added a subscriber: alanajjar. · View Herald TranscriptFeb 4 2018, 10:44 AM
Verdy_p added a subscriber: Verdy_p.EditedMar 3 2018, 10:35 AM

I think it would be OK to support all digits in any known script that has decimal digits (i.e. characters with the "Nd" general category in Unicode).

However, there remains the question of how to handle other characters, notably the decimal and group separators (not necessarily the full stop "." or a space). Some group separators should be handled transparently (notably spaces), but some templates may depend on spaces being separators between different numbers in a list of numbers. Some templates also depend on the fact that such parsing returns an error when there are spaces or alternate separators, so that they can render a parameter differently when it is not a single number.

So in summary it is OK for decimal digits only, but not for alternate separators, or even alternate signs, or alternate exponential notations (not using "e" or "E"), or "NaN" and "Infinite" notations.
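As a rough illustration of "lenient on decimal digits only", here is a Python sketch that maps every character in the Nd category to its ASCII value and passes everything else through. The function name nd_to_ascii is illustrative; only the standard-library unicodedata module is assumed.

```python
import unicodedata


def nd_to_ascii(text: str) -> str:
    """Map every Unicode decimal digit (general category Nd) to its ASCII digit.

    Separators, signs, letters and non-decimal numerics are left unchanged,
    so a strict parser downstream still rejects them.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            out.append(str(unicodedata.decimal(ch)))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(nd_to_ascii("१ > ०"))    # Devanagari digits
    print(nd_to_ascii("૧૨ + ٣"))   # Gujarati and Arabic-Indic digits
```

Because the decision is made per character from the Unicode database, no per-language table is needed, and alternate separators such as the Arabic decimal separator survive untouched.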

It is also not OK for fractional notations (this may break when a template expects "/" to be a field separator, in dates for example).

Maybe we could successfully parse some fractions encoded as single characters, such as "5¼" evaluated as if it were "5.25". These fractions are also given numeric properties in the Unicode character database, and they also exist in non-Latin scripts, where they are more commonly used. But it is not OK to successfully parse a Roman number like "III" as if it were "3", since it could be confused with a (non-numeric) text word.

So if we want to enable the parsing of fractions or Roman numbers, this should require an explicit additional parameter to enable it.
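A small Python sketch of such an opt-in, assuming a hypothetical allow_fractions parameter: a single trailing fraction character is accepted only when the flag is set, and Roman-style input is always rejected. Python's float() stands in for the real number parser here.

```python
import unicodedata


def parse_number(text: str, allow_fractions: bool = False) -> float:
    """Parse a number, optionally accepting one trailing fraction character.

    allow_fractions is a hypothetical opt-in flag; by default "5¼" is an error.
    """
    text = text.strip()
    fraction = 0.0
    # Vulgar fractions such as ¼ are in category "No" and carry a numeric value.
    if allow_fractions and text and unicodedata.category(text[-1]) == "No":
        value = unicodedata.numeric(text[-1])
        if 0 < value < 1:
            fraction = value
            text = text[:-1]
    # float() raises ValueError for anything it cannot read, e.g. "III".
    whole = float(text) if text else 0.0
    # Keep the fraction on the same side of zero as the whole part.
    return whole - fraction if text.startswith("-") else whole + fraction


if __name__ == "__main__":
    print(parse_number("5¼", allow_fractions=True))
```

With the default setting, "5¼" raises ValueError, so existing callers that rely on an error for non-plain numbers keep their behaviour.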

Warnings:

  • fractions like "1 1/2" will cause problems: does it mean "1.5" or "5.5"? If we ignore the whitespace, in the first case it acts as an addition "+" operator, and in the latter case as a group separator.
  • group separators do not necessarily occur at every multiple of 3 digits (this is false at least in Indic languages, even in English as it is used in Southern Asia, and also false for number codes that can have free placement of separators, such as phone numbers, social security numbers and other "numeric" identifiers like IPv4 and IPv6 addresses!).

So it is not so easy to make parsing lenient on group separators, decimal separators and even signs (when they are also used as field separators between two numeric fields).
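To make the grouping warning concrete, here is a Python sketch of South Asian (lakh/crore) grouping next to a validator that assumes groups of three. Both names, group_indian and WESTERN_GROUPING, are illustrative and not part of any MediaWiki API.

```python
import re


def group_indian(n: int) -> str:
    """Group digits as 3 on the right, then 2 at a time (lakh/crore style)."""
    digits = str(abs(n))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    groups.append(tail)
    sign = "-" if n < 0 else ""
    return sign + ",".join(groups)


# A validator that only knows "every 3 digits" grouping.
WESTERN_GROUPING = re.compile(r"^\d{1,3}(,\d{3})*$")


if __name__ == "__main__":
    value = group_indian(1234567)
    print(value, bool(WESTERN_GROUPING.match(value)))
```

A parser that only strips separators sitting at three-digit boundaries would reject valid Indic-style input, which is one reason separator handling is harder than digit handling.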


So in summary: OK for being lenient by default only on decimal digits (Nd). Any other characters should require a specific parser and should not be activated by default in #ifexpr:, #expr:.

But #time: could be improved to recognize a few more valid date/time formats, as long as they are not ambiguous. The case of "01/02/03" is typical of the ambiguities involved, which can only be resolved by having #time: take a locale parameter, i.e. a language code. Such a parameter is not easy to integrate into #ifexpr:, but could be integrated into #expr:. To preserve compatibility, this locale parameter should default to en in all cases where it is missing, and not to the default language of the wiki or any other default.

My opinion is that such a locale parameter should be handled by separate localisable parser functions such as {{#exprl:langcode|expression}} and {{#ifexprl:langcode|expression|value if true|value if false}}, where the default langcode, if it is missing, may be the default language of the wiki (or possibly the content language of the page, if it can be determined, to override this default).

This locale parameter could also be added to {{formatnuml:langcode|number}}, where the number would be the result of {{#exprl:langcode|expression}} and would then be parsed by formatnuml: in its specified locale (meaning that {{formatnuml:langcode|123.456}} may interpret the "." as a group separator, and not necessarily as a decimal separator as in the English-only output of {{#expr:expression}}).


Note that supporting the "Nd" digits in Unicode does not require any localization; it can be built into the parsers for "#expr:" and "#ifexpr:" directly from the UCD properties (long since implemented in ICU, but even ICU is not needed, as all decimal digits are in a small set of contiguous ranges of 10 digits).

I don't see any good reason for not being lenient on input for these digits, even if "#expr:" will only output ASCII digits. A localized "{{#exprl:locale|expression}}" can still format the computed value using the specified locale, or otherwise use a default locale (the page's content language or the site default locale), so that it returns localized digits, possibly including non-decimal numeric systems with a suitable locale, e.g. Roman digits in the "la" locale or a variant.

How will this task go ahead? There are T190643, T189405, T155888 and so many other tasks in different Indic projects.

Every Indic project has its own numeral system. Every Indic project wants this. We should respect multilingualism.

Capankajsmilyo added a comment.EditedApr 1 2018, 7:34 AM

This might be relevant: https://en.m.wikipedia.org/wiki/Module_talk:WikidataIB#Trying_to_use_on_sawiki

For example, if we run

{{#expr: trunc ({{#time: Y.md}}-{{#time: Y.md|1992-06-13}})}}

on https://sa.wikipedia.org/wiki/%E0%A4%A6%E0%A4%BF%E0%A4%B6%E0%A4%BE_%E0%A4%AA%E0%A4%9F%E0%A4%BE%E0%A4%A8%E0%A5%80

It returns "वाचनिकदोषः : अनपेक्षितम् उद्गारचिह्नम २" which translates to "Expression error: Unrecognized punctuation character २".

The same code, if run on https://en.wikipedia.org/wiki/Disha_Patani, returns 25.
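The failure above can be imitated in a few lines of Python. This is only a sketch under stated assumptions: localized_time mimics a #time call whose output uses Devanagari digits, strict_trunc_difference is an ASCII-only stand-in for #expr, and the "current" date is fixed at 2018-04-01 for illustration. None of these names exist in ParserFunctions.

```python
TO_DEVANAGARI = str.maketrans("0123456789", "०१२३४५६७८९")
TO_ASCII = str.maketrans("०१२३४५६७८९", "0123456789")


def localized_time(year: int, month: int, day: int) -> str:
    """Mimic {{#time: Y.md}} on a wiki that outputs Devanagari digits."""
    return f"{year:04d}.{month:02d}{day:02d}".translate(TO_DEVANAGARI)


def strict_trunc_difference(a: str, b: str) -> int:
    """ASCII-only stand-in for {{#expr: trunc (a - b)}}."""
    for operand in (a, b):
        for ch in operand:
            if ch not in "0123456789.":
                raise ValueError(
                    f"Expression error: Unrecognized punctuation character {ch}"
                )
    return int(float(a) - float(b))


if __name__ == "__main__":
    today = localized_time(2018, 4, 1)    # assumed date, for illustration only
    birth = localized_time(1992, 6, 13)
    try:
        strict_trunc_difference(today, birth)
    except ValueError as err:
        print(err)
    # Normalizing the digits first lets the same expression evaluate.
    print(strict_trunc_difference(today.translate(TO_ASCII), birth.translate(TO_ASCII)))
```

The sketch shows why the two functions conflict: one emits localized digits and the other accepts only ASCII digits, so chaining them fails unless the digits are normalized in between.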

Restricted Application added a subscriber: Liuxinyu970226. · View Herald TranscriptSat, May 18, 2:12 PM