
New tool to delimit scientific notation
Open, Low, Public

Description

Author: Greg_L_at_Wikipedia

Description:
This is a continuation of Bugzilla 13025 and should be considered its replacement.

A template [[Template:Val]] has to use math-based methods to parse the string in order to count digits and place gaps between the digits. This technique is prone to rounding errors. Even though the template has an error-checking ability, it still generates improper strings. For instance, {{val|6.02214184||e=23}} will generate 6.022 141 839 × 10^23 (note the “39” instead of the “4”).
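As a rough illustration of why character counting avoids the problem, here is a minimal Python sketch (not the template's actual code; both function names are hypothetical). It shows that the digits after the decimal marker can be read straight from the input string, and that the same value is not exactly representable as a binary floating-point number, which is the kind of error math-based parsing has to contend with:

```python
from decimal import Decimal


def fraction_digits_by_string(value: str) -> str:
    """Return the digits after the decimal marker using only string handling."""
    _, _, frac = value.partition(".")
    return frac


def float_is_exact(value: str) -> bool:
    """True if the binary float parsed from `value` equals the decimal exactly."""
    return Decimal(float(value)) == Decimal(value)


# String handling sees exactly the digits the editor typed.
digits = fraction_digits_by_string("6.02214184")
print(digits, len(digits))

# The float nearest to 6.02214184 is not the same number, so any
# digit-by-digit arithmetic on it starts from a slightly wrong value.
print(float_is_exact("6.02214184"))
```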

Note how this tool is used (with work-arounds) here on [[Kilogram]]

http://en.wikipedia.org/wiki/Kilogram#Proposed_future_definitions

What is desperately needed are new parser functions that delimit numeric strings by simply counting characters, rather than math-based parser functions. This will allow the creation of a new magic word by the name of {{delimitnum}}. Note that there is already a template by the same “delimitnum” name. However, it is even more prone to rounding errors than {{val}} because it has no error checking. {{Delimitnum}} should be replaced by a parser function as proposed herein.

Delimitnum’s functionality is largely described here:

http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style_(dates_and_numbers)/Archive_94#Grouping_of_digits_after_the_decimal_point_.28next_attempt.29

It was extensively discussed and voted upon here…

http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style_(dates_and_numbers)/Archive_94#Continuing_Discussion.2C_specifically_regarding_latest_nutshell_proposal

on WT:MOSNUM and was well received here on WT:MOS:

http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style/Archive_97#Exponential_notation

…where its functionality was tweaked.

Here is a nutshell of how the new, replacement delimitnum magic word should work:

The magic word would parse as follows:

{{delimitnum: (value) | (uncertainty) | (base–ten exponent) | (unit symbol) }}

It would use span-based tags (e.g. <span style="margin-left:0.25em">) to space out characters without actually generating a separate character (so values can be copied and pasted into programs like Excel, where they would be treated as true numbers). The template would replace hand-coded strings such as this:

6.022<span style="margin-left:0.25em">461<span style="margin-left:0.2em">79</span></span>(30)&thinsp;×&thinsp;10<sup>23</sup>&nbsp;kg
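To make the four parameters concrete, here is a minimal Python sketch of how the pieces might be joined into the final string. The function name `assemble` and its argument names are hypothetical, and the digit spacing of the value itself is deliberately left out; only the parentheses, thin spaces, superscript and non-breaking space seen in the hand-coded string are reproduced:

```python
def assemble(value: str, uncertainty: str = "", exponent: str = "", unit: str = "") -> str:
    """Join value, uncertainty, base-ten exponent and unit symbol into wiki markup."""
    out = value
    if uncertainty:
        # Uncertainty goes in parentheses directly after the value.
        out += f"({uncertainty})"
    if exponent:
        # Thin spaces on each side of the multiplication sign.
        out += f"&thinsp;×&thinsp;10<sup>{exponent}</sup>"
    if unit:
        # A non-breaking space keeps the unit symbol with the number.
        out += f"&nbsp;{unit}"
    return out


print(assemble("6.02246179", "30", "23", "kg"))
```

Empty parameters are simply skipped, which is an assumption about how omitted arguments would behave.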

The parsing and spacing logic would be as follows:

Q1: Are there five or more undelimited digits remaining after the decimal marker? No=Stop / Yes=Advance three digits and prepare to add span gap. Goto Q2.
Q2: Is the span gap to be added following the digit “1”? No=Add a span gap of 0.25 em and then goto Q1 / Yes=Add a span gap of 0.2 em and then goto Q1.

The exact em widths chosen above produce the best-looking results on the widest range of computing platforms. Some browsers resolve widths to 0.05 em; others do not, and of those, some round up and some do not. These characteristics are exploited to our advantage. Note also that the span gap following the digit 1 is narrower (0.2 em) than the one used after the other digits (0.25 em).
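The Q1/Q2 logic can be sketched in a few lines of Python that work purely on characters. This is only an illustration under my own naming (`split_groups`, `gap_after` and `delimit_fraction` are hypothetical), and it assumes the spans are nested the same way as in the hand-coded string above:

```python
def split_groups(frac: str) -> list:
    """Q1: while five or more undelimited digits remain, split off three."""
    groups = []
    rest = frac
    while len(rest) >= 5:
        groups.append(rest[:3])
        rest = rest[3:]
    groups.append(rest)  # the final two, three or four digits stay together
    return groups


def gap_after(group: str) -> str:
    """Q2: use a narrower gap when the group ends in the digit 1."""
    return "0.2em" if group.endswith("1") else "0.25em"


def delimit_fraction(frac: str) -> str:
    """Wrap the digits after the decimal marker in nested margin-left spans."""
    groups = split_groups(frac)
    html = groups[-1]
    # Build from the innermost span outward so that the spans nest.
    for group in reversed(groups[:-1]):
        html = f'{group}<span style="margin-left:{gap_after(group)}">{html}</span>'
    return html


print("6." + delimit_fraction("02246179"))
```

For the digits 02246179 this yields the groups 022, 461 and 79, with a 0.25 em gap after 022 and a 0.2 em gap after 461.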

As mentioned above, details can be found here:

http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style_(dates_and_numbers)/Archive_94#Grouping_of_digits_after_the_decimal_point_.28next_attempt.29

Note, however, that since the above thread was archived, the spaces on each side of the “×” (multiply) sign have changed to thin spaces (&thinsp;) per the above-cited discussions on WT:MOS.


Version: unspecified
Severity: enhancement

Details

Reference
bz15677

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:19 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz15677.
bzimport added a subscriber: Unknown Object (MLST).

Greg_L_at_Wikipedia wrote:

Example code that would be generated

This is a txt file for placement into any Wikipedia test page, showing example output code that would be generated.

attachment Example code.txt ignored as obsolete

Greg_L_at_Wikipedia wrote:

RTF file showing example code

This attached rtf file shows the Wiki-code that would be generated using various input options. Paste into any Wikipedia page.


Greg_L_at_Wikipedia wrote:

Also, it would be very nice if delimitnum did what {{val}} currently does with negative exponents: if an editor inputs a hand-typed hyphen-minus (character 45) from the keyboard (-), the rendered version should substitute Wikipedia’s minus sign (−) from the “insert” menu.
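A minimal Python sketch of that substitution, assuming the exponent arrives as a plain string and that the minus sign in question is U+2212 (the function name is hypothetical):

```python
HYPHEN_MINUS = "-"      # character 45, as typed on the keyboard
MINUS_SIGN = "\u2212"   # the typographic minus sign


def typographic_exponent(exponent: str) -> str:
    """Replace a leading keyboard hyphen-minus with a true minus sign."""
    if exponent.startswith(HYPHEN_MINUS):
        return MINUS_SIGN + exponent[1:]
    return exponent


print(typographic_exponent("-23"))
```

Only a leading hyphen is touched, so positive exponents and exponents already carrying a true minus sign pass through unchanged.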

Greg_L_at_Wikipedia wrote:

Per conversation with Werdna…

http://en.wikipedia.org/w/index.php?title=User_talk:Werdna&oldid=250362962

a separate parser function to count characters cannot be expected. Accordingly, the only practical solution is an entire magic word. The most straightforward solution would be to rewrite {{val}} so it retains its current functionality but no longer relies upon math to parse the numbers. The best solution would be to also do the same for {{delimitnum}}.

I think this is a lot more complicated than it sounds on first read. It would need to take into consideration different styles of number formatting by language - http://en.wikipedia.org/wiki/Decimal_separator#Arabic_numeral_system - The way I see it, the main options are:

Option 1:

  • Create several different formatting options and maintain a list of which style goes with which languages, possibly with a switch option in the function to use a non-default style
    • Advantages: Easier on users, easier to code than option 2
    • Disadvantages: Requires more maintenance and more initial work in determining which style goes with which languages

Option 2:

  • Create a fancy syntax for the function to tell the parser function how to format it, similar to the {{#time:}} function syntax, though probably much more complicated
    • Advantages: Doesn't require localization, more flexible
    • Disadvantages: Harder for users to use, may have issues similar to StringFunctions (bug:6455), harder to code than option 1

Option 3:

  • Create a separate extension to add the function (as opposed to putting it in the ParserFunctions extension or core) that would only format numbers in the English style (at least initially)
    • Advantages: Could eventually become like option 1 without needing to be fully localized immediately, easier to code
    • Disadvantages: Somewhat unfair to other languages until support for them is added, slightly more overhead code

IMO, option 3 would be the best. Option 2 would likely be far too complicated to be useful outside of complex templates. And since option 3 requires the least work, it would be the most likely to actually get done.
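For comparison, here is a rough Python sketch of what the bookkeeping for option 1 might look like: a small table of named styles, a table mapping languages to a default style, and an optional override. The style names, the table contents and the function are all illustrative assumptions, and only the decimal marker is handled:

```python
from typing import NamedTuple, Optional


class NumberStyle(NamedTuple):
    decimal_marker: str
    group_separator: str


# Hypothetical style table; a real one would need many more entries.
STYLES = {
    "english": NumberStyle(".", ","),
    "continental": NumberStyle(",", "."),
}

# Hypothetical language-to-style defaults that would have to be maintained.
DEFAULT_STYLE_FOR_LANGUAGE = {"en": "english", "de": "continental"}


def localize_decimal(value: str, lang: str, style: Optional[str] = None) -> str:
    """Swap the decimal marker of an English-style number for the chosen style."""
    name = style or DEFAULT_STYLE_FOR_LANGUAGE.get(lang, "english")
    whole, marker, frac = value.partition(".")
    return whole + (STYLES[name].decimal_marker if marker else "") + frac


print(localize_decimal("6.022", "de"))
print(localize_decimal("6.022", "de", style="english"))
```

Option 3 would amount to shipping only the "english" entry at first and growing the two tables later.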

Wikini added a subscriber: Wikini. Jun 25 2015, 9:35 AM
Meno25 removed a subscriber: Meno25. Feb 22 2016, 6:07 PM
Restricted Application added a subscriber: Aklapper. Feb 22 2016, 6:07 PM