
Sorted wikitables do not properly handle minus signs
Closed, Resolved (Public)

Description

Sorted wikitables of type currency do not recognize minus signs. In wikibits.js, ts_currencyToSortKey could be changed from

return ts_parseFloat(s.replace(/[^0-9.,]/g,''));

to

return ts_parseFloat(s.replace(/[^-0-9.,]/g,''));

and this would half-fix the problem, but not fully fix it: the new pattern recognizes the hyphen, -, but not the minus sign, − (U+2212, written &minus; in HTML). Columns of type numeric do not recognize minus signs either. An example of the latter bug can be viewed at:

http://en.wikipedia.org/w/index.php?title=Wikipedia:Arbitration_Committee_Elections_December_2009&diff=prev&oldid=332579916 (Broken sort using minus signs)
http://en.wikipedia.org/w/index.php?title=Wikipedia:Arbitration_Committee_Elections_December_2009&diff=next&oldid=332579916 (Working sort using hyphens)
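
For illustration, the half-fix for currency columns can be checked in isolation. This is only a sketch: stripCurrency is a hypothetical stand-in for the character filter inside ts_currencyToSortKey, and the real function goes on to pass the result to ts_parseFloat.

```javascript
// Hypothetical stand-in for the filter step of ts_currencyToSortKey.
function stripCurrency(s) {
    // A "-" placed directly after "^" in a negated class is a literal hyphen,
    // so hyphens, digits, dots and commas are kept and everything else is dropped.
    return s.replace(/[^-0-9.,]/g, '');
}

// ASCII hyphen-minus survives the filter.
var withHyphen = stripCurrency('-$1,234.50');

// U+2212 MINUS SIGN is not in the class, so it is stripped and the sign is lost.
var withMinus = stripCurrency('\u2212$1,234.50');

console.log(withHyphen); // "-1,234.50"
console.log(withMinus);  // "1,234.50"
```

The second result is why the changed pattern alone is only half a fix: a value typed with a proper minus sign still sorts as positive.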

Sorting on minus signs in columns of type numeric could be fixed by going to ts_parseFloat and changing

num = parseFloat(s.replace(/,/g, ""));

to

num = parseFloat(s.replace(/,/g, "").replace(/−/gi, "-").replace(/&(?:minus|#x0*2212|#0*8722);/gi, "-"));

which would convert HTML minus signs to hyphens before attempting to parse the number; but this would not handle minus signs in currency values, because they would be removed by ts_currencyToSortKey before ts_parseFloat is called.
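
A minimal sketch of the idea for numeric columns follows, with the minus sign normalized before the string reaches parseFloat. The names normalizeMinus and parseNumeric are hypothetical, and returning 0 for unparseable input is an assumption about ts_parseFloat rather than something taken from wikibits.js.

```javascript
// Hypothetical helper: map U+2212 and its HTML references to ASCII hyphen-minus.
function normalizeMinus(s) {
    return s
        .replace(/\u2212/g, '-')
        .replace(/&(?:minus|#x0*2212|#0*8722);/gi, '-');
}

// Hypothetical stand-in for ts_parseFloat.
// Assumption: unparseable input sorts as 0.
function parseNumeric(s) {
    // Normalize the sign first, then drop thousands separators, then parse.
    var num = parseFloat(normalizeMinus(s).replace(/,/g, ''));
    return isNaN(num) ? 0 : num;
}

console.log(parseNumeric('\u22121,234'));  // -1234
console.log(parseNumeric('&minus;5'));     // -5
console.log(parseNumeric('&#x2212;2.5'));  // -2.5
```

The replacements have to run on the string, inside the parseFloat call. Applied to the result of parseFloat they would be called on a number, which has no replace method.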

A more comprehensive solution to this is to substitute characters for entity references in ts_resortTable before the preprocessor is called (or maybe even before the preprocessor is chosen). To fix the bugs with minus signs it would suffice to convert minus sign references as above, but it may be desirable to convert all entity references.
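
One way to picture the more general approach is a decoding step that runs on each cell's text before any preprocessor sees it. The sketch below is an assumption-laden illustration, not code from wikibits.js: decodeEntities, fromCode and the tiny NAMED table are hypothetical, and a real implementation would need a full named-entity table or would let the browser decode the text.

```javascript
// Hypothetical, deliberately tiny table of named references.
var NAMED = { minus: '\u2212', amp: '&', nbsp: '\u00a0' };

// Convert a code point to a string, or keep the original text if out of range.
function fromCode(code, fallback) {
    return code <= 0x10FFFF ? String.fromCodePoint(code) : fallback;
}

// Replace hex, decimal and (known) named references with their characters.
function decodeEntities(s) {
    return s.replace(/&(?:#x([0-9a-f]+)|#([0-9]+)|([a-z]+));/gi,
        function (match, hex, dec, name) {
            if (hex) { return fromCode(parseInt(hex, 16), match); }
            if (dec) { return fromCode(parseInt(dec, 10), match); }
            // Unknown names are left untouched.
            return Object.prototype.hasOwnProperty.call(NAMED, name)
                ? NAMED[name] : match;
        });
}

// All three spellings end up as the single character U+2212.
console.log(decodeEntities('&minus;5') === '\u22125');   // true
console.log(decodeEntities('&#x2212;5') === '\u22125');  // true
console.log(decodeEntities('&#8722;5') === '\u22125');   // true
```

With this step in front, the preprocessors only have to recognize one extra character, U+2212, instead of every way of writing it.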


Version: unspecified
Severity: minor

Details

Reference
bz21946

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:47 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz21946.
bzimport added a subscriber: Unknown Object (MLST).

conrad.irwin wrote:

patch against 60371

  • allows U+2212 (MINUS SIGN) in place of - in numbers (but not dates).
  • allows a space between the minus sign and the number.
  • allows a minus sign in a currency (before or after the initial currency marker)
  • sorts non numerics in number columns as -Infinity instead of 0 (I assume that all at one end was the intention)
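
The behaviour listed above can be sketched roughly as follows. This is not the attached patch, which is not reproduced in this thread; sketchParseFloat and sketchCurrencyToSortKey are hypothetical names, and the exact patterns are assumptions chosen to match the four bullet points.

```javascript
// Hypothetical sketch of the numeric parser described in the list above.
function sketchParseFloat(s) {
    // Optional sign (ASCII +, ASCII -, or U+2212), optional space, then the number.
    var m = /^\s*([+\-\u2212]?)\s*([0-9.,]+)/.exec(s);
    if (!m) { return -Infinity; }            // non-numerics sort at one end
    var num = parseFloat(m[2].replace(/,/g, ''));
    if (isNaN(num)) { return -Infinity; }    // e.g. a lone "," or "."
    return (m[1] === '-' || m[1] === '\u2212') ? -num : num;
}

// Hypothetical sketch of the currency sort key.
function sketchCurrencyToSortKey(s) {
    // Keep sign characters, digits and separators. The currency marker is dropped
    // wherever it sits, so the sign may come before or after it.
    return sketchParseFloat(s.replace(/[^+\-\u22120-9.,]/g, ''));
}

console.log(sketchParseFloat('\u2212 42'));            // -42
console.log(sketchParseFloat('n/a'));                  // -Infinity
console.log(sketchCurrencyToSortKey('$\u22125.00'));   // -5
console.log(sketchCurrencyToSortKey('\u2212$5.00'));   // -5
```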


ayg wrote:

Looks good. Committed as r60376, thanks.

M8R-cyc3n3 wrote:

(In reply to comment #1)

Created an attachment (id=6901)
patch against 60371

  • allows U+2212 (MINUS SIGN) in place of - in numbers (but not dates).
  • allows a space between the minus sign and the number.
  • allows a minus sign in a currency (before or after the initial currency marker)
  • sorts non numerics in number columns as -Infinity instead of 0 (I assume that all at one end was the intention)

This patch uses [+-\u2212] which will match anything from U+002B PLUS SIGN to
U+2212 MINUS SIGN. Need to escape the hyphen as it is no longer adjacent to the
brackets. Best practice would be to include the backslash regardless. Escaping
the plus sign to avoid confusion would not hurt either, thus [\+\-\u2212].
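
The difference is easy to demonstrate. In the sketch below the two patterns are anchored with ^ and $ purely for the test; the variable names are illustrative and not taken from the patch.

```javascript
// Unescaped: "+-\u2212" is read as a range from U+002B to U+2212.
var unescaped = /^[+-\u2212]$/;

// Escaped: three literal characters, +, - and U+2212.
var escaped = /^[\+\-\u2212]$/;

// Both accept the three intended sign characters.
console.log(unescaped.test('-'), escaped.test('-'));            // true true
console.log(unescaped.test('\u2212'), escaped.test('\u2212'));  // true true

// Only the unescaped form also accepts letters and digits,
// because they fall inside the U+002B..U+2212 range.
console.log(unescaped.test('a'), escaped.test('a'));            // true false
console.log(unescaped.test('7'), escaped.test('7'));            // true false
```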

I tried adding U+2212 to a two-digit numeral in a table: doesn't work.

I must say I strongly support Ozob's filing of this bug; I do hope we can find a way to use the proper symbols in tables.

ayg wrote:

(In reply to comment #3)

This patch uses [+-\u2212] which will match anything from U+002B PLUS SIGN to
U+2212 MINUS SIGN. Need to escape the hyphen as it is no longer adjacent to the
brackets. Best practice would be to include the backslash regardless. Escaping
the plus sign to avoid confusion would not hurt either, thus [\+\-\u2212].

Good catch. Regex is fun. Fixed in r60430.

(In reply to comment #4)

I tried adding U+2212 to a two-digit numeral in a table: doesn't work.

The fix has been committed to trunk. It isn't live on Wikipedia yet, that will happen who knows when. Note that there's currently no reliable way of telling what revision Wikipedia is at without poking through SVN logs. It's currently at r57447, I think, and has been since early October.

What, three thousand "revisions" behind? To a tech-moron like me, it sounds strange. But I believe you. Let's hope they do a thousand in a stroke. Thanks.

happy.melon.wiki wrote:

Nope, that's standard practice, especially with the tech team being so short-staffed at this time. Scaps are usually several thousand revisions at a time.

ayg wrote:

They didn't use to be. A year or two ago we had scaps every week or so. Hopefully we'll return to those halcyon days in the imminent future, but until then we are where we are.

M8R-cyc3n3 wrote:

Note that there's currently no reliable way of telling what revision
Wikipedia is at without poking through SVN logs. It's currently at
r57447, I think, and has been since early October.

[[Special:Version]] says r59858. Is that not reliable?

happy.melon.wiki wrote:

Nope. :-D

It should be tagged as such, then. I'm all for common WPs knowing just a little of the big picture, the basics, of the techie side.