
Use sort collation config in JavaScript (jquery.tablesorter)
Open, NormalPublic

Description

Split off from T2164, this is a tracking bug for areas that need improved client-side sorting.

Details

Reference
bz30674

Event Timeline

bzimport raised the priority of this task from to Normal.Nov 21 2014, 11:51 PM
bzimport set Reference to bz30674.
bzimport added a subscriber: Unknown Object (MLST).
  • Bug 8732 has been marked as a duplicate of this bug.
TheDJ added a comment.Jul 6 2013, 3:42 PM

Collation can already be adapted using: mw.config.set('tableSorterCollation',{'Ä':'A','Ö':'O','Ü':'U','ä':'a','ö':'o','ü':'u','ß':'ss'});
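To illustrate how such a character map can affect sorting, here is a minimal sketch. It is not the actual jquery.tablesorter code: the helper name `applyCollation` and the sample words are assumptions made up for this example. Each string is normalised through the map before the plain comparison is applied.

```javascript
// Hypothetical helper, not the real tablesorter implementation.
// Replaces every character that has an entry in the map; other characters pass through.
function applyCollation( text, collation ) {
  return Array.from( text, function ( ch ) {
    return Object.prototype.hasOwnProperty.call( collation, ch ) ? collation[ ch ] : ch;
  } ).join( '' );
}

// Same map as in the mw.config.set() call above.
var collation = { 'Ä': 'A', 'Ö': 'O', 'Ü': 'U', 'ä': 'a', 'ö': 'o', 'ü': 'u', 'ß': 'ss' };

// Sample data (made up for illustration).
var words = [ 'Zebra', 'Öl', 'Apfel', 'Ärger' ];

// Compare the normalised forms with the naive comparison.
words.sort( function ( a, b ) {
  var ka = applyCollation( a, collation );
  var kb = applyCollation( b, collation );
  return ka < kb ? -1 : ( ka > kb ? 1 : 0 );
} );
```

With the map applied, the umlauted words sort next to their base letters; without it, a plain comparison puts 'Ärger' and 'Öl' after 'Zebra'.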

The question is how to do this automatically. Create a JS version of the Collation class?

The easiest approach would probably be to do it in the parser: read through the table, generate binary sortkeys, turn them into some non-binary form, and put them in a data attribute.

The Collation class (or rather the third-party ICU library it uses) is rather complex. I'm doubtful we could sanely re-create it in JavaScript. For example, it needs to sort on three different levels, be able to dynamically insert new "in-between" values, etc. We also don't know which rules are being used at runtime (the PHP bindings don't expose that, and it changes with the version).

OTOH, I suppose it doesn't need to be exactly the same. Fixing just the really bad mismatches in sorting behaviour might be good enough.

mxn added a subscriber: mxn.Nov 24 2014, 8:54 PM
Kaganer added a comment.EditedJun 23 2017, 3:15 PM

This is a real problem for various languages and writing systems. In my opinion, we need to use CLDR instead of custom tables and other client-side hacks.

The code in https://phabricator.wikimedia.org/source/mediawiki/browse/master/resources/src/jquery/jquery.tablesorter.js currently uses functions that compare UTF code values directly:

function sortText( a, b ) {
  return ( ( a < b ) ? -1 : ( ( a > b ) ? 1 : 0 ) );
}

function sortTextDesc( a, b ) {
  return ( ( b < a ) ? -1 : ( ( b > a ) ? 1 : 0 ) );
}

In my opinion, these (or the code around them) should be replaced with calls that use CLDR data to build a sorting index.

This has been successfully resolved for categories (see T162823), and the same sort order should be used in table sorting.

PS: Maybe some suitable methods exist in https://github.com/rxaviers/cldrjs or https://github.com/globalizejs/globalize

The data required for collation is comparatively huge, somewhere on the order of megabytes I think. Even if there was a ready-to-use JavaScript library that implements the Unicode Collation Algorithm (which to my knowledge there isn't, but I'd love to be proved wrong), we couldn't reasonably ship it to the browser.

The alternative solution would be to precompute the sortkeys in PHP code, like @Bawolff already suggested above (and then we can compare the sortkeys using the naive method and get correct results). This would approximately double the amount of data sent to the user, which is not great but probably better than shipping the collation data in most cases. But the PHP parser doesn't currently know whether it's dealing with a sortable table or a regular boring one (and I'm not sure if it even knows the contents of a table cell while rendering its attributes, which would be required here). Overcoming these problems is surely possible, but it would be a non-trivial undertaking.
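The client-side half of the precomputed-sortkey idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the `sortKey` field, the hex encoding, and the key values themselves are made up, not output of the real Collation class. It only shows that once the server has provided keys, the naive comparison is enough.

```javascript
// Hypothetical rows: in a real page the keys might come from a data attribute
// written by the server. The hex values below are invented placeholders.
var rows = [
  { text: 'Öl', sortKey: '4528' },
  { text: 'Zebra', sortKey: '5a10' },
  { text: 'Apfel', sortKey: '2a04' },
  { text: 'Ärger', sortKey: '2a31' }
];

// Lowercase hex keeps the byte order of the binary key, so a plain
// string comparison of the encoded keys orders rows like the binary keys would.
function compareBySortKey( a, b ) {
  return a.sortKey < b.sortKey ? -1 : ( a.sortKey > b.sortKey ? 1 : 0 );
}

rows.sort( compareBySortKey );

var sortedTexts = rows.map( function ( row ) {
  return row.text;
} );
```

The cost noted above still applies: every cell carries its key in addition to its visible text.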

I think the current mw.config.set( 'tableSorterCollation', ... ) workaround is sufficient for most cases. For example, we've been using it on Polish Wikipedia for years (search in https://pl.wikipedia.org/wiki/MediaWiki:Common.js), and it's also feasible for more complicated cases like Serbian (https://sr.wikipedia.org/wiki/Медијавики:Common.js).

Kaganer added a comment.EditedJun 26 2017, 11:14 AM

Heh, this may be a good idea for monolingual projects, but not for multilingual sites such as Meta-Wiki, Wikimedia Commons or Wikidata (or for small chapter wikis such as [[wmru:]]).
There should be another solution.

Besides this, the functions 'sortText' and 'sortTextDesc' are written incorrectly and should be changed in any case. Comparison based on UTF code values is incorrect for this purpose.

TheDJ added a comment.EditedJun 26 2017, 2:52 PM

We can override the comparison with https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/localeCompare on platforms where it is available.

It should be noted however that this function can be significantly slower.
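A minimal sketch of that idea, assuming a hypothetical factory `makeTextComparator` and a hard-coded 'de' locale (neither is taken from the tablesorter code): it feature-detects `Intl.Collator`, reuses one collator instance for the whole sort to limit the per-call cost, and falls back to the plain code-value comparison otherwise.

```javascript
// Hypothetical factory; the name and the locale argument are assumptions.
function makeTextComparator( locale ) {
  if ( typeof Intl !== 'undefined' && typeof Intl.Collator === 'function' ) {
    // One collator is created up front and reused for every comparison.
    var collator = new Intl.Collator( locale );
    return collator.compare;
  }
  // Fallback: the same naive comparison as sortText.
  return function ( a, b ) {
    return a < b ? -1 : ( a > b ? 1 : 0 );
  };
}

var compare = makeTextComparator( 'de' );
var sorted = [ 'Zebra', 'Öl', 'Apfel', 'Ärger' ].sort( compare );
```

How closely the result follows a given language's rules depends on the collation data the browser ships.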

Amire80 moved this task from Untriaged to Collation on the I18n board.Mar 18 2018, 1:44 PM
Dvorapa added a subscriber: Dvorapa.Jan 7 2019, 2:38 PM

Not sure what needs to be done here, but... We already have Unicode collation set for categories (uca-cs for Czech Wikipedia). This is already configured for the majority of Wikipedias. Why don't we use the same here?

TheDJ added a comment.Apr 12 2019, 9:25 AM

@Dvorapa that relies on MySQL logic for the actual implementation of that setting. We cannot make use of that in JavaScript (well, localeCompare does in theory, but most browsers haven't really implemented it and just fall back to a standard collation for most languages).

Change 517266 had a related patch set uploaded (by TheDJ; owner: TheDJ):
[mediawiki/core@master] Tablesorter: Use localeCompare

https://gerrit.wikimedia.org/r/517266

Change 517266 merged by jenkins-bot:
[mediawiki/core@master] Tablesorter: Use localeCompare

https://gerrit.wikimedia.org/r/517266