
Test numeric sorting on Swedish Wikipedia
Closed, Resolved · Public · 1 Story Points

Description

Before we deploy numeric sorting on English Wikipedia, we should test it on a real wiki with actual categories and defaultsort keys. Johan has offered to start a proposal on Swedish Wikipedia to turn on numeric sorting so that it can be more thoroughly tested.

The actual change will be switching svwiki's collation from uca-sv to uca-sv-u-kn.

Event Timeline

kaldari created this task. Aug 4 2016, 5:46 PM
Restricted Application added a project: User-Johan. Aug 4 2016, 5:46 PM
Restricted Application added a subscriber: Aklapper.
kaldari moved this task from Untriaged to CL, QA, Data analysis backlog on the Community-Tech board.
kaldari triaged this task as Normal priority.
Johan added a comment. Aug 4 2016, 11:06 PM

Will start a discussion on Monday.

Johan added a comment. Aug 8 2016, 9:25 PM

Will the number of bot-created articles be a problem? They're categorized as well, after all.

Johan added a comment. Aug 9 2016, 10:35 AM

Apparently it's not a problem, so I've asked the question.

Discussion on Swedish Wikipedia.

Johan added a comment. Edited Aug 9 2016, 11:22 AM

Question that came up: uca-sv-u-kn takes Swedish digit grouping and decimal comma into account, right? Have we specifically tested it?

(Swedish digit grouping differs from English. What English writes as 100,000.00 would usually be written 100 000,00 in Swedish, occasionally 100.000,00.)

Johan added a comment. Edited Aug 9 2016, 11:31 AM

Do we have a nifty solution for articles that have already solved this problem using DEFAULTSORT, e.g. {{STANDARDSORTERING:0011 Freunde}}? Maybe running a script to find articles where the defaultsort key starts with 0/00/000 but otherwise matches the article name? Or will it even affect things at all? The numerical value is still the same.

Johan added a comment. Aug 9 2016, 11:37 AM

Or where the zeros are added somewhere else, not necessarily at the beginning of the article name.
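
The check Johan describes could be sketched roughly as follows. This is only an illustration, not a script that was run for this task: the helper names `strip_zero_padding` and `is_redundant_padding` are hypothetical, and how the (title, sort key) pairs are pulled from the wiki is left out.

```python
import re


def strip_zero_padding(text):
    """Drop leading zeros from every run of ASCII digits, keeping a lone "0"."""
    return re.sub(r"[0-9]+", lambda m: m.group().lstrip("0") or "0", text)


def is_redundant_padding(title, sortkey):
    """True if the sort key differs from the title only by zero padding.

    The padding may sit anywhere in the name, not just at the start.
    """
    if sortkey == title:
        return False
    return strip_zero_padding(sortkey) == strip_zero_padding(title)


# Hypothetical sample data: (article title, DEFAULTSORT key) pairs.
pages = [
    ("11 Freunde", "0011 Freunde"),
    ("Apollo 11", "Apollo 0011"),
    ("11 Freunde", "Freunde, 11"),
]
for title, sortkey in pages:
    if is_redundant_padding(title, sortkey):
        print(f"{title!r}: sort key {sortkey!r} only adds zero padding")
```

If numeric collation compares digit runs by value, as Johan suspects, keys flagged this way would sort the same as the bare title, so they would be harmless rather than wrong.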

kaldari moved this task from Ready to In Development on the Community-Tech-Sprint board.

@Johan: Any time a number is separated with a non-numeric character, UCA collation will treat the separator as a string and the separated pieces as separate numbers. This is true in English, Swedish, and all UCA collations. Separators are considered ambiguous characters and thus not treated as parts of numbers. This can be worked around (in the case of large integers) by removing the separators in DEFAULTSORT keys. So for example, if you had an article entitled "9,999 bottles of beer", you could add a DEFAULTSORT key of "9999 bottles of beer".

@Johan: I'm not aware of any workaround for decimals though.
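
The separator behaviour described above can be approximated with a small sketch. It is not ICU's actual algorithm and ignores Swedish letter tailoring; `numeric_key` is a hypothetical helper that compares digit runs by value and treats everything else, separators included, as plain text.

```python
import re


def numeric_key(title):
    """Approximate a numeric-collation sort key.

    Digit runs become integers; all other text (including "," "." and
    spaces used as separators) stays as a string, so "9,999" is seen as
    the number 9, the string ",", and the number 999.
    """
    key = []
    # re.split with a capturing group puts digit runs at odd indices.
    for i, part in enumerate(re.split(r"([0-9]+)", title)):
        if not part:
            continue
        if i % 2 == 1:
            key.append((0, int(part), ""))        # numbers sort before text
        else:
            key.append((1, 0, part.casefold()))
    return tuple(key)


titles = ["1000 bottles of beer", "9,999 bottles of beer", "10 bottles of beer"]

# The separator splits 9,999 into 9 and 999, so it lands before 10.
print(sorted(titles, key=numeric_key))

# With the DEFAULTSORT workaround the key is "9999 bottles of beer".
sortkeys = {"9,999 bottles of beer": "9999 bottles of beer"}
print(sorted(titles, key=lambda t: numeric_key(sortkeys.get(t, t))))
```

The same splitting explains why decimals have no easy fix: under this model "1,5" compares as 1 then 5 and "1,25" as 1 then 25, so "1,5" sorts before "1,25" even though it is the larger value.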

@Johan: Let's try to wrap up that discussion next week if possible.

Change 304262 had a related patch set uploaded (by Kaldari):
Updating $tailoringFirstLetters for Swedish

https://gerrit.wikimedia.org/r/304262

Change 304262 abandoned by Kaldari:
Updating $tailoringFirstLetters for Swedish

Per https://ssl.icu-project.org/trac/browser/icu/trunk/source/data/coll/sv.txt

Reason:
Already in the list :P

https://gerrit.wikimedia.org/r/304262

Posted in the discussion that I'm reading consensus as "the change is fine as long as it doesn't sabotage anything that's working right now". Giving folks a chance to protest, but I don't see any reason anyone would if they haven't so far.

No new posts for a couple of days.

No further protests. I'd say we can carefully go ahead.

Change 306216 had a related patch set uploaded (by Kaldari):
Switching Swedish Wikipedia to uca-sv-u-kn collation

https://gerrit.wikimedia.org/r/306216

@Johan: Deployment is scheduled for 11 AM to noon Pacific time today.

Johan added a comment. Aug 23 2016, 2:50 PM

OK, I'll mention it on the Village Pump.

DannyH reassigned this task from Johan to kaldari. Aug 23 2016, 4:36 PM
DannyH added a subscriber: Johan.
kaldari set the point value for this task to 1. Aug 23 2016, 5:09 PM

Change 306216 merged by jenkins-bot:
Switching Swedish Wikipedia to uca-sv-u-kn collation

https://gerrit.wikimedia.org/r/306216

Mentioned in SAL [2016-08-23T18:12:13Z] <thcipriani@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:306216|Switching Swedish Wikipedia to uca-sv-u-kn collation (T142113)]] (duration: 00m 58s)

kaldari closed this task as Resolved. Aug 23 2016, 6:17 PM

Deployed to Swedish Wikipedia:
https://sv.wikipedia.org/wiki/Kategori:Musikgrupper_med_syskon

Seems to be working well so far. The updateCollation.php script is still running and will probably take a few hours to finish.

Johan added a comment. Aug 24 2016, 4:13 PM

No problems reported so far, except that it doesn't work well with separators. But we knew that.

Johan moved this task from Do soon to Archive on the User-Johan board. Jul 26 2017, 3:21 AM