Convert more wikis to numerical sorting
Closed, Resolved · Public · 3 Story Points

Description

Collecting requests from wikis asking for numerical sorting. We'll save up a batch, and then do them at the same time.

Before we start these -- let Danny & Johan know, so they can notify the wikis.

Bengali WP
Consensus discussion: https://bn.wikipedia.org/wiki/%E0%A6%89%E0%A6%87%E0%A6%95%E0%A6%BF%E0%A6%AA%E0%A6%BF%E0%A6%A1%E0%A6%BF%E0%A6%AF%E0%A6%BC%E0%A6%BE:%E0%A6%86%E0%A6%B2%E0%A7%8B%E0%A6%9A%E0%A6%A8%E0%A6%BE%E0%A6%B8%E0%A6%AD%E0%A6%BE#.E0.A6.AC.E0.A6.BF.E0.A6.B7.E0.A6.AF.E0.A6.BC.E0.A6.B6.E0.A7.8D.E0.A6.B0.E0.A7.87.E0.A6.A3.E0.A7.80.E0.A6.A4.E0.A7.87_.E0.A6.B8.E0.A6.82.E0.A6.96.E0.A7.8D.E0.A6.AF.E0.A6.BE.E0.A6.B0_.E0.A6.95.E0.A7.8D.E0.A6.B0.E0.A6.AE_.E0.A6.A0.E0.A6.BF.E0.A6.95_.E0.A6.B0.E0.A6.BE.E0.A6.96.E0.A6.BE.E0.A6.B0_.E0.A6.AC.E0.A7.8D.E0.A6.AF.E0.A6.AC.E0.A6.B8.E0.A7.8D.E0.A6.A5.E0.A6.BE
Talk page request: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enable_numerical_sorting_on_bn.40wikipedia_and_bn.40wikisource

Bengali Wikisource
Consensus discussion: https://bn.wikisource.org/wiki/%E0%A6%89%E0%A6%87%E0%A6%95%E0%A6%BF%E0%A6%B8%E0%A6%82%E0%A6%95%E0%A6%B2%E0%A6%A8:%E0%A6%B8%E0%A7%8D%E0%A6%95%E0%A7%8D%E0%A6%B0%E0%A6%BF%E0%A6%AA%E0%A7%8D%E0%A6%9F%E0%A6%B0%E0%A6%BF%E0%A6%AF%E0%A6%BC%E0%A6%BE%E0%A6%AE#.E0.A6.AC.E0.A6.BF.E0.A6.B7.E0.A6.AF.E0.A6.BC.E0.A6.B6.E0.A7.8D.E0.A6.B0.E0.A7.87.E0.A6.A3.E0.A7.80.E0.A6.A4.E0.A7.87_.E0.A6.B8.E0.A6.82.E0.A6.96.E0.A7.8D.E0.A6.AF.E0.A6.BE.E0.A6.B0_.E0.A6.95.E0.A7.8D.E0.A6.B0.E0.A6.AE_.E0.A6.A0.E0.A6.BF.E0.A6.95_.E0.A6.B0.E0.A6.BE.E0.A6.96.E0.A6.BE.E0.A6.B0_.E0.A6.AC.E0.A7.8D.E0.A6.AF.E0.A6.AC.E0.A6.B8.E0.A7.8D.E0.A6.A5.E0.A6.BE
Talk page request: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enable_numerical_sorting_on_bn.40wikipedia_and_bn.40wikisource

Czech WP
Consensus discussion: https://cs.wikipedia.org/wiki/Wikipedie:Pod_l%C3%ADpou#.C5.98azen.C3.AD_.C4.8Dl.C3.A1nk.C5.AF_v_kategori.C3.ADch_podle_.C4.8D.C3.ADsel
Talk page request: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Natural_number_sorting_on_cswiki

French WP
Consensus discussion: https://fr.wikipedia.org/wiki/Discussion_Projet:Cat%C3%A9gories#Mini-sondage_:_tri_automatique_des_nombres_dans_les_cat.C3.A9gories

Hebrew WP
Consensus discussion: https://he.wikipedia.org/wiki/%D7%95%D7%99%D7%A7%D7%99%D7%A4%D7%93%D7%99%D7%94:%D7%9E%D7%96%D7%A0%D7%95%D7%9F#.D7.A9.D7.99.D7.A0.D7.95.D7.99_.D7.A9.D7.99.D7.98.D7.AA_.D7.94.D7.9E.D7.99.D7.95.D7.9F_.D7.A7.D7.98.D7.92.D7.95.D7.A8.D7.99.D7.95.D7.AA_.D7.A2.D7.9D_.D7.9E.D7.A1.D7.A4.D7.A8.D7.99.D7.9D_.D7.91.D7.A9.D7.9E.D7.95.D7.AA
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enable_numerical_sorting_on_hewiki

Hungarian WP
Consensus discussion: https://hu.wikipedia.org/wiki/Wikip%C3%A9dia:Kocsmafal_(javaslatok)#Kateg.C3.B3ri.C3.A1k_numerikus_rendez.C3.A9se
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enable_numerical_sorting_in_hu.40wikipedia

Italian WP
Consensus discussion: https://it.wikipedia.org/w/index.php?title=Wikipedia:Bar/Discussioni/Ordine_alfabetico_di_default&diff=0&oldid=83416864
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Numerical_sorting

Norwegian (Bokmål) WP
Consensus discussion: https://no.wikipedia.org/wiki/Wikipedia:Tinget#Numerisk_sortering_i_kategorier
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enable_numeric_sorting_at_no.wikipedia

Polish WP
Consensus discussion: https://pl.wikipedia.org/wiki/Wikipedia:Kawiarenka/Og%C3%B3lne#Zmiana_konfiguracji_.E2.80.93_w.C5.82.C4.85czenie_poprawnego_sortowania_numerycznego_artyku.C5.82.C3.B3w_na_stronach_kategorii
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Numerical_sorting_on_pl.wp

Russian WP
Consensus discussion: https://ru.wikipedia.org/wiki/%D0%92%D0%B8%D0%BA%D0%B8%D0%BF%D0%B5%D0%B4%D0%B8%D1%8F:%D0%A4%D0%BE%D1%80%D1%83%D0%BC/%D0%9F%D1%80%D0%B5%D0%B4%D0%BB%D0%BE%D0%B6%D0%B5%D0%BD%D0%B8%D1%8F#.D0.A1.D0.BE.D1.80.D1.82.D0.B8.D1.80.D0.BE.D0.B2.D0.BA.D0.B0_.D1.87.D0.B8.D1.81.D0.B5.D0.BB_.D0.B2_.D0.BA.D0.B0.D1.82.D0.B5.D0.B3.D0.BE.D1.80.D0.B8.D1.8F.D1.85
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enable_numerical_sorting_in_ru.40wikipedia

Vietnamese WP
Consensus discussion: https://vi.wikipedia.org/wiki/Wikipedia:Th%E1%BA%A3o_lu%E1%BA%ADn/S%E1%BA%AFp_x%E1%BA%BFp_c%C3%A1c_th%E1%BB%83_lo%E1%BA%A1i_theo_gi%C3%A1_tr%E1%BB%8B_s%E1%BB%91_%C4%91%E1%BA%BFm_thay_v%C3%AC_theo_t%E1%BB%ABng_ch%E1%BB%AF_s%E1%BB%91_%C4%91%C6%A1n_thu%E1%BA%A7n
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enabling_numerical_sorting_on_vi.wikipedia

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 26 2016, 5:22 PM
Elitre added a subscriber: Elitre. · Sep 26 2016, 5:32 PM

You don't anticipate anything possibly going wrong in the transition, right?

During the time that the script is running, the sorting on some category pages gets weird -- newly-updated pages get the new sorting while older pages are still in the old sorting, and using "next page" doesn't work as you'd expect.

But those problems get corrected as the script reaches that category, and when the script is done, everything works the way that it's supposed to.

The script took about four hours to run when we converted Swedish Wikipedia. I'd expect it to be shorter than that for Italian, because there are fewer pages.

English WP took six days, but that's the biggest by far, and it turns out the process is exponential rather than arithmetic. :)

DannyH updated the task description. · Sep 26 2016, 7:40 PM
Beta16 added a subscriber: Beta16. · Sep 27 2016, 8:24 AM
DannyH updated the task description. · Sep 29 2016, 5:26 PM
Samat added a subscriber: Samat. · Oct 1 2016, 7:56 AM
jhsoby updated the task description. · Oct 2 2016, 11:04 PM
jhsoby added a subscriber: jhsoby. · Oct 3 2016, 6:29 AM
DannyH updated the task description. · Oct 3 2016, 8:00 PM
DannyH moved this task from Estimated to To be estimated/discussed on the Community-Tech board.
DannyH moved this task from To be estimated/discussed to Archive on the Community-Tech board.
jeblad added a subscriber: jeblad. · Oct 4 2016, 4:59 AM

Could a proper page be written describing how this subsystem works and what the expected impact will be? It seems like the implemented system is somewhat different from the announced system.

DannyH updated the task description. · Oct 4 2016, 10:17 PM
DannyH set the point value for this task to 3.
DannyH moved this task from To be estimated/discussed to Estimated on the Community-Tech board.
DannyH updated the task description. · Oct 6 2016, 9:10 PM
Arbnos added a subscriber: Arbnos. · Oct 8 2016, 12:02 PM
kaldari added a subscriber: kaldari. · Edited · Oct 13 2016, 11:57 PM

Could a proper page be written describing how this subsystem works and what the expected impact will be?

@jeblad: Documentation can be found at https://www.mediawiki.org/wiki/Manual:$wgCategoryCollation. In most of these cases (except Hebrew and Norwegian), the wiki is switching from uca-<langcode> to uca-<langcode>-u-kn, in which case the only difference is the addition of the numeric sorting feature. For Hebrew and Norwegian, they are starting from uppercase collation. Norwegian can either upgrade to uca-no-u-kn (which is Unicode Collation Algorithm tailored for Norwegian + numeric sorting) or numeric (which is identical to what they have now, but with numeric sorting). Hebrew isn't supported by our IcuCollation class, so they can only upgrade to numeric (unless someone modifies IcuCollation to support Hebrew in the very near future).
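The naming rule described above can be sketched in a few lines. This is only an illustration of the pattern in the comment, not MediaWiki code: the function name numeric_collation_for is hypothetical, and it always maps uppercase to numeric even though a wiki may instead choose a tailored UCA collation.

```python
def numeric_collation_for(current):
    """Return a numeric-sorting collation name for a current setting.

    Hypothetical helper; it only mirrors the naming pattern described
    in the comment above and is not part of MediaWiki.
    """
    if current.startswith("uca-"):
        # "uca-<langcode>" gains the "-u-kn" suffix; names that already
        # carry the suffix are returned unchanged.
        return current if current.endswith("-u-kn") else current + "-u-kn"
    if current == "uppercase":
        # "numeric" behaves like "uppercase" plus numeric sorting.
        return "numeric"
    raise ValueError("no numeric variant known for %r" % current)


print(numeric_collation_for("uca-cs"))     # uca-cs-u-kn
print(numeric_collation_for("uppercase"))  # numeric
```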

jeblad added a comment. · Edited · Oct 15 2016, 7:47 PM

How will these three blocks be sorted?

  • foo 123 456 bar
  • foo 456 123 bar
  • foo 789 bar
  • foo 789

  • foo 123.456 bar
  • foo 456.123 bar
  • foo 789 bar
  • foo 789

  • foo 123,456 bar
  • foo 456,123 bar
  • foo 789 bar
  • foo 789

@jeblad: Numeric sorting only works for unbroken sequences of digits. Digits separated by commas, periods, or spaces are treated as separate numbers (and thus may still require DEFAULTSORT keys).
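A rough way to picture the "unbroken sequences of digits" rule is a sort key that splits a title into text and digit runs and compares each digit run by value. The sketch below is an approximation in Python, not the ICU implementation: numeric_sort_key is a made-up name, and the text between numbers is compared with plain string comparison rather than a real collation.

```python
import re


def numeric_sort_key(title):
    # re.split with a capture group yields text, digits, text, digits, ...
    # so odd positions always hold an unbroken run of digits.
    parts = re.split(r"(\d+)", title)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


titles = [
    "foo 789 bar",
    "foo 456,123 bar",
    "foo 123456 bar",
    "foo 123,456 bar",
]

# "123,456" is read as 123 followed by 456, so it sorts before 789,
# while the unbroken "123456" sorts after it.
for t in sorted(titles, key=numeric_sort_key):
    print(t)
```

Under these assumptions the comma-separated titles sort by their first digit run only, which is why a DEFAULTSORT key may still be needed for them.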

DannyH updated the task description. · Oct 17 2016, 5:15 PM
DannyH updated the task description. · Oct 17 2016, 9:21 PM
kaldari claimed this task. · Oct 17 2016, 10:09 PM
kaldari moved this task from Ready to In Development on the Community-Tech-Sprint board.

Change 316486 had a related patch set uploaded (by Kaldari):
Switching 10 wikis to numeric category collation per T146675

https://gerrit.wikimedia.org/r/316486

Change 316486 merged by jenkins-bot:
Switching 10 wikis to numeric category collation per T146675

https://gerrit.wikimedia.org/r/316486

Mentioned in SAL (#wikimedia-operations) [2016-10-19T23:20:11Z] <dereckson@mira> Synchronized wmf-config/InitialiseSettings.php: Switching 10 more wikis to numeric category collation (T146675) (duration: 00m 59s)

kaldari added a comment. · Edited · Oct 20 2016, 4:23 AM

Well, there is a problem. 99<019<101

Where are you seeing this?

Thanks @IKhitron! I've filed a bug for that: T148774. I think I know exactly how to fix this, so it shouldn't take long.

Btw, how do you sort

  • abc 20 5
  • abc 5 20
  • abc 5 80

? Can I assume that it will be sorted by first number and then by second?

@IKhitron: Yes. I'm not 100% sure there are no bugs with complicated sequences of numbers, but I just tested it locally and got the following sort order:

  • Abc 5 3
  • Abc 5 20
  • Abc 5 80
  • Abc 20 5
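The order above matches a comparison by first number and then by second. The Python sketch below reproduces it and contrasts it with plain character-by-character sorting. It approximates the behaviour only and is not the collation code that runs on the wikis; numeric_sort_key is a hypothetical helper.

```python
import re


def numeric_sort_key(title):
    # Odd positions of the split result are digit runs; compare them by value.
    parts = re.split(r"(\d+)", title)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


titles = ["Abc 20 5", "Abc 5 20", "Abc 5 80", "Abc 5 3"]

# Character-by-character order: "20" sorts before "5" because "2" < "5".
plain = sorted(titles)

# Value order: first number, then second number.
numeric = sorted(titles, key=numeric_sort_key)

print(plain)    # ['Abc 20 5', 'Abc 5 20', 'Abc 5 3', 'Abc 5 80']
print(numeric)  # ['Abc 5 3', 'Abc 5 20', 'Abc 5 80', 'Abc 20 5']
```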

Good, thanks.

@IKhitron: The patch to fix leading zeros has been merged. It should get deployed to he.wiki next Thursday. Then we can rebuild the sortkeys on Thursday evening or Friday.

Thanks, and it will be deployed next Wednesday, @kaldari.

All the languages besides Norwegian are finished.

Wesalius removed a subscriber: Wesalius. · Oct 24 2016, 5:07 PM

Change 317652 had a related patch set uploaded (by Kaldari):
Switch Norwegian Wikipedia to uca-no-u-kn category collation

https://gerrit.wikimedia.org/r/317652

Change 317652 merged by jenkins-bot:
Switch Norwegian Wikipedia to uca-no-u-kn category collation

https://gerrit.wikimedia.org/r/317652

kaldari closed this task as Resolved. · Oct 25 2016, 7:33 AM

Norwegian is finished.

For Hebrew and Norwegian, they are starting from uppercase collation. Norwegian can either upgrade to uca-no-u-kn (which is Unicode Collation Algorithm tailored for Norwegian + numeric sorting) or numeric (which is identical to what they have now, but with numeric sorting). Hebrew isn't supported by our IcuCollation class, so they can only upgrade to numeric (unless someone modifies IcuCollation to support Hebrew in the very near future)

I added he to IcuCollation in https://gerrit.wikimedia.org/r/318674

@kaldari, or maybe @DannyH? When can we expect the rerun? Thank you.

@IKhitron: Rerunning now. Should be done in a few hours. Sorry for the delay.

Thank you very much!

DannyH moved this task from Estimated to Archive on the Community-Tech board. · Nov 8 2016, 11:28 PM