
Incorrect sorting in categories on Russian-language projects
Closed, Resolved · Public

Description

In certain categories, one article moves to the top of the block for its letter
(for example, Baa Bbc Bcb -> Bbc Baa Bcb),
and that letter's block moves ahead of the blocks for all other letters
([A] Aaa Abc Acb [B] Baa Bbc Bcb [C] ... -> [B] Bbc Baa Bcb [A] Aaa Abc Acb [C] ...).
This wrong sequence is still displayed in the categories.

example category 1:
https://ru.wikipedia.org/wiki/%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D0%B8%D1%8F:%D0%A3%D0%BC%D0%B5%D1%80%D1%88%D0%B8%D0%B5_11_%D1%8F%D0%BD%D0%B2%D0%B0%D1%80%D1%8F

57369451b7b321e293603601dfec2ee6.png (606×1 px, 150 KB)

example category 2:
https://ru.wikipedia.org/wiki/%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D0%B8%D1%8F:%D0%A3%D0%BC%D0%B5%D1%80%D1%88%D0%B8%D0%B5_%D0%B2_2015_%D0%B3%D0%BE%D0%B4%D1%83
a5f3b5a08e187ca3682f1d67cb6e52e8.png (660×1 px, 152 KB)

correct order of letters in the alphabet:
https://en.wikipedia.org/wiki/Russian_alphabet

I tried ?action=purge and a null edit on the articles and categories, without result.
I tried removing the template that adds the category from the article and writing the category directly in the wikitext instead, without result.
Then I tried ?action=purge, a minor edit and a null edit on the articles and categories, without result.
I restored the template, without result.

If I delete the category from the wikitext, the article disappears from the category.
If I then write the category in the wikitext again, the article appears in its correct place in the category.

Initially the category gets added as follows: a date is added to Wikidata, and the article template generates the category through a module (no sort key is provided there, just [[Category:Died in +Wikidata+ year]]); the category then shows up in the article. Maybe the problem arises somewhere along this path.
(The article that I changed and that was displayed incorrectly, one of those placed at the beginning of the category:
https://ru.wikipedia.org/w/index.php?title=%D0%91%D1%83%D0%B7%D0%B0%D0%BD%D1%81%D0%BA%D0%B8,_%D0%95%D0%BD%D1%91
https://www.wikidata.org/w/index.php?title=Q691209
date was added here: https://www.wikidata.org/w/index.php?title=Q691209&diff=187408254&oldid=184639256
)

https://ru.wikipedia.org/wiki/%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D0%B8%D1%8F:%D0%A0%D0%BE%D0%B4%D0%B8%D0%B2%D1%88%D0%B8%D0%B5%D1%81%D1%8F_%D0%B2_%D0%A0%D1%8F%D0%B7%D0%B0%D0%BD%D1%81%D0%BA%D0%BE%D0%B9_%D0%BE%D0%B1%D0%BB%D0%B0%D1%81%D1%82%D0%B8
This category shows two "М" groups.

ru.wikipedia.org_screen_capture_2015-02-06_16-22-44.png (2×1 px, 220 KB)

Event Timeline

Sunpriat raised the priority of this task from to Needs Triage.
Sunpriat updated the task description. (Show Details)
Sunpriat added a subscriber: Sunpriat.
matmarex set Security to None.
matmarex added a subscriber: matmarex.
Aklapper added a project: I18n.

Something else reminded me of this issue today and I decided to investigate. I'm sorry this went without a reply for so long. I looked at https://ru.wiktionary.org/wiki/Категория:Цитаты/Соломатина_Т._Ю. (linked by DonRumata above), since it's a nice small category, and because no articles in it have custom defaultsort.

It looks like this at the moment:

pasted_file (1×1 px, 211 KB)

I don't speak Russian, but compared to the expected order https://en.wikipedia.org/wiki/Russian_alphabet, some headings are in the wrong order (for example, Д is near the end, it should be after Г) and some articles under each heading are in the wrong order too (for example, банан should be the first article under Б).

Looking at the data in the 'categorylinks' table for this category:

select page2.page_title, cl_from, cl_to, HEX(cl_sortkey), cl_sortkey_prefix, cl_timestamp, cl_collation, cl_type
from categorylinks, page, page AS page2
where cl_to=page.page_title and page.page_id=974359 and cl_from=page2.page_id
order by cl_sortkey

(Full results: P2732)

| page_title | cl_from | HEX(cl_sortkey) | cl_timestamp |
| --- | --- | --- | --- |
| антенатальный | 231880 | 260A94C034940AC00A7CFF2733032694FF272F0326660111011100 | 2015-01-21 03:49:45 |
| бэби-бум | 835177 | 2616FF273B0326165A0306462616CE8E010C010C00 | 2015-01-23 23:48:48 |
| в смысле | 907664 | 2618030426BA8EFF272F0326BA7C34010C010C00 | 2015-01-24 04:23:41 |
| вноситься | 889682 | 261894A4BA5AC0FF27330326BAFF2745010D010D00 | 2015-01-24 04:15:34 |
| вошь | 10722 | 2618A4FF271F330108010800 | 2015-01-20 14:04:47 |
| гадь | 288382 | 261A0A24FF27330108010800 | 2015-01-21 10:13:49 |
| Гоголь | 802190 | 261AA41AA47CFF2733010A018F0900 | 2015-01-23 23:24:32 |
| карусельный | 237839 | 266C0AB4CEBA347CFF2733032694FF272F032666010F010F00 | 2015-01-21 04:37:57 |
| культурный шок | 941574 | 266CCE7CFF27330326C0CEB494FF272F0326660304271F0326A46C0112011200 | 2015-01-24 02:15:33 |
| мантра | 192984 | 268E0A94C0B40A010A010A00 | 2015-01-27 23:37:20 |
| определённый | 125215 | 26A4ACB43424347C349494FF272F032666017F9D08011100 | 2015-01-20 17:48:23 |
| отож | 760065 | 26A4C0A43A0108010800 | 2015-01-23 22:53:10 |
| оторва | 817271 | 26A4C0A4B4180A010A010A00 | 2015-01-23 23:33:01 |
| перец | 126497 | 26AC34B434F80109010900 | 2015-01-20 17:48:52 |
| перинатальный | 244523 | 26AC34B45A940AC00A7CFF2733032694FF272F0326660111011100 | 2015-01-21 05:25:45 |
| по самое | 834365 | 26ACA4030426BA0A8EA434010C010C00 | 2015-01-23 23:47:46 |
| постнатальный | 875868 | 26ACA4BAC0940AC00A7CFF2733032694FF272F0326660111011100 | 2015-01-24 04:07:19 |
| прикинуться | 262521 | 26ACB45A6C5A94CEC0FF27330326BAFF2745010F010F00 | 2015-01-21 07:16:24 |
| прикроватный | 246243 | 26ACB45A6CB4A4180AC094FF272F0326660110011000 | 2015-01-21 05:37:46 |
| сарафанное радио | 802367 | 26BA0AB40ADE0A9494A434030426B40A245AA40114011400 | 2015-01-24 03:26:33 |
| секция | 174604 | 26BA346CF85AFF2745010A010A00 | 2015-01-27 23:28:45 |
| сиськастый | 832592 | 26BA5ABAFF273303266C0ABAC0FF272F032666010E010E00 | 2015-01-23 23:45:08 |
| суп из семи залуп | 902068 | 26BACEAC0304265A42030426BA348E5A030426420A7CCEAC0115011500 | 2015-01-24 01:06:24 |
| тончайший | 730679 | 26C0A494FF270B03260A66FF271F03265A66010D010D00 | 2015-01-23 22:33:25 |
| тяжеловоз | 222330 | 26C0FF274503263A347CA418A442010D010D00 | 2015-01-27 23:55:34 |
| уделать | 265541 | 26CE24347C0AC0FF2733010B010B00 | 2015-01-21 07:39:58 |
| финансовое обязательство | 906782 | 26DE5A940A94BAA418A434030426A416FF27450326420AC0347CFF27330326BAC018A4011C011C00 | 2015-01-24 04:23:07 |
| фрикативный | 251741 | 26DEB45A6C0AC05A1894FF272F032666010F010F00 | 2015-01-21 06:13:33 |
| яйцо | 1657 | 2745032666F8A40108010800 | 2015-01-20 13:52:29 |
| айва | 78072 | 5C0A66180A0108010800 | 2015-06-30 18:53:51 |
| акать | 164594 | 5C0A6C0AC0FF5D330109010900 | 2015-01-20 19:12:04 |
| амниоцентез | 1007898 | 5C0A8E945AA4F83494C03442010F010F00 | 2015-03-31 17:25:17 |
| артефакт | 203101 | 5C0AB4C034DE0A6CC0010C010C00 | 2015-01-20 23:38:00 |
| банан | 9145 | 5C160A940A940109010900 | 2015-05-21 07:53:25 |
| бренность | 109064 | 5C16B4349494A4BAC0FF5D33010D010D00 | 2015-01-20 17:12:01 |
| бювет | 203877 | 5C16FF5D3F035C1834C00109010900 | 2016-01-15 09:13:46 |
| в гостях хорошо, а дома лучше | 730663 | 5C1803045C1AA4BAC0FF5D45035CE003045CE0A4B4A4FF5D1F035CA40307045C0A03045C24A48E0A03045C7CCEFF5D0B1F035C340121012100 | 2015-01-24 02:48:45 |
| Вован | 961308 | 5C18A4180A940109018F0800 | 2015-01-24 02:48:30 |
| вращающийся | 1037137 | 5C18B40AFF5D23035C0AFF5D3F23035C5A66BAFF5D45010F010F00 | 2015-06-13 12:39:44 |
| выпиться | 1054987 | 5C18FF5D2F035CAC5AC0FF5D33035CBAFF5D45010C010C00 | 2015-09-15 09:19:24 |
| выходящий | 989207 | 5C18FF5D2F035CE0A424FF5D4523035C5A66010D010D00 | 2015-02-04 05:57:32 |
| домработница | 183077 | 5C24A48EB40A16A4C0945AF80A0110011000 | 2015-04-21 18:38:22 |
| ёклмн | 396310 | 5C346C7C8E9401869D08010A00 | 2015-01-23 05:46:33 |
| ёпт | 743311 | 5C34ACC001869D06010800 | 2015-04-28 17:01:00 |
| забить болт | 511385 | 5C420A165AC0FF5D3303045C16A47CC0010F010F00 | 2015-05-04 14:02:00 |
| залепить | 255848 | 5C420A7C34AC5AC0FF5D33010C010C00 | 2015-01-21 06:37:46 |
| интранатальный | 875867 | 5C5A94C0B40A940AC00A7CFF5D33035C94FF5D2F035C660112011200 | 2015-01-24 00:39:10 |
| манипулятор | 434056 | 5C8E0A945AACCE7CFF5D45035CC0A4B4010F010F00 | 2015-01-23 05:55:59 |
| мотор | 149055 | 5C8EA4C0A4B40109010900 | 2015-01-20 18:24:51 |
| найдёныш | 522637 | 5C940A66243494FF5D2F1F01829D07010D00 | 2015-01-28 02:36:04 |
| нехуй | 599000 | 5C9434E0CE660109010900 | 2015-01-23 07:22:55 |
| обо | 133848 | 5CA416A40107010700 | 2015-04-23 19:45:17 |
| одногруппник | 1058679 | 5CA42494A41AB4CEACAC945A6C0110011000 | 2015-09-28 09:46:54 |
| пойти в жопу | 12728 | 5CACA466C05A03045C1803045C3AA4ACCE0110011000 | 2015-01-20 14:05:43 |
| порно | 228438 | 5CACA4B494A40109010900 | 2015-12-11 11:45:28 |
| порядковый номер | 521780 | 5CACA4B4FF5D45035C246CA418FF5D2F035C6603045C94A48E34B40114011400 | 2015-01-23 23:01:05 |
| проработать | 164527 | 5CACB4A4B40A16A4C00AC0FF5D33010F010F00 | 2016-02-16 10:33:43 |
| сверлильный | 248113 | 5CBA1834B47C5A7CFF5D33035C94FF5D2F035C66010F010F00 | 2015-01-21 05:49:46 |
| смерчик | 1037138 | 5CBA8E34B4FF5D0B035C5A6C010B010B00 | 2015-06-13 12:42:53 |
| смоленский | 248983 | 5CBA8EA47C3494BA6C5A66010E010E00 | 2015-01-24 07:33:32 |
| спустить | 265045 | 5CBAACCEBAC05AC0FF5D33010C010C00 | 2015-06-02 16:21:42 |
| сюрреалистический | 249878 | 5CBAFF5D3F035CB4B4340A7C5ABAC05AFF5D0B035C34BA6C5A660115011500 | 2015-07-29 03:04:57 |
| телепаться | 265316 | 5CC0347C34AC0AC0FF5D33035CBAFF5D45010E010E00 | 2015-01-21 07:39:23 |
| токолитик | 1048011 | 5CC0A46CA47C5AC05A6C010D010D00 | 2015-07-21 18:43:48 |
| толстеть | 265363 | 5CC0A47CBAC034C0FF5D33010C010C00 | 2015-01-26 21:06:32 |
| труситься | 962195 | 5CC0B4CEBA5AC0FF5D33035CBAFF5D45010D010D00 | 2015-01-24 05:08:30 |
| тяпать | 265473 | 5CC0FF5D45035CAC0AC0FF5D33010A010A00 | 2015-01-21 07:39:42 |
| хабалка | 184028 | 5CE00A160A7C6C0A010B010B00 | 2015-05-03 07:27:41 |
| холотропное дыхание | 1007104 | 5CE0A47CA4C0B4A4AC94A43403045C24FF5D2F035CE00A945A340117011700 | 2015-04-19 11:35:22 |
| цервикальный | 1026358 | 5CF834B4185A6C0A7CFF5D33035C94FF5D2F035C660110011000 | 2015-05-29 11:18:41 |
| цервикальный канал | 1026331 | 5CF834B4185A6C0A7CFF5D33035C94FF5D2F035C6603045C6C0A940A7C0116011600 | 2015-05-29 10:54:45 |
| шмон | 373087 | 5D1F035C8EA4940108010800 | 2015-01-23 22:00:34 |
| эксэль | 997398 | 5D3B035C6CBAFF5D3B035C7CFF5D33010A010A00 | 2015-03-10 13:40:41 |

This is the order in which the articles would normally be sorted. They display differently on the category page because they are grouped there by first letter, and the code doesn't expect inconsistent ordering.
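
The effect can be reproduced with a small sketch. It takes six rows from the query results, sorts them by raw sortkey bytes (which is what I assume ORDER BY cl_sortkey does on a binary column), and emits a heading whenever the first letter changes. Taking the letter from the page title and the name `headings_in_display_order` are assumptions for illustration only; the real category page derives the letter through the collation code.

```python
from itertools import groupby

# (page_title, HEX(cl_sortkey)) pairs copied from the query results.
SAMPLE_ROWS = [
    ("мотор", "5C8EA4C0A4B40109010900"),
    ("вошь", "2618A4FF271F330108010800"),
    ("банан", "5C160A940A940109010900"),
    ("мантра", "268E0A94C0B40A010A010A00"),
    ("айва", "5C0A66180A0108010800"),
    ("яйцо", "2745032666F8A40108010800"),
]


def headings_in_display_order(rows):
    """Return the letter headings printed by a listing that starts a new
    heading each time the first letter of consecutive rows changes."""
    # Sort by the raw bytes of the sortkey, not by the title.
    ordered = sorted(rows, key=lambda row: bytes.fromhex(row[1]))
    # groupby only merges *adjacent* rows, so a letter can appear twice.
    return [letter for letter, _ in
            groupby(ordered, key=lambda row: row[0][0].upper())]


print(headings_in_display_order(SAMPLE_ROWS))
```

With these rows the "М" heading comes out twice, once for each sortkey format, which matches the duplicated letter groups seen on the category pages.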

A few things are clear:

  • There are two large groups: articles whose sortkey (shown as hex) begins with '2' and those whose sortkey begins with '5'.
  • The articles within each group are correctly ordered (as far as I can tell).
  • The latest cl_timestamp date for the '2' group is 2015-01-27 (I think this is the date when the category was added to the article).
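
A rough sketch of how the grouping and the latest timestamp per group could be checked on exported rows, assuming a group is identified by the first hex digit of the sortkey. The function name `latest_timestamp_by_group` is made up for this example, and the six rows are just a sample copied from the results (the full set is in P2732).

```python
from datetime import datetime

# (HEX(cl_sortkey), cl_timestamp) pairs copied from the query results.
SAMPLE_KEYS = [
    ("268E0A94C0B40A010A010A00", "2015-01-27 23:37:20"),                 # мантра
    ("26C0FF274503263A347CA418A442010D010D00", "2015-01-27 23:55:34"),   # тяжеловоз
    ("2745032666F8A40108010800", "2015-01-20 13:52:29"),                 # яйцо
    ("5C940A66243494FF5D2F1F01829D07010D00", "2015-01-28 02:36:04"),     # найдёныш
    ("5C160A940A940109010900", "2015-05-21 07:53:25"),                   # банан
    ("5D1F035C8EA4940108010800", "2015-01-23 22:00:34"),                 # шмон
]


def latest_timestamp_by_group(rows):
    """Map the first hex digit of each sortkey to its newest cl_timestamp."""
    latest = {}
    for hex_key, stamp in rows:
        group = hex_key[0]  # '2' or '5' in this data set
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if group not in latest or when > latest[group]:
            latest[group] = when
    return latest


for group, when in sorted(latest_timestamp_by_group(SAMPLE_KEYS).items()):
    print(group, when)
```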

So, something happened near the end of January 2015 that changed the format of the sortkeys generated afterwards, resulting in incorrect ordering of the articles. I don't remember anything related happening then, there's nothing that looks related in deployments history nor SAL, and no one from Operations that I asked could remember anything. It also seems that no projects other than Russian-language ones were affected (and there were quite a few projects using UCA collations at the time), or at least no one else complained.

So, to fix it, we just need to run a script to re-generate all the sortkeys; I filed T129411 about that. The script can take a long time on large wikis, so this might not be trivial, but it can probably be done.

I have no idea what's the cause of this issue, but if anyone has any leads, I'd love to know.

matmarex renamed this task from Incorrect sorting in categories to Incorrect sorting in categories on Russian-language projects. Mar 9 2016, 9:51 PM
matmarex added subscribers: Bawolff, PleaseStand.

> I have no idea what's the cause of this issue, but if anyone has any leads, I'd love to know.

Normally the cause would be someone updating the version of PHP (or libicu).

Looks fixed to me by T129411.