Slovene projects: Alphabetical order in categories (collation)
Closed, Resolved, Public

Description

In categories on Slovene projects, the letters Č, Š and Ž are all listed at the end of the alphabet (example). This should be changed to follow the Slovene alphabet.

Question 1: What happens with other letters of Latin Extended?

Question 2: Is it possible to apply the same Slovene alphabetical order on Commons as well, when the system detects a Slovene user?

Event Timeline

Restricted Application added a subscriber: Aklapper. Nov 7 2018, 6:55 PM
Janezdrilc renamed this task from Slovene projects: Alphabetical order in the categories (collation) to Slovene projects: Alphabetical order in categories (collation). Nov 7 2018, 6:56 PM
matmarex added a subscriber: matmarex.

In terms of configuration, this would involve setting https://www.mediawiki.org/wiki/Manual:$wgCategoryCollation to uca-sl.

Question 1: What happens with other letters of Latin Extended?

They are sorted like the base letter, e.g. "É" would be sorted like "E". You can see this on other Wikipedias already using language-specific collations, e.g.:

Question 2: Is it possible to apply the same Slovene alphabetical order on Commons as well, when the system detects a Slovene user?

Unfortunately, no :( Only one category collation can be in use at once. See T37378: Support multiple collations at the same time.

Janezdrilc added a comment. Edited Nov 10 2018, 8:40 PM

About Polish examples:

So letters such as "Ś" and "Ż", which are part of the Polish alphabet, get their own headings, while letters that are not part of it, such as "É" and "Ú" (from the Spanish alphabet), don't. That doesn't seem like a logical solution to me.

I think every different letter, regardless of language or alphabet, should get its own heading, in every Wikipedia. That would probably be the easiest way to search through lists.

It is impossible to decide what is a "different letter" unless you're talking about just one language. For example, "Ó" (O with acute accent) is a separate letter from "O" in Polish, but it is the same letter as "O" in Spanish. Not to even mention that e.g. Hungarian has digraphs like "Ny" and "Sz" that are considered separate letters (https://hu.wikipedia.org/wiki/Kategória:Magyarország_városai).

Even if you defined your letters somehow, there is no consistent ordering of them between languages. For example, "Ä" is ordered after "A" in German, but after "Z" in Swedish.

The "international" ordering defined in Unicode (uca-default) orders all accented Latin letters together with the base letter. The default ordering used by MediaWiki (uppercase) makes all accented Latin letters separate, ordering them by their Unicode codepoint (so they all appear after "Z" in some arbitrary order). If neither of these works for you, then the language-specific orderings are the only other option.
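The difference between the two orderings described above can be illustrated with a small Python sketch. This is only a toy approximation: `uppercase_key` and `base_letter_key` are hypothetical helper names, the titles are made up for illustration, and stripping combining marks is a crude stand-in for uca-default rather than the real Unicode Collation Algorithm.

```python
import unicodedata

def uppercase_key(title):
    # Rough model of the "uppercase" collation: upper-case the title,
    # then compare by Unicode code point.
    return title.upper()

def base_letter_key(title):
    # Toy stand-in for uca-default: drop combining marks so that accented
    # letters sort together with their base letter. The original title is
    # kept as a tie-breaker.
    decomposed = unicodedata.normalize("NFD", title.upper())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (stripped, title)

# Illustrative titles only.
titles = ["Zagreb", "Čatež", "Celje", "Drava", "Écija"]

by_codepoint = sorted(titles, key=uppercase_key)
by_base = sorted(titles, key=base_letter_key)

print(by_codepoint)  # ['Celje', 'Drava', 'Zagreb', 'Écija', 'Čatež']
print(by_base)       # ['Čatež', 'Celje', 'Drava', 'Écija', 'Zagreb']
```

In the code point ordering, "É" and "Č" both land after "Z"; in the base-letter ordering they are interleaved with "E" and "C".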

Janezdrilc added a comment. Edited Nov 12 2018, 11:32 PM

About "language-specific ordering": Is it possible to arrange for Slovene projects that every single letter of Latin and Latin Extended gets its own heading?

So the main order would be the Slovene alphabet from "A" to "Ž", with special characters inserted among its letters, as in the MediaWiki edit toolbar:

  • After "A" there comes Ä, Å and so on ...
  • After "C" first there comes Slovene "Č" and then Ć, Ĉ, Ç, ...
  • After "S" first there comes Slovene "Š" and then Ś, Ŝ, Ş, ...
  • After "Z" first there comes Slovene "Ž" and then Ź, Ż, Ẑ ...

Technically, it would look as if the Slovene alphabet consisted of 70 characters (for example) and not only 25 (as it officially does). I would like to try this "test" if possible and then give you feedback after a while.

Theoretically it should be possible, but we do not have such a collation implemented, and I will not have time to work on it.

Ok, no problem. It's just an idea that I believe would be most practical for common usage.

Can you then set the collation to uca-sl?

jhsoby added a subscriber: jhsoby. Feb 8 2019, 1:48 PM

@matmarex (just as a reminder). uca-sl is also just fine in my opinion. You can set it in all 6 Slovene projects and "clear sailing ahead".

Sorry, I missed your December message (I get a lot of notifications from Phabricator). I don't have to be the person to change the settings, anyone can submit a patch, although I'd be happy to do it in this case :)

@Janezdrilc While this is probably an obvious improvement, I'd rather follow the formal process per https://meta.wikimedia.org/wiki/Requesting_wiki_configuration_changes. Can you post a message on each affected wiki (on a village pump page etc.) and ask the editors to confirm that there is consensus for this change?

Note that "numeric" collation is also available (https://www.mediawiki.org/wiki/Manual:$wgCategoryCollation#Numeric_sorting – numbers in article titles can be ordered numerically rather than alphabetically, e.g. 965 before 1025) and I suggest we enable this as well. So we'd actually use uca-sl-u-kn
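The effect of numeric sorting can be sketched in a few lines of Python. This is an illustration of the idea only, not how MediaWiki implements it: `numeric_key` is a hypothetical helper and the titles are made up.

```python
import re

def numeric_key(title):
    # Split the title into alternating text and digit runs. With a capturing
    # group, re.split puts text at even indexes and digit runs at odd ones,
    # so keys always compare text with text and number with number.
    parts = re.split(r"(\d+)", title)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]

# Illustrative titles only.
titles = ["1025", "965", "Leto 30", "Leto 4"]

print(sorted(titles))                   # ['1025', '965', 'Leto 30', 'Leto 4']
print(sorted(titles, key=numeric_key))  # ['965', '1025', 'Leto 4', 'Leto 30']
```

Plain string comparison puts "1025" before "965" because "1" sorts before "9"; comparing digit runs as numbers gives 965 before 1025, as described above.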

Note also that for Wiktionary in particular, it might be preferable to use the language-neutral collation uca-default, since many pages might have names in a different language. So far various Wiktionary communities have inconsistently decided to use either the language-neutral or the language-specific collation (for example pl.wikt has the language-neutral one T48081, but sr.wikt has the language-specific one T115806).

Janezdrilc added a comment. Edited Jun 13 2019, 2:49 PM

I have opened a discussion on sl.wiki to get a community consensus for all Slovene projects. I left notices on the other sister projects and pointed their users to Wikipedia, so that the whole discussion is in one place. Project notices: s, q, b, wikt, v.

@matmarex, do "Ć" and "Đ" get their own headings after "C" and "D" in uca-sl (similar to uca-hr)?

@Janezdrilc No, only "Č", "Š", "Ž" get their own headings (this is defined here).

However, "Ć" and "Đ" are ordered separately, but without headings (this is defined here).

I have not checked it before, it seems really weird to me that this is inconsistent… Is it incorrect?


This means that:

  • under the heading for "Č", there will be all articles starting with "Č", then all articles starting with "Ć"
  • under the heading for "D", there will be all articles starting with "D" (and other diacritical variants like "Ď", mixed together), then all articles starting with "Đ"

Here's what it looks like in practice; note especially the "C", "Č" and "D" sections:
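
The behavior described in this list can be mimicked with a toy Python model. It is a sketch under stated assumptions, not the actual uca-sl implementation: the `ORDER` string, the `TAILORED` set, the `HEADING_OVERRIDE` table and the helper names are all hypothetical, and the titles are only illustrative.

```python
import unicodedata

# Assumed letter order: Č, Ć and Đ sort separately right after C and D.
ORDER = "ABCČĆDĐEFGHIJKLMNOPQRSŠTUVWXYZŽ"
# Letters that keep their identity instead of folding to the base letter.
TAILORED = set("ČĆĐŠŽ")
# Letters that sort separately but appear under another letter's heading.
HEADING_OVERRIDE = {"Ć": "Č", "Đ": "D"}

def primary(ch):
    # Fold a character to the letter it sorts as: tailored letters stay,
    # everything else loses its diacritics (Ď -> D, Á -> A).
    ch = ch.upper()
    if ch not in TAILORED:
        ch = unicodedata.normalize("NFD", ch)[0]
    return ch

def sort_key(title):
    # Rank each character by its position in ORDER (-1 for anything else,
    # e.g. spaces); the title itself breaks ties.
    return ([ORDER.find(primary(ch)) for ch in title], title)

def heading(title):
    first = primary(title[0])
    return HEADING_OVERRIDE.get(first, first)

# Illustrative titles only.
titles = ["Đakovo", "Dugo Selo", "Ćićarija", "Črnomelj",
          "Celje", "Ďáblice", "Đurđevac"]

ordered = sorted(titles, key=sort_key)
for title in ordered:
    print(heading(title), title)
```

With these assumptions, "Ćićarija" is listed after "Črnomelj" under the "Č" heading, "Ďáblice" is mixed in with the plain "D" titles, and the two "Đ" titles come last under "D".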

After 1 month of open community discussion we've got 6 supporting votes:

  • 4 votes for the uca-sl-u-kn setting
  • 2 votes for any of the Slovene settings

There have been no opposing votes. I believe the time has come to set all 6 Slovene projects to uca-sl-u-kn.

Change 524605 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[operations/mediawiki-config@master] Set wgCategoryCollation to 'uca-sl-u-kn' on Slovene projects (sl)

https://gerrit.wikimedia.org/r/524605

Sorry, it's not done yet, I forgot…

I tried rescheduling it for the evening: https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190722T2300 but it seems no one is available now to deploy it.

I'll try again tomorrow.

Just appeared at SWAT, the second you quit :-). This should be done within a few minutes. Feel free to rejoin if you want :-).

Change 524605 merged by jenkins-bot:
[operations/mediawiki-config@master] Set wgCategoryCollation to 'uca-sl-u-kn' on Slovene projects (sl)

https://gerrit.wikimedia.org/r/524605

Mentioned in SAL (#wikimedia-operations) [2019-07-22T23:27:54Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: Set wgCategoryCollation to uca-sl-u-kn on Slovene projects (sl) (T208984) (duration: 00m 47s)

Mentioned in SAL (#wikimedia-operations) [2019-07-22T23:29:37Z] <Urbanecm> Run mwscript updateCollation.php --wiki=slwiki --previous-collation=uppercase (T208984)

Mentioned in SAL (#wikimedia-operations) [2019-07-22T23:34:17Z] <Urbanecm> Run mwscript updateCollation.php --wiki=slwikibooks --previous-collation=uppercase (T208984)

Mentioned in SAL (#wikimedia-operations) [2019-07-22T23:34:44Z] <Urbanecm> Run mwscript updateCollation.php --wiki=slwikiquote --previous-collation=uppercase (T208984)

Mentioned in SAL (#wikimedia-operations) [2019-07-22T23:37:43Z] <Urbanecm> Run mwscript updateCollation.php --wiki=slwikisource --previous-collation=uppercase (T208984)

Mentioned in SAL (#wikimedia-operations) [2019-07-22T23:39:08Z] <Urbanecm> Run mwscript updateCollation.php --wiki=slwikiversity --previous-collation=uppercase (T208984)

Mentioned in SAL (#wikimedia-operations) [2019-07-22T23:39:29Z] <Urbanecm> Run mwscript updateCollation.php --wiki=slwiktionary --previous-collation=uppercase (T208984)

Mentioned in SAL (#wikimedia-operations) [2019-07-22T23:42:34Z] <Urbanecm> All updateCollation.php runs completed, except the one for slwiki (T208984)

Mentioned in SAL (#wikimedia-operations) [2019-07-23T00:06:56Z] <Urbanecm> slwiki updateCollection.php completed (T208984)

Should be all done! @matmarex, please review&close.

matmarex closed this task as Resolved. Jul 23 2019, 10:42 AM
matmarex claimed this task.

Oh cool, thank you! I was just about to schedule it for this morning.

The example category https://sl.wikipedia.org/wiki/Kategorija:Seznami_osebnosti_po_občinah_v_Sloveniji appears to be in the right order, as far as I can tell.

The setting seems to have been implemented successfully. Still, there's just one thing I would like to be sure about:

  • Under the D heading in Kategorija:Mesta na Hrvaškem (towns in Croatia) there are Dugo Selo, then Đakovo and Đurđevac as the last one. Seems ok.
  • However, under the L heading in Kategorija:Mesta na Poljskem (towns in Poland) there are Łobez, then Lodž, Łowicz and Lublin as the last one.

This means that Đ is treated as an "independent" letter and all Đ entries are listed after the D entries. But Ł is not treated as an "independent" letter; it sorts like an ordinary L, so the entries are mixed together.

@Janezdrilc Is this incorrect behavior in Slovene? Can you describe how it should behave instead? I genuinely don't know. (Should there be a separate heading for Đ?)

As I see in other categories, the "mixed up system" appears to be the right one:

However, in your test category image, D and Ď are mixed together, but Đ comes after them, sorted separately at the end, still under the D heading.

(As Đ is not a Slovene letter, it's correct that it doesn't get its own heading.)

Sorry, but I don't understand if you're just documenting this behavior, or saying that it's wrong and should be changed?

It's not wrong, so to speak; it's just not unified. I have no intention of bothering you any more with this detail. Let the status stay closed and resolved.

I thank you for all your effort and kindness so far. All the best in future projects.