special firstChar() routine for Korean characters
Closed, Resolved (Public)

Assigned To

None

Authored By

	• bzimport
	Mar 16 2005, 5:02 AM

Description

Author: puzzlet

Description:
Since written Korean (hangul) is syllabic, pages in a category page are
sectioned by their initial syllables rather than by letters or phonemes. As a
result, almost every page eventually ends up in its own section. Look at the
URL, which is equivalent to Category:People in the English Wikipedia. In the
Korean category page, many pages have their own sections, such as
Category:Austrian_people, which falls in the "Au" section,
Category:Polish_people, which falls in the "Pol" section, etc. (They can be
recategorized to Category:People_by_nationality of course, but that's not the
point of the discussion.)

Every hangul syllable can be divided into consonants and vowels, and a better
index scheme for category pages would be to section by the initial consonant of
each page's first syllable:

articles starting with 가(U+AC00) to 낗(U+B097) under a section titled ㄱ(U+1100),
from 나(U+B098) to 닣(U+B2E3) under ㄴ(U+1102),
from 다(U+B2E4) to 띻(U+B77B) under ㄷ(U+1103),
from 라(U+B77C) to 맇(U+B9C7) under ㄹ(U+1105),
from 마(U+B9C8) to 밓(U+BC13) under ㅁ(U+1106),
from 바(U+BC14) to 삫(U+C0AB) under ㅂ(U+1107),
from 사(U+C0AC) to 앃(U+C543) under ㅅ(U+1109),
from 아(U+C544) to 잏(U+C78F) under ㅇ(U+110B),
from 자(U+C790) to 찧(U+CC27) under ㅈ(U+110C),
from 차(U+CC28) to 칳(U+CE73) under ㅊ(U+110E),
from 카(U+CE74) to 킿(U+D0BF) under ㅋ(U+110F),
from 타(U+D0C0) to 팋(U+D30B) under ㅌ(U+1110),
from 파(U+D30C) to 핗(U+D557) under ㅍ(U+1111),
and from 하(U+D558) to 힣(U+D7A3) under ㅎ(U+1112).
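
The ranges above follow from the layout of the Hangul Syllables block: each initial consonant covers 21 x 28 = 588 consecutive code points, and the proposal folds the tense initials (U+1101, U+1104, U+1108, U+110A, U+110D) into their plain counterparts. A minimal Python sketch of that arithmetic, assuming this folding is what is intended; the name `initial_consonant_section` is invented here for illustration and is not MediaWiki code:

```python
HANGUL_FIRST = 0xAC00            # 가, first code point of Hangul Syllables
HANGUL_LAST = 0xD7A3             # 힣, last code point of Hangul Syllables
SYLLABLES_PER_INITIAL = 21 * 28  # 21 medial vowels x 28 finals (including "none")

# Tense initial consonants folded into the plain ones, as in the ranges above.
FOLD = {
    0x1101: 0x1100,
    0x1104: 0x1103,
    0x1108: 0x1107,
    0x110A: 0x1109,
    0x110D: 0x110C,
}


def initial_consonant_section(ch):
    """Return the section jamo code point for a Hangul syllable, else None."""
    cp = ord(ch)
    if not (HANGUL_FIRST <= cp <= HANGUL_LAST):
        return None
    # Index of the initial consonant (0..18), offset into U+1100..U+1112.
    initial = 0x1100 + (cp - HANGUL_FIRST) // SYLLABLES_PER_INITIAL
    return FOLD.get(initial, initial)
```

For example, `initial_consonant_section(chr(0xB097))` yields 0x1100 and `initial_consonant_section(chr(0xB098))` yields 0x1102, matching the boundary between the first two ranges.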

Version: unspecified
Severity: enhancement
URL: http://ko.wikipedia.org/wiki/Category:%EC%9D%B8%EB%AC%BC

Details

Reference: bz1701

Related Objects

		Status	Subtype	Assigned	Task
		Declined		None	T5950 case sensitivity issues (tracking)
		Resolved		None	T3701 special firstChar() routine for Korean characters

Event Timeline

• bzimport raised the priority of this task to High. Nov 21 2014, 8:14 PM

• bzimport added a project: MediaWiki-Categories.

• bzimport set Reference to bz1701.

• bzimport added a subscriber: Unknown Object (MLST).

• bzimport created this task. Mar 16 2005, 5:02 AM

avarab wrote:

A duplicate of bug 1984.

*** This bug has been marked as a duplicate of 1984 ***

puzzlet wrote:

Patch for LanguageUtf8.php

Attached:

LanguageUtf8.php.patch (1 KB)

puzzlet wrote:

Changes in LanguageKo.php work fine in Korean Wikipedia, but multilingual
projects like Meta-wiki and Wikisource need to be updated too. I attached the
patch file, which only modifies firstChar() to treat the Hangul Syllables area
(U+AC00 to U+D7A3) specially; for any other character it behaves as it always
has. But I'm not sure which file is the appropriate one to patch: Language.php
or LanguageUtf8.php. Take this for a test:
http://wikisource.org/wiki/Category:%ED%95%9C%EA%B5%AD%EC%96%B4 should have no
more than 10 sections after the commit.
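
The behavior described for the patch (special handling inside U+AC00 to U+D7A3, unchanged behavior elsewhere) can be sketched roughly as below. This is a hypothetical Python illustration, not the attached PHP patch: `first_char`, `RANGES`, and `STARTS` are names invented here, and the table simply transcribes the ranges from the task description.

```python
import bisect

# (first code point of the range, section jamo code point), transcribed from
# the task description; each range runs up to the start of the next one.
RANGES = [
    (0xAC00, 0x1100), (0xB098, 0x1102), (0xB2E4, 0x1103), (0xB77C, 0x1105),
    (0xB9C8, 0x1106), (0xBC14, 0x1107), (0xC0AC, 0x1109), (0xC544, 0x110B),
    (0xC790, 0x110C), (0xCC28, 0x110E), (0xCE74, 0x110F), (0xD0C0, 0x1110),
    (0xD30C, 0x1111), (0xD558, 0x1112),
]
STARTS = [start for start, _ in RANGES]


def first_char(title):
    """Return the section key for a page title (illustrative only)."""
    if not title:
        return ""
    ch = title[0]
    cp = ord(ch)
    if 0xAC00 <= cp <= 0xD7A3:
        # Find the last range whose start is <= cp.
        idx = bisect.bisect_right(STARTS, cp) - 1
        return chr(RANGES[idx][1])
    # Any other character is returned unchanged.
    return ch
```

A lookup table is used here rather than arithmetic so that the grouping stays readable and can be changed per range; titles that do not start with a Hangul syllable keep their existing section key.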

puzzlet wrote:

It's now OK for Korean Wikisource (
http://ko.wikisource.org/wiki/%EB%B6%84%EB%A5%98:%EC%8B%9C%EC%A1%B0 ) but
multilingual wikis like Meta-wiki still have this issue (
http://meta.wikimedia.org/wiki/Category:KO ).

My point is that this feature should be applied universally, wherever a page
name contains Korean characters.

anon.hui wrote:

I second this. The ko firstChar() should apply to wikis in all languages, especially multilingual wikis,
not just the ko wiki.

kjoonlee wrote:

Another vote for support here.

Done in r35055. Also did a tiny bit of cleanup to use utf8ToCodepoint() func instead of the manual UTF-8 decomp code.

(Could just use raw characters here instead of the hex positions, should one desire, but this isn't a performance-critical code path.)

	F1869: LanguageUtf8.php.patch
	Nov 21 2014, 8:14 PM
