
Install Josa extension parser function on all Korean language wikis
Closed, Resolved, Public

Description

In Korean, a particle takes a different form depending on whether the preceding syllable ends in a final consonant. For example, 를 (reul) is used only after a word ending in a vowel; if the preceding word ends in a consonant, 을 (eul) is used instead. (For further information, see https://en.wikipedia.org/wiki/Korean_postpositions.) To solve this problem, we need to install the Josa extension (by @devunt).
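A minimal Python sketch of the rule just described, assuming the input is written in precomposed Hangul syllables (U+AC00 to U+D7A3). It is not the Josa extension's code; `has_final_consonant` and `object_particle` are hypothetical names used only for illustration, and treating a non-Hangul last character like a vowel ending is an assumption.

```python
HANGUL_FIRST = 0xAC00  # 가, first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3   # 힣, last precomposed Hangul syllable
FINALS_PER_BLOCK = 28  # "no final" plus 27 possible final consonants


def has_final_consonant(word: str) -> bool:
    """True if the last character is a Hangul syllable with a final consonant."""
    if not word:
        return False
    code = ord(word[-1])
    if not HANGUL_FIRST <= code <= HANGUL_LAST:
        return False  # assumption: non-Hangul is treated like a vowel ending
    return (code - HANGUL_FIRST) % FINALS_PER_BLOCK != 0


def object_particle(word: str) -> str:
    # 을 (eul) after a final consonant, 를 (reul) after a vowel
    return word + ("을" if has_final_consonant(word) else "를")


print(object_particle("사과"))  # 사과를
print(object_particle("책"))    # 책을
```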

Currently, we are using a Lua script (Module:Hangul) on the Korean Wikipedia, but the Lua script is too complex and heavy (and also user-unfriendly and wiki-specific).

So I request that the Josa parser-function extension be installed on all Korean wiki farms (including kowiki, kowikinews, kowikibooks, kowikiquote, kowikisource, kowikiversity, kowiktionary).

Extension Link: https://www.mediawiki.org/wiki/Extension:Josa
Local community consensus: https://ko.wikipedia.org/wiki/%EC%9C%84%ED%82%A4%EB%B0%B1%EA%B3%BC:%EC%82%AC%EB%9E%91%EB%B0%A9_(%EA%B8%B0%EC%88%A0)/2014%EB%85%84_6%EC%9B%94#.EC.A1.B0.EC.82.AC_.ED.99.95.EC.9E.A5.EA.B8.B0.EB.8A.A5_.EB.8F.84.EC.9E.85

Event Timeline


kjoonlee wrote:

This is a case of morphophonology.

It's as if you had to type "a/an 'insert noun here'" all the time because you can never know beforehand whether the noun will start with a vowel or a consonant.

So {{#a or an:noun|a|an}} is what Ficell is proposing, I guess.

kjoonlee wrote:

According to [[w:en:Korean language#Morphophonemics]], we will need to test for an additional case.

  • Preceding syllable ends with a consonant
  • Preceding syllable ends with a rieul consonant
  • Preceding syllable ends with a vowel (no consonant)

kjoonlee wrote:

Oops, that's [[Korean language#Morphophonemics]].

IMHO it would be nicer if {{#hangul:AB|CD|EF|GH}} returned ABCD, ABEF or ABGH.
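The proposal does not say which argument goes with which case. The sketch below assumes CD, EF and GH correspond, in order, to the three cases listed above (a consonant other than rieul, rieul, a vowel). `hangul` is a hypothetical Python stand-in for the proposed parser function, not an existing API, and treating non-Hangul text as a vowel ending is an assumption.

```python
from enum import Enum

HANGUL_FIRST = 0xAC00
HANGUL_LAST = 0xD7A3
RIEUL_FINAL = 8  # index of ㄹ among the 28 final-consonant slots


class Ending(Enum):
    CONSONANT = "consonant"  # any final consonant except rieul
    RIEUL = "rieul"
    VOWEL = "vowel"          # no final consonant


def classify(word: str) -> Ending:
    code = ord(word[-1]) if word else 0
    if not HANGUL_FIRST <= code <= HANGUL_LAST:
        return Ending.VOWEL  # assumption: non-Hangul behaves like a vowel ending
    final = (code - HANGUL_FIRST) % 28
    if final == 0:
        return Ending.VOWEL
    if final == RIEUL_FINAL:
        return Ending.RIEUL
    return Ending.CONSONANT


def hangul(ab: str, cd: str, ef: str, gh: str) -> str:
    """Hypothetical {{#hangul:AB|CD|EF|GH}}: CD after a consonant, EF after rieul, GH after a vowel."""
    choice = {Ending.CONSONANT: cd, Ending.RIEUL: ef, Ending.VOWEL: gh}
    return ab + choice[classify(ab)]


print(hangul("집", "으로", "로", "로"))    # 집으로
print(hangul("서울", "으로", "로", "로"))  # 서울로
print(hangul("학교", "으로", "로", "로"))  # 학교로
```

With 으로/로 the rieul case happens to share the vowel form, which is exactly why a third case is needed.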

camway wrote:

I briefly discussed this with Kyungjoon Lee on a user talk page at the Korean Wikipedia, and I suggest the following:

(Cf [[Korean language#Morphophonemics]])

(In Unicode)
Hangul syllables: U+AC00 ~ U+D7A3
Hangul syllables ending in a vowel: U+AC00 + 28(0x1C)*n (U+AC00, U+AC1C, U+AC38, U+AC54, ..., U+D76C, U+D788)
Hangul syllables ending in rieul: U+AC08 + 28(0x1C)*n (U+AC08, U+AC24, U+AC40, U+AC5C, ..., U+D774, U+D790)
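The arithmetic above can be checked directly: precomposed syllables come in blocks of 28 (one per final consonant, including "none"), each block starts with the vowel-final syllable, and offset 8 within a block is the final ㄹ (rieul). This short Python check regenerates the listed endpoints:

```python
HANGUL_FIRST = 0xAC00
HANGUL_LAST = 0xD7A3
BLOCK = 0x1C  # 28 syllables share each initial + medial pair

# Offset 0 in every block: no final consonant.
vowel_final = list(range(HANGUL_FIRST, HANGUL_LAST + 1, BLOCK))
# Offset 8 in every block: final rieul.
rieul_final = list(range(HANGUL_FIRST + 8, HANGUL_LAST + 1, BLOCK))


def fmt(code_points):
    return ", ".join(f"U+{c:04X}" for c in code_points)


print(fmt(vowel_final[:4]), "...", fmt(vowel_final[-2:]))
# U+AC00, U+AC1C, U+AC38, U+AC54 ... U+D76C, U+D788
print(fmt(rieul_final[:4]), "...", fmt(rieul_final[-2:]))
# U+AC08, U+AC24, U+AC40, U+AC5C ... U+D774, U+D790
```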

{{#hanp:AB|CD}} (hanp is short for "hangul particle")

  • When CD is '로'(ro) or '으로'(euro):
    • if the last syllable of AB ends with a final consonant (jongseong) other than rieul, return 'AB으로' (ABeuro)
    • if the last syllable of AB ends with a vowel or rieul, return 'AB로' (ABro)
    • if the last character of AB is not Hangul, return 'AB로' (ABro)
  • When CD is '을'(eul), '이'(i), '와'(wa), '은'(eun) or '를'(reul), '가'(ga), '과'(gwa), '는'(neun):
    • if the last syllable of AB ends with a consonant, return 'AB을' (ABeul), 'AB이' (ABi), 'AB와' (ABwa) or 'AB은' (ABeun)
    • if the last syllable of AB ends with a vowel, return 'AB를' (ABreul), 'AB가' (ABga), 'AB과' (ABgwa) or 'AB는' (ABneun)
    • if the last character of AB is not Hangul, return 'AB를' (ABreul), 'AB가' (ABga), 'AB과' (ABgwa) or 'AB는' (ABneun)
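These rules can be sketched in Python as follows. This is an illustration only, not the extension code committed later in this thread; `hanp`, `final_index` and `PAIRS` are hypothetical names. It orders 과/와 per kjoonlee's correction in the next comment (과 after a consonant, 와 after a vowel) and assumes any other CD is appended unchanged.

```python
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3
RIEUL_FINAL = 8

# (after a consonant, after a vowel). 과/와 ordered per kjoonlee's correction.
PAIRS = [("을", "를"), ("이", "가"), ("은", "는"), ("과", "와")]


def final_index(word):
    """Final-consonant index 0-27 of the last syllable, or None if it is not Hangul."""
    if not word:
        return None
    code = ord(word[-1])
    if not HANGUL_FIRST <= code <= HANGUL_LAST:
        return None
    return (code - HANGUL_FIRST) % 28


def hanp(ab, cd):
    final = final_index(ab)
    if cd in ("로", "으로"):
        # 으로 only after a final consonant other than rieul; non-Hangul gets 로
        use_euro = final not in (None, 0, RIEUL_FINAL)
        return ab + ("으로" if use_euro else "로")
    for after_consonant, after_vowel in PAIRS:
        if cd in (after_consonant, after_vowel):
            # non-Hangul (None) and vowel endings (0) both take the vowel form
            use_consonant_form = final not in (None, 0)
            return ab + (after_consonant if use_consonant_form else after_vowel)
    return ab + cd  # assumption: unknown particles pass through unchanged


print(hanp("책", "를"))      # 책을
print(hanp("서울", "으로"))  # 서울로
print(hanp("집", "로"))      # 집으로
print(hanp("사과", "과"))    # 사과와
print(hanp("Wiki", "이"))    # Wiki가
```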

kjoonlee wrote:

Yeah, this is how Korean LaTeX macros handle "automatic particle handling" as well.

I think Ficell has wa/gwa switched; the Wikipedia table has the correct choices.

Isn't this something that could (should?) be added to language/classes/LanguageKo.php?

CC-ing Niklas in.
Domain: MediaWiki extensions/ParserFunctions -> MediaWiki/i18n

Could use grammar functionality here, with syntax something like {{GRAMMAR:hanp|AB,CD,EF,GH}} or {{GRAMMAR:hanp:CD,EF,GH|AB}}.

camway wrote:

(In reply to comment #7)

Isn't this something that could (should?) be added to
language/classes/LanguageKo.php?

CC-ing Niklas in.
Domain: MediaWiki extensions/ParserFunctions -> MediaWiki/i18n

Yes, I think so.

(In reply to comment #8)

Could use grammar functionality here, with syntax something like
{{GRAMMAR:hanp|AB,CD,EF,GH}} or {{GRAMMAR:hanp:CD,EF,GH|AB}}.

Those are also good ideas, but I think {{#hanp:}} would be better to use.

If not grammar, would this new tag be in MediaWiki proper, piggyback an existing extension or be in a new extension?

camway wrote:

(In reply to comment #10)

If not grammar, would this new tag be in MediaWiki proper, piggyback an
existing extension or be in a new extension?

A new extension seems better, although I don't know the details of the MediaWiki software.

I've committed an extension that should work as described in comments #5 and #6 as r41088. It should be easy to review because it is very small. It might be a good idea to file a new bug request specifically for enabling that extension on Korean projects.

camway wrote:

Patch for Hanp.body.php by Ficell

Thanks for your work, Niklas Laxström. Unfortunately, I found a problem: if $word contains symbols, it doesn't work well. For example, if we want to know whether '[[A]]' + 'eul' is correct, the current #HANP function can't give the result, because $word ends with a ']' character, which is not part of the word we actually want to test. To solve this problem, I suggest adding a new parameter named "output". I made a patch for Hanp.body.php; please consider it.
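The thread does not show how the patch's "output" parameter works. One plausible reading, sketched here purely as an assumption: the particle form is chosen from a plain test word, while the optional `output` text (for example a wikilink) is what gets printed in its place. `hanp`, `ends_in_consonant` and the parameter order are hypothetical.

```python
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3
PAIRS = [("을", "를"), ("이", "가"), ("은", "는"), ("과", "와")]


def ends_in_consonant(word):
    code = ord(word[-1]) if word else 0
    return HANGUL_FIRST <= code <= HANGUL_LAST and (code - HANGUL_FIRST) % 28 != 0


def hanp(word, particle, output=None):
    """Pick the particle form from `word`, but emit `output` (if given) in its place."""
    shown = word if output is None else output
    for after_consonant, after_vowel in PAIRS:
        if particle in (after_consonant, after_vowel):
            return shown + (after_consonant if ends_in_consonant(word) else after_vowel)
    return shown + particle


print(hanp("[[서울]]", "을"))          # [[서울]]를  (wrong: decided by "]")
print(hanp("서울", "을", "[[서울]]"))  # [[서울]]을
```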


(In reply to comment #13)

Created an attachment (id=5358)
Patch for Hanp.body.php by Ficell

Wow, that was an extremely crappy patch. I had to merge that manually, line by line. Please create a proper patch next time.

Applied in r42700. How does it work now?

camway wrote:

Sorry, I didn't know how to make a proper diff file. It works well now. Thanks.

Changed topic and added keywords to request installation of this extension. Should this be installed for all Korean Wikimedia projects?

camway wrote:

Yes. Please install the extension.

kjoonlee wrote:

Hang on, please.

Has this extension been tested anywhere? Would it be OK to put it on a "production" server?

(In reply to comment #18)

Has this extension been tested anywhere? Would it be OK to put it on a
"production" server?

Well, that is why it had a need-review keyword. *You* can be a reviewer, but a Wikimedia developer will also audit it before it will ever go live.

camway wrote:

(In reply to comment #18)

I tested it on my personal wiki. It works well. I did find some problems when using it in system messages, but they are not problems with the function itself. Otherwise, there seem to be no problems so far.

(In reply to comment #20)

(In reply to comment #18)

It works well. I did find some problems when
using it in system messages, but they are not problems with the function itself.

Please provide details so Niklas can assess if it can be fixed.

camway wrote:

(In reply to comment #21)

Sorry for the delay; I was busy in real life. I'll post it on Betawiki ASAP.

camway wrote:

Including this feature in MediaWiki core seems better. If this feature were used in default MediaWiki system messages, Korean translations would be more precise.

Also see http://translatewiki.net/w/i.php?title=Support&oldid=926697#Parameter_on_log_message

(In reply to comment #23)

Including this feature in MediaWiki core seems better. If this feature were used
in default MediaWiki system messages, Korean translations would be more precise.

Also see
http://translatewiki.net/w/i.php?title=Support&oldid=926697#Parameter_on_log_message

Well, that is interesting. In comment 11 you stated the opposite. What's it gonna be and why exactly?

camway wrote:

(In reply to comment #24)

I didn't know the difference at that time. Sorry for that.

mike.lifeguard+bugs wrote:

Removed shell keyword since there's nothing to do on shell.

Removed need-review keyword since this has been applied in SVN already and/or should be implemented as a localization in betawiki.

I don't even think there is anything left in this bug to do. If there is, please point it out, otherwise it will get closed as FIXED.

{{#HANP:}} is not in core. It is currently an extension.

Assigning to myself for review.

(I can't imagine what betawiki would have to do with this... an extension couldn't be used for core localizations since it wouldn't be available in default installations.)

camway wrote:

(In reply to comment #28)

I meant this should be a function in core, like {{plural:}}, not an extension.

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

If there is interest, I can easily port this to core. Please let me know that you need this.

It's been dragging for 2.5 years now. Whatever leads to an acceptable resolution, I'd say.

camway wrote:

We need a function like this for translating MediaWiki messages. When an English word is translated into Korean, its last letter (in Korean) may be a consonant or a vowel. This distinction doesn't matter in English, but it does in Korean, because the particle changes form depending on it.

If we don't use such a function, we must write out every possible particle (which is what we do now...). It is inefficient and ugly.

Sorry for my poor English ;)

Niklas/Brion, as I understand it, this is needed. Can one of you port it, please?

(In reply to comment #28 by Brion)

Assigning to myself for review.

Brion: As you wrote this in 2009, is that still the case, or would you like to reset the assignee to default?

Presumably this is no longer active, no. :) Reassigning to default.

The extension is currently at https://git.wikimedia.org/summary/mediawiki%2Fextensions%2FHanp (I come from there).

(In reply to Niklas Laxström from comment #31)

If there is interest, I can easily port this to core. Please let me know
that you need this.

Should this bug be moved to core, then? It seems so.

(In reply to JuneHyeon Bae (devunt) from comment #39)

There is a https://git.wikimedia.org/summary/mediawiki%2Fextensions%2FJosa
too.
And we also have some consensus in the local wiki community:

Ok, updated bug.

However I think this feature should be integrated into core.

-shell, this needs reviewing for deployment etc...

This could also easily be accomplished with lua, no extension required

Can someone clarify what actually needs doing here?

Do we want both hanp and josa installed? Just Josa? One into core? Both into core?

Either way, Josa needs some major cleanup. There's a lot of code duplication, and it's all in global functions (for starters). That'd need doing as part of moving it to core too...

(In reply to Bawolff (Brian Wolff) from comment #43)

This could also easily be accomplished with lua, no extension required

(In reply to Chong-Dae Park from comment #35)

FYI: This function is implemented as lua in ko.wikipedia.

https://ko.wikipedia.org/wiki/Module:Hangul

RESOLVED FIXED? ;)

Only Josa, not Hanp. Modules are not very portable, but it's ko.wiki's call whether they're satisfied or not. Core or not doesn't matter so much; the code refactoring needed per above would be the same, wouldn't it?

(In reply to Sam Reed (reedy) from comment #45)

(In reply to Bawolff (Brian Wolff) from comment #43)

This could also easily be accomplished with lua, no extension required

(In reply to Chong-Dae Park from comment #35)

FYI: This function is implemented as lua in ko.wikipedia.

https://ko.wikipedia.org/wiki/Module:Hangul

RESOLVED FIXED? ;)

ko.wiki's consensus in comment 39 was to use the extension, not Lua. And it looks like the Lua module is not used much.

I refactored Extension:Josa's entire code base in https://gerrit.wikimedia.org/r/#/c/187118/. Can we now deploy this extension to kowiki?

PS: IMO the Lua script is heavy and too complex (e.g. {{#invoke:Hangul|blah|blah}}).

devunt set Security to None.


See https://www.mediawiki.org/wiki/Review_queue#Checklist
Only the security review is missing. Please file a blocking ticket for that, and assign it to csteipp.

devunt renamed this task from Install Josa extension parser function on Korean Wikipedia to Install Josa extension parser function on all Korean wiki farms. (Feb 2 2015, 4:14 AM)
devunt updated the task description.

EDIT: Not only for kowiki. Please deploy to all Korean wiki farms (kowiki, kowikinews, kowikibooks, kowikiquote, kowikisource, kowikiversity, kowiktionary).

Glaisher renamed this task from Install Josa extension parser function on all Korean wiki farms. to Install Josa extension parser function on all Korean language wikis. (Feb 3 2015, 4:19 AM)

The Josa extension has finished its security review.


Then it should be added to the deployment calendar, it seems.

Change 203627 had a related patch set uploaded (by devunt):
Add Josa extension and deploy to Korean language wikis

https://gerrit.wikimedia.org/r/203627

Change 203642 had a related patch set uploaded (by devunt):
Add Josa extension to make-wmf-branch/default.conf

https://gerrit.wikimedia.org/r/203642

Change 203642 merged by jenkins-bot:
Add Josa extension to make-wmf-branch/default.conf

https://gerrit.wikimedia.org/r/203642

Let's get this extension deployed to the Beta Cluster and tested there before going to production; doesn't need to be for terribly long, just sanity checking.


There's a pending patch that deploys it to the Beta Cluster.
https://gerrit.wikimedia.org/r/#/c/203627/

Change 203627 merged by jenkins-bot:
Add Josa extension to ko.wikipedia.beta.wmflabs.org

https://gerrit.wikimedia.org/r/203627

To summarize, the plan is to test this extension briefly on http://ko.wikipedia.beta.wmflabs.org, then deploy it to ko.*

[ Adding User-notice as this is interesting for every Korean editor. ]

I'm going to remove User-notice because few Tech News subscribers are concerned by this change. Only 5 subscribers are on a ko. wiki.

More generally, language-specific or project-specific changes should be announced primarily on those sites unless a fair share of subscribers is concerned.

I hope this makes sense.

Notified the Korean wikis' Village Pumps using MassMessage.

Change 210069 had a related patch set uploaded (by devunt):
Deploy Josa extension to production

https://gerrit.wikimedia.org/r/210069

Change 210069 merged by jenkins-bot:
Deploy Josa extension to production

https://gerrit.wikimedia.org/r/210069

Finally, this 7-year-old child bug has been fixed.

Josa extension has been deployed to production successfully, and it works like a charm.