Customise linktrail for Gujarati (gu)
Closed, Resolved, Public

Description

As with many other language wikis (Bug 2981, for example), the Gujarati wiki has the same problem. Can the entire set of Gujarati characters (http://www.unicode.org/charts/PDF/U0A80.pdf) be added to the linktrail?


Version: unspecified
Severity: enhancement
OS: other

Details

Reference
bz48798

Event Timeline

bzimport raised the priority of this task to Normal. Nov 22 2014, 1:29 AM
bzimport set Reference to bz48798.
Dsvyas created this task. May 24 2013, 11:22 PM

It's not exactly broken, as it was never implemented.

(In reply to comment #0)

As with many other language wikis (Bug 2981, for example), the Gujarati wiki has the same problem. Can the entire set of Gujarati characters (http://www.unicode.org/charts/PDF/U0A80.pdf) be added to the linktrail?

Entire set? Even digits? Surely not punctuation, except perhaps hyphen. If you attach a txt list it's easy; that PDF contains some characters I'm unsure about.

Do you want the same for linkprefix?

(In reply to comment #2)

(In reply to comment #0)

As with many other language wikis (Bug 2981, for example), the Gujarati wiki has the same problem. Can the entire set of Gujarati characters (http://www.unicode.org/charts/PDF/U0A80.pdf) be added to the linktrail?

Entire set? Even digits? Surely not punctuation, except perhaps hyphen. If you attach a txt list it's easy; that PDF contains some characters I'm unsure about.
Do you want the same for linkprefix?

My bad, you are right: we don't need punctuation, digits, etc. In a nutshell, we need at least the characters below:

ક્ ખ્ ગ્ ઘ્ ચ્ છ્ જ્ ઝ્ ટ્ ઠ્ ડ્ ઢ્ ણ્ ત્ થ્ દ્ ધ્ ન્ પ્ ફ્ બ્ ભ્ મ્ ય્ ર્ લ્ વ્ સ્ શ્ ષ્ હ્ ળ્ ક્ષ્ જ્ઞ્

and, additionally, the characters in the two sections below:

  • [[gu:વિકિપીડિયા:ગુજરાતીમાં કેવી રીતે ટાઇપ કરવું#સ્વર]]
  • [[gu:વિકિપીડિયા:ગુજરાતીમાં કેવી રીતે ટાઇપ કરવું#વિશેષ_ચિહ્નો]]

And good suggestion; I never thought about linkprefix. Yes please, add the same set of characters for linkprefix as well.

So I'm adding these, please check:

ક્ ખ્ ગ્ ઘ્ ચ્ છ્ જ્ ઝ્ ટ્ ઠ્ ડ્ ઢ્ ણ્ ત્ થ્ દ્ ધ્ ન્ પ્ ફ્ બ્ ભ્ મ્ ય્ ર્ લ્ વ્ સ્ શ્ ષ્ હ્ ળ્ ક્ષ્ જ્ઞ્ અ આ ઇ ઈ ઉ ઊ એ ઐ ઓ ઔ અં અઃ અઁ ઍ ઑ ઋ ઁ ઼ ।

Also on https://translatewiki.net/w/i.php?title=MediaWiki%3ALinkprefix%2Fgu&diff=4741186&oldid=2063880 but someone should check which characters are covered by the code range \x80-\xff.

Related URL: https://gerrit.wikimedia.org/r/65449 (Gerrit Change I872a9f141f64a664bc3743fcff5f036634445ba0)

(In reply to comment #4)

So I'm adding these, please check:
ક્ ખ્ ગ્ ઘ્ ચ્ છ્ જ્ ઝ્ ટ્ ઠ્ ડ્ ઢ્ ણ્ ત્ થ્ દ્ ધ્ ન્ પ્ ફ્ બ્ ભ્ મ્ ય્ ર્ લ્
વ્ સ્ શ્ ષ્ હ્ ળ્ ક્ષ્ જ્ઞ્ અ આ ઇ ઈ ઉ ઊ એ ઐ ઓ ઔ અં અઃ અઁ ઍ ઑ ઋ ઁ ઼ ।

Thank you. Please also add the characters below, as only a combination of the ones above and the ones below produces meaningful characters:

્ ા િ ી ુ ૂ ે ૈ ો ૌ ં ઃ ઁ ૅ ૉ ૃ

\x80-\xff looks like a wildcard range (\x), which doesn't seem to work as such, but I would love to be wrong here.
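On the question of what \x80-\xff covers: it is a range of values rather than a wildcard, and its meaning depends on whether the pattern is applied to bytes or to code points. The sketch below uses Python instead of PHP, purely for illustration. It assumes that matching on UTF-8 bytes is comparable to PCRE without the u modifier, and that matching on code points is comparable to PCRE with it. Whether the linkprefix message is actually evaluated byte-wise is an assumption here, not something confirmed in this thread.

```python
import re

sample = "ગુજરાતી"  # "Gujarati" written in Gujarati script

# Byte-wise matching (assumed comparable to PCRE without the u modifier):
# every byte of a multi-byte UTF-8 sequence is >= 0x80, so the range
# matches any non-ASCII text, Gujarati or not.
byte_pattern = re.compile(rb"[\x80-\xff]+")
gujarati_bytes_match = byte_pattern.fullmatch(sample.encode("utf-8"))
latin_bytes_match = byte_pattern.fullmatch("é".encode("utf-8"))

# Code-point matching (assumed comparable to PCRE with the u modifier):
# the same range now means U+0080..U+00FF, which excludes Gujarati.
codepoint_pattern = re.compile("[\x80-\xff]+")
gujarati_codepoint_match = codepoint_pattern.fullmatch(sample)

print(gujarati_bytes_match is not None)
print(latin_bytes_match is not None)
print(gujarati_codepoint_match is not None)
```

If the byte-wise reading applies, the range is much broader than the Gujarati script; if the code-point reading applies, it does not cover Gujarati at all.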

Dhaval, we need anything in Gujarati script as a trail, right? We need not write out every letter with a virama; we can just use the gu Unicode range like this:

$linkTrail = "/^([\x{0A80}-\x{0AFF}]+)(.*)$/sDu";

with $wgLanguageCode = 'gu'; it works.

Please confirm that this is what you need.
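For anyone who wants to try the range outside MediaWiki, here is a minimal sketch of the same idea in Python (not PHP, so the escape syntax differs: \u0A80 instead of \x{0A80}). The function name split_trail is made up for this example, and re.DOTALL only approximates the s modifier of the PHP pattern.

```python
import re

# Python counterpart of the proposed pattern: one or more characters from
# the Gujarati block (U+0A80..U+0AFF), followed by the rest of the text.
LINK_TRAIL = re.compile(r"^([\u0A80-\u0AFF]+)(.*)$", re.DOTALL)


def split_trail(text_after_link):
    """Return (trail, rest); trail is empty when no Gujarati character leads."""
    match = LINK_TRAIL.match(text_after_link)
    if match is None:
        return "", text_after_link
    return match.group(1), match.group(2)


# The leading Gujarati word becomes the trail; the space ends it.
trail, rest = split_trail("ગુજરાતીમાં abc")
print(repr(trail), repr(rest))

# Text that starts with a non-Gujarati character yields an empty trail.
print(split_trail("abc"))
```

Because the character class contains the consonants, the virama and the vowel signs alike, any sequence of them is accepted, including conjuncts.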

(In reply to comment #7)

Dhaval, we need anything in Gujarati script as a trail, right? We need not write out every letter with a virama; we can just use the gu Unicode range like this:
$linkTrail = "/^([\x{0A80}-\x{0AFF}]+)(.*)$/sDu";

Yes Santhosh, that is true. I had originally provided the chart for the entire gu Unicode range, but as there might be punctuation marks in it, Nemo suggested defining an explicit character set. I listed the characters with a virama so that conjunct characters would work as well.

with $wgLanguageCode = 'gu'; it works.
Please confirm that this is what you need.

When you say it works, does that mean it is working somewhere in a test environment? Can I test it?
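The concern about digits and punctuation inside the full range can be checked mechanically. The sketch below (Python, for illustration only) groups the assigned code points of U+0A80..U+0AFF by Unicode general category and prints everything that is neither a letter nor a mark. The helper name categories_in_block is made up for this example, and the exact result depends on the Unicode version bundled with the Python build.

```python
import unicodedata
from collections import defaultdict


def categories_in_block(start=0x0A80, end=0x0AFF):
    """Group assigned code points in [start, end] by general category."""
    groups = defaultdict(list)
    for code_point in range(start, end + 1):
        char = chr(code_point)
        category = unicodedata.category(char)
        if category != "Cn":  # "Cn" marks unassigned code points
            groups[category].append(char)
    return dict(groups)


groups = categories_in_block()

# Show only the non-letter, non-mark categories (digits, punctuation, ...).
for category in sorted(groups):
    if not category.startswith(("L", "M")):
        print(category, " ".join(groups[category]))
```

If the digits or signs listed this way should not become part of a link, the range could be narrowed or those code points excluded; that is a design choice for the wiki, not something the sketch decides.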

(In reply to comment #8)

Yes Santhosh, that is true. I had originally provided the chart for the entire gu Unicode range, but as there might be punctuation marks in it, Nemo suggested defining an explicit character set. I listed the characters with a virama so that conjunct characters would work as well.

Conjuncts will still work with my regex too.

When you say it works, does that mean it is working somewhere in a test environment?

No, it was my local wiki. :)

(In reply to comment #9)

(In reply to comment #8)
Conjuncts will still work with my regex too

Perfect. Let's go ahead and deploy then.

Is there any update on this? It's been 2 months since everything was sorted out and the change was successfully merged into the git repository...

(In reply to comment #11)

Is there any update on this? It's been 2 months since everything was sorted out and the change was successfully merged into the git repository...

So it's on your wiki already, didn't it work?

(In reply to comment #12)

So it's on your wiki already, didn't it work?

Exactly, it never worked on gu.wiki. Can you please check why?

reedy@tin:/a/common$ grep -i linktrail php-1.22wmf12/languages/messages/MessagesGu.php
$linkTrail = '/^((?:[a-z]|ક્|ખ્|ગ્|ઘ્|ચ્|છ્|જ્|ઝ્|ટ્|ઠ્|ડ્|ઢ્|ણ્|ત્|થ્|દ્|ધ્|ન્|પ્|ફ્|બ્|ભ્|મ્|ય્|ર્|લ્|વ્|સ્|શ્|ષ્|હ્|ળ્|ક્ષ્|જ્ઞ્|અ|આ|ઇ|ઈ|ઉ|ઊ|એ|ઐ|ઓ|ઔ|અં|અઃ|અઁ|ઍ|ઑ|ઋ|ઁ|઼|।|્|ા|િ|ી|ુ|ૂ|ે|ૈ|ો|ૌ|ં|ઃ|ઁ|ૅ|ૉ|ૃ)+)(.*)$/sDu';
reedy@tin:/a/common$

Weird, I thought the last patchset by santhosh had converted it to a range as per comment 7, let's make it now.

(In reply to comment #15)

Weird, I thought the last patchset by santhosh had converted it to a range as
per comment 7, let's make it now.

Thanks Nemo, please let me know once you know it is deployed, so that I can test and confirm.

Change 77509 had a related patch set uploaded by Nemo bis:
Customise linktrail for Gujarati (gu)

https://gerrit.wikimedia.org/r/77509

Change 77509 merged by jenkins-bot:
Customise linktrail for Gujarati (gu)

https://gerrit.wikimedia.org/r/77509

This is hopefully fixed now and you should see it live on gu.wiki on August 22. Please reopen if it is not fixed after that date.

Dsvyas added a comment. Aug 7 2013, 2:19 PM

I checked today and it shows a very weird result (I will wait until 22 August anyhow, but thought I would report it here now so that, if needed, someone can work on it in the meantime).

See the test page created on gu.wiki (http://gu.wikipedia.org/wiki/Test). It seems that the character sets provided in Comment 4, Comment 6 and Comment 14 are working, but only in a specific sequence, not in any combination.
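One possible reading of this symptom, offered as a guess rather than a confirmed diagnosis: the alternation quoted from MessagesGu.php lists consonants only together with a virama, so a bare consonant followed by a vowel sign has no matching alternative. The Python sketch below uses a deliberately shortened stand-in for that alternation (a few entries only, written as escapes) and compares it with the range from comment 7. The helper name trail_of is made up for this example.

```python
import re

# Shortened stand-in for the alternation: consonants appear only with a
# virama (U+0ACD); vowels, the virama and vowel signs appear on their own.
ALTERNATION = re.compile(
    r"^((?:[a-z]"
    r"|\u0A95\u0ACD"   # KA + virama
    r"|\u0AAE\u0ACD"   # MA + virama
    r"|\u0A85|\u0A86"  # vowels A, AA
    r"|\u0ACD"         # virama alone
    r"|\u0ABE|\u0A82"  # vowel sign AA, anusvara
    r")+)(.*)$",
    re.DOTALL,
)
RANGE = re.compile(r"^([\u0A80-\u0AFF]+)(.*)$", re.DOTALL)


def trail_of(pattern, text):
    """Return the matched trail, or an empty string when nothing matches."""
    match = pattern.match(text)
    return match.group(1) if match else ""


with_virama = "\u0AAE\u0ACD"         # MA + virama: listed explicitly
bare_consonant = "\u0AAE\u0ABE\u0A82"  # MA + AA sign + anusvara

for text in (with_virama, bare_consonant):
    print(repr(trail_of(ALTERNATION, text)), repr(trail_of(RANGE, text)))
```

In this sketch the alternation accepts the consonant-plus-virama sequence but returns no trail for the bare consonant, while the range accepts both, which would be consistent with trails working "only in a specific sequence" until the range-based pattern is deployed.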

Thank you Nemo, it has been working perfectly well since last week.