
Language converter for Punjabi (between Gurmukhi and Shahmukhi); merge pa, pnb wikis
Open, High, Public


The Punjabi language is written in two scripts: [[pa:Eastern punjabi wikipedia]] is in the Gurmukhi script (Punjab, India) and [[pnb:Western punjabi wikipedia]] is in the Shahmukhi script (Punjab, Pakistan). There is software to convert Gurmukhi script to Shahmukhi script [] and to convert Shahmukhi script to Gurmukhi script []. The language is the same; only the written form differs. Would it be possible on Wikipedia to read an article in both Punjabi scripts, like the Kazakh Wikipedia, which is in three scripts (Arabic, Latin and Cyrillic) that are made inter-readable through software? The Kurdish Wikipedia is also in two scripts, Arabic and Latin.

Event Timeline

19.abbas.75 raised the priority of this task from to Needs Triage.
19.abbas.75 updated the task description.
19.abbas.75 added a subscriber: 19.abbas.75.

Transcription between the two Punjabi scripts will give Punjabis in India, or readers of the Gurmukhi script, the opportunity to read wiki articles written in Shahmukhi, and Punjabis in Pakistan, or readers of the Shahmukhi script, the opportunity to read Wikipedia articles written in Gurmukhi.

TTO renamed this task from Covertion of 2 punjabi wikis scriptes to Language converter for Punjabi (between Gurmukhi and Shahmukhi). May 2 2015, 9:36 AM
TTO set Security to None.

Automatic script conversion in the Punjabi-language wiki will facilitate 90,000,000 Punjabis in Pakistan and 25,000,000 Punjabis in India.

If that tool is free software (source code available for reuse), adding it as a translation tool inside the Content Translation system would provide a way to translate content between the wikis.

@19.abbas.75 Do you know the license of that tool?

Sounds like a language converter wouldn't quite be appropriate, since the two dialects have some separate vocabulary as well as being written in different alphabets:

Language converter is often used between separate alphabets (like Latin and Cyrillic). Whether there is clear mapping between the scripts and how much different vocabulary there is defines the feasibility of language converter.

But I am not sure they are asking for LanguageConverter as used in MediaWiki or something to integrate in ContentTranslation.

19.abbas.75 raised the priority of this task from Low to High. May 4 2015, 11:42 AM

Punjabi is one language in two separate alphabets (the Shahmukhi alphabet and the Gurmukhi alphabet). Shahmukhi is a Perso-Arabic alphabet, and the Gurmukhi alphabet is derived from Devanagari. This conversion is not between two dialects but between two different scripts of the same language.

@19.abbas.75 Are you suggesting that and to be served through the same site using a language converter?

Yes, if it is possible, it would be a great thing for Punjabis in both countries (Pakistan and India). and are two versions of one language in two writing systems (the Shahmukhi writing system or Shahmukhi script, a modified Perso-Arabic alphabet, and the Gurmukhi writing system or Gurmukhi script, derived from the Devanagari alphabet). There is software [] and [] to convert the two writing systems into each other. We want this to work like the Kazakh wiki, which is in three different writing systems: the Kazakh Arabic alphabet, the Kazakh Cyrillic alphabet and the Kazakh Latin alphabet. One can read an article in any of these three scripts or writing systems.

@19.abbas.75: How do you explain the differences in vocabulary listed at If we believe this table, it is telling us that there are certain words that exist in the Gurmukhi-based dialect that do not exist in the Shahmukhi-based dialect, and vice versa. And if we look at the Ethnologue entry for Western Punjabi, it says "Lexical similarity: 70%–85% with Punjabi". We'd probably be wanting to see closer to 100% to consider a language converter (is this true, @Nikerabbit?)

We also see here that "Western Punjabi [pnb] is distinct from Eastern Punjabi". Since the Wikimedia Language Committee usually outsources its decisions on what is and is not a language to SIL (via ISO) I think we have to respect their decision here... unless the two communities, having been informed of the potential pitfalls, agree to merge - I can't see how a consensual request of that kind could be denied.

We want this to work like the Kazakh wiki, which is in three different writing systems: the Kazakh Arabic alphabet, the Kazakh Cyrillic alphabet and the Kazakh Latin alphabet. One can read an article in any of these three scripts or writing systems.

Who is "we"? Have the communities of each wiki agreed that they should merge?

First of all, there is no Eastern or Western Punjabi. Punjabi is one language, but it has many dialects, and they are all written in both the Shahmukhi and Gurmukhi scripts. To call Punjabi written in the Gurmukhi script '''Eastern Punjabi''' and Punjabi written in the Shahmukhi script '''Western Punjabi''' is wrong. Only the Sikh Punjabis adopted the Gurmukhi script for writing Punjabi, and Muslim Punjabis adopted the Shahmukhi script (Hindu Punjabis also use the Devanagari script for writing Punjabi). This is all due to religious difference. Western Punjabi or Lahnda (lahnda means west in Punjabi) is the name for the western dialects of Punjabi, such as the Multani, Riasti and Jhangochi dialects, which are now claimed as the Saraiki language by some local intellectuals.

Name Western Punjabi or Lahnda

''Lahnda'' means "western" in Punjabi. It was coined by [[William St. Clair Tisdall]] (in the form ''Lahindā'') probably around 1890 and later adopted by a number of linguists—notably [[George Abraham Grierson]]—for a dialect group that had no general local name.<ref>{{cite journal|last=Grierson|first=George A.|year=1930|title=Lahndā and Lahndī|journal=Bulletin of the School of Oriental and African Studies|volume=5|issue=4|pages=883–887|doi=10.1017/S0041977X00090571}}</ref>{{rp|883}} The southern varieties are locally called ''Saraiki'', and northwestern varieties ''Hindko and Panjistani''.

The Punjabi Shahmukhi wiki [[hppt://]] (in which I write articles) is a standard Punjabi Wikipedia in the Shahmukhi script, just as the Gurmukhi wiki [[hppt://]] is a standard Punjabi Wikipedia in the Gurmukhi script. Standard Punjabi is the dialect spoken around Lahore, Kasur, Sheikhupura, Sialkot and Gujranwala in Pakistan and around Amritsar, Gurdaspur and Tarn Taran in Punjab, India; this dialect is called '''Majhi'''. So both Punjabi Wikipedias are in Standard Punjabi, the Majhi dialect; the only difference is that one is in the SHAHMUKHI script and the other in GURMUKHI.

You say there are differences in vocabulary listed at but there is no such difference. For '''Article''', the standard Punjabi word is Lekh, but everyone also knows Mazmūn. For Family, Parvār/Tabbar and Khāndān/Tabbar: in common language all three words are used, but Tabbar is used the most. For Capital, the three words Rājdhānī, Dārul hakūmat and Rājghar are all common. For Astronomy, Tārā-vigyān and Falkiyat are used; the Shahmukhi wiki uses Tara vigyan for Astronomy.
Like Urdu and Hindi (which have a very large vocabulary of Perso-Arabic origin), Punjabi has a great influence from Persian and Arabic, and also from Sanskrit, which is the mother of all the Indo-Aryan languages spoken in the subcontinent. So in Punjabi there is often, for one thing, a word of Persian or Arabic origin and also a word of Sanskrit origin. For '''Family''', Parvar is of Sanskrit origin, Khandan is of Persian origin and Tabbar is a local Punjabi word, but all three words are common. For Population, the word '''Abadi''' (which is a Persian word) is mostly used by the Gurmukhi wiki; the local word '''lok Ginti''' is used less. So words of Persian and Arabic origin (like Mazmun, Khandan, Falkiyat, Abadi, nasl, haq (right), Mulk (country), Rab (God) and many others) are commonly used in Punjabi written in the Gurmukhi script.
A Punjabi person from Punjab, India can communicate 100% with a Punjabi person from Punjab, Pakistan, because their language is the same. But a Punjabi person from Lahore, Pakistan cannot communicate 100% with a Western Punjabi or Lahnda (Saraiki, Riasti or other Lahnda dialect) speaker from Multan or Bahawalpur, Pakistan.

"We" means people from the Gurmukhi Punjabi wiki and the Shahmukhi Punjabi wiki. They will participate here soon.

All these minor differences exist only in the written language, not in spoken Punjabi, and are due to the script barrier.

I am an active editor as well as an administrator at the Gurmukhi Punjabi (pa) Wikipedia and concur with what Abbas has put forth. The vocabulary table in question just illustrates which words might be preferred on one side of the border; it doesn't imply that these words are not mutually intelligible, which they are. It's the same for many other languages, including English (British vs American: biscuit vs cookie, holiday vs vacation) and Spanish (even grammar differences like ustedes vs vosotros between Castilian and Latin American Spanish).
@TTO As far as the Western Punjabi issue goes, these languages have already been accorded the status of altogether different languages as explained in the Wiki article itself:
"He named this group of dialects "Lahnda" in a volume of the Language Survey of India (LSI) published in 1919.[23] He grouped as "southern Lahnda" the dialects that are now recognized as Saraiki. In the National Census of Pakistan (1981) Saraiki and Hindko (previously categorized as "Western Punjabi"), got the status of separate languages,[24] which explains the decrease in the percentage of Punjabi speakers."
The dialect now used by both wikis is standard Majhi, which is the same on both sides of the border:
"The Majhi dialect spoken around Lahore is Punjabi's prestige dialect because it is the standard of written Punjabi. Majhi is spoken in the heart of Punjab in the region of Majha, which spans Lahore, Amritsar, Gurdaspur, Kasur, Tarn Taran, Faisalabad, Nankana Sahib, Pathankot, Okara, Pakpattan, Sahiwal, Narowal, Sheikhupura, Sialkot, Chiniot, Gujranwala and Gujrat districts"

Punjabi has been one language for ages. Sure, there are dialects with some differences, but that doesn't mean we create a different wiki for each dialect. I think both communities should decide whether we want a single wiki or two different wikis. I personally want to see both wikis united. I know it is not an easy task, but we have examples. We only need a script converter with the capability of changing some words as well. For example, in Gurmukhi ਹੈ (hai) is used whereas in Shahmukhi اے (ae) is used. For words that are more common in one region, an editor can add the alternative word in brackets right after that word. Tamil Wikipedia actually does this: Sri Lankan Tamil and Indian Tamil have some differences, and this is how they resolved it.
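A script converter "with a capability of changing some words" could be sketched roughly as below in Python. This is only an illustration under stated assumptions: the names `WORD_OVERRIDES`, `CHAR_MAP` and `convert` are hypothetical, the tables are tiny samples, and nothing here reflects how MediaWiki's LanguageConverter is actually implemented. Whole-word overrides (such as ਹੈ to اے from the comment above) are applied first, and any other word falls back to a letter-by-letter mapping.

```python
import re

# Hypothetical whole-word overrides: words that differ between the two
# written standards, not just in script. ਹੈ -> اے is the example given above.
WORD_OVERRIDES = {
    "ਹੈ": "اے",
}

# Hypothetical, deliberately tiny letter table used as the fallback.
# A real converter would need the complete alphabet and vowel signs.
CHAR_MAP = {
    "ਘ": "گھ",  # aspirated consonant: base letter + do-chashmi he
    "ਰ": "ر",
}


def convert(text):
    """Convert Gurmukhi text to Shahmukhi: word overrides first, then letters."""

    def convert_token(token):
        # A token that matches an override is replaced as a whole word.
        if token in WORD_OVERRIDES:
            return WORD_OVERRIDES[token]
        # Otherwise map letter by letter; unknown characters pass through.
        return "".join(CHAR_MAP.get(ch, ch) for ch in token)

    # Split on whitespace but keep the separators so spacing is preserved.
    return "".join(convert_token(tok) for tok in re.split(r"(\s+)", text))


print(convert("ਘਰ ਹੈ"))
```

One limitation of this sketch: a word with punctuation attached (for example ਹੈ followed directly by a danda) would not match the override table, so a real implementation would need proper word segmentation.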

This idea has immense possibilities. It can prove to be a real landmark. It will help in uniting Punjabis around the globe. It will also help to improve the position of Punjabi in Pakistan as well as India.

Nemo_bis renamed this task from Language converter for Punjabi (between Gurmukhi and Shahmukhi) to Language converter for Punjabi (between Gurmukhi and Shahmukhi); merge pa, pnb wikis. May 8 2015, 8:01 AM
Nemo_bis added subscribers: MF-Warburg, Nemo_bis.

The discussion here went way beyond the scope of a machine translation system, so I moved back to the language converter.

To the supporters: nobody is working in the language converter area; if you want this to move further, your only hope is to write a language converter patch yourself. When you have the technical solution ready, it will also be easier to ask the communities whether they agree with the merge.

Amire80 lowered the priority of this task from High to Low. Sep 16 2015, 7:06 PM
Harijatt123 raised the priority of this task from Low to High. Jan 17 2019, 1:21 PM

Any updates on this issue?

I can confirm that "Eastern Punjabi" and "Western Punjabi" are not different languages. The "linguistic differences" that exist refer to either differences in colloquialisms between Sikhs/Hindus vs. Muslims (not really language related) or dialect variation across the region (not actually related to the way the population is distributed in present day - a significant proportion of Pakistanis were born on the Indian side of the border and vice versa, and not enough time has passed for new dialects let alone languages to have developed). The difference is as trivial as differences between American English and British English, the complicating factor is just that it involves Unicode glyphs.

A majority of Pakistanis are not even aware that "Western Punjabi" exists as a means to get around the technicalities of one language with different scripts. Part of the issue with splitting things up this way is that most Punjabi speakers live in Pakistan and do not use Gurmukhi, so sanctioning Gurmukhi as "default" Punjabi makes it seem like a side is being picked as having a more legitimate relationship with the language. Also, it is a waste of editors' time to maintain two wikis in the same language.

The thing that should make all of this easier in theory is that Gurmukhi is a fully phonetic writing system. Everything is written exactly as it sounds, so there is a one-to-one, exact way to write the Shahmukhi variant of anything already written in Gurmukhi. Shahmukhi is supposed to be phonetic as well, but confusion can arise when non-native speakers use the Arabic script and are not aware that Punjabi has unique sounds, like aspirated consonants, that are supposed to be specified.
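To make the one-to-one idea concrete, here is a minimal sketch in Python of a letter-level Gurmukhi-to-Shahmukhi mapping. The table name, the function name and the handful of letters chosen are assumptions for illustration, not taken from any existing tool; a real table would have to cover every letter, vowel sign and diacritic. Note how each aspirated consonant maps to a base letter followed by do-chashmi he (ھ), which is exactly the detail that gets lost when aspiration is not specified.

```python
# Hypothetical sample table: a few Gurmukhi consonants and their
# Shahmukhi equivalents. Aspirated consonants become two characters.
GURMUKHI_TO_SHAHMUKHI = {
    "ਕ": "ک",    # k
    "ਖ": "کھ",   # kh (aspirated)
    "ਗ": "گ",    # g
    "ਘ": "گھ",   # gh (aspirated)
    "ਬ": "ب",    # b
    "ਭ": "بھ",   # bh (aspirated)
    "ਰ": "ر",    # r
}


def transliterate(text):
    """Map each Gurmukhi letter to Shahmukhi; leave unknown characters as-is."""
    return "".join(GURMUKHI_TO_SHAHMUKHI.get(ch, ch) for ch in text)


# ਘਰ ("house") is written with consonants only, so it maps cleanly.
print(transliterate("ਘਰ"))
```

A plain dictionary lookup like this only works in the Gurmukhi-to-Shahmukhi direction for simple words; vowel signs and the reverse direction would need context-dependent rules that this sketch does not attempt.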

If it is a matter of someone volunteering their time to implement this rather than a technical limitation, I would be interested in looking into it. I have not contributed to the Wikimedia project before (outside of being an editor/user), but I have a bit of coding experience.