
Request for language codes in lexemes + monolingual text for historical Middle Indic languages: pra, psu, pgd
Open, Needs Triage | Public | Feature

Description

Feature summary (what you would like to be able to do and where):

Use language codes for Prakrit (generally), Sauraseni, and Gandhari in lexemes and in monolingual text for the titles of works and names of historical entities on Wikidata items.

The specific code + script combinations I am requesting are as follows:

Prakrit:

  • pra-deva
  • pra-guru
  • pra-arab
  • pra-brah

Sauraseni:

  • psu-deva
  • psu-guru
  • psu-arab
  • psu-brah

Gandhari:

  • pgd-khar
  • pgd-deva
  • pgd-arab
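To make the requested combinations concrete: each code pairs a three-letter language code (pra, psu, pgd) with a four-letter ISO 15924 script subtag, written in lowercase as in this request (deva = Devanagari, guru = Gurmukhi, arab = Arabic, brah = Brahmi, khar = Kharoshthi). The Python below is a minimal sketch that assumes this simple language-script shape. The helper names (split_code, display_name) and the English display-name tables are hypothetical and illustrative only. They do not describe how the MediaWiki cldr extension actually stores or generates names.

```python
# Hypothetical sketch: split each requested code into language and script
# subtags and build an English display name from illustrative tables.
# These tables are assumptions, not data from the MediaWiki cldr extension.

LANGUAGE_NAMES = {
    "pra": "Prakrit",
    "psu": "Sauraseni",
    "pgd": "Gandhari",
}

SCRIPT_NAMES = {
    "deva": "Devanagari",
    "guru": "Gurmukhi",
    "arab": "Arabic",
    "brah": "Brahmi",
    "khar": "Kharoshthi",
}

# The code + script combinations requested in this ticket.
REQUESTED = [
    "pra-deva", "pra-guru", "pra-arab", "pra-brah",
    "psu-deva", "psu-guru", "psu-arab", "psu-brah",
    "pgd-khar", "pgd-deva", "pgd-arab",
]


def split_code(code: str) -> tuple[str, str]:
    """Split 'xxx-yyyy' into a 3-letter language and a 4-letter script subtag."""
    parts = code.lower().split("-")
    if len(parts) != 2:
        raise ValueError(f"expected language-script, got {code!r}")
    language, script = parts
    if len(language) != 3 or not language.isalpha():
        raise ValueError(f"bad language subtag in {code!r}")
    if len(script) != 4 or not script.isalpha():
        raise ValueError(f"bad script subtag in {code!r}")
    return language, script


def display_name(code: str) -> str:
    """Return a label such as 'Prakrit (Devanagari script)' for 'pra-deva'."""
    language, script = split_code(code)
    return f"{LANGUAGE_NAMES[language]} ({SCRIPT_NAMES[script]} script)"


if __name__ == "__main__":
    for code in REQUESTED:
        print(f"{code}: {display_name(code)}")
```

Keeping the language and script tables separate means that a new script variant, such as another script for an existing language, needs only one new table entry rather than a new hand-written name for every combination.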

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

I have been adding a number of lexemes in Prakrit and related languages because of their historical interest for the etymologies of modern Indic languages. Because these words are often attested only indirectly (for example, written down after the period in which they originated as oral literature, or known only from transcriptions of original works that are now lost), they may be written in a variety of scripts. Further, there is no clear line where Prakrit ends and the modern languages begin; Punjabi and Hindustani, for example, may be considered modern continuations of Prakrit. Texts written in modern languages typically transcribe Prakrit words in the script used for the modern descendant: the Urdu Lughat dictionary transcribes Prakrit in the Perso-Arabic script, and Punjabi dictionaries published in India transcribe it in Gurmukhi. This is why I have also been representing Prakrit lexemes in these scripts.

Benefits (why should this be implemented?):

This would help Wikidata become a more valuable resource for collecting data about the origins of, and connections between, modern Indic languages.

There are other codes which may be used for the Prakrit varieties which preceded other Indic languages, but for now I am focusing this ticket on those that I have been adding and am more familiar with.

Pinging @Amire80, whom I had asked about the possibility of getting some of these historical language codes added.

Event Timeline

Any thoughts on this? @Lydia_Pintscher @jhsoby

I have been adding an increasing number of Prakrit lexeme forms in different scripts, and it would be preferable to have the proper language codes available so that the codes on thousands of forms don't have to be changed later. The lack of a Prakrit code can also cause confusion with Hindi or Sanskrit lexemes if someone is not aware that no language code is available for Prakrit. (This has happened at least once so far.)

Change 985398 had a related patch set uploaded (by Nikki; author: Nikki):

[mediawiki/extensions/cldr@master] Add various English names that have been requested

https://gerrit.wikimedia.org/r/985398

Change 985398 merged by jenkins-bot:

[mediawiki/extensions/cldr@master] Add various English names that have been requested

https://gerrit.wikimedia.org/r/985398

Change 1009741 had a related patch set uploaded (by Nikki; author: Nikki):

[mediawiki/extensions/cldr@master] Add English names for script variants that have been requested

https://gerrit.wikimedia.org/r/1009741

Change #1009741 merged by jenkins-bot:

[mediawiki/extensions/cldr@master] Add English names for script variants that have been requested

https://gerrit.wikimedia.org/r/1009741