Add language codes for Arabic languages for use in labels, aliases and descriptions to Wikidata
Open, Stalled, Low, Public

Description

To add labels to Wikidata entities in Arabic dialects using QuickStatements, MediaWiki first has to support their language codes: North Levantine Arabic (apc), South Levantine Arabic (ajp), Gulf Arabic (afb), Hejazi Arabic (acw), Najdi Arabic (ars), Hadhrami Arabic (ayh), Sanaani Arabic (ayn), Ta'izzi-Adeni Arabic (acq), Mesopotamian Arabic (acm), Cypriot Arabic (acy), Egyptian Arabic (arz), Northwest Arabian Arabic (avl), Sudanese Arabic (apd), Bahrani Arabic (abv), Libyan Arabic (ayl), Tunisian Arabic (aeb-arab), Algerian Arabic (arq), Moroccan Arabic (ary), Hassaniya Arabic (mea), Saharan Arabic (aao), and Chadian Arabic (shu).
Please add support for them in MediaWiki. If you would like the native names of these dialects, I can provide them.

Event Timeline

Restricted Application added subscribers: alanajjar, Aklapper. · Feb 14 2018, 4:52 PM
Csisc added a comment. Feb 15 2018, 9:17 AM

Native names:
apc: الشامي الشمالي
ajp: الشامي الجنوبي
afb: الخليجي
acw: الحجازي
ars: النجدي
ayh: الحضرمي
ayn: الصنعاني
acq: التعزي
acm: العراقي
acy: القبرصي
arz: المصري
avl: الرقاوي, الشاوي, or البدوي
apd: السوداني
abv: البحراني
ayl: الليبي
aeb-arab: التونسي
arq: الجزايرية
ary: المغربي
mea: الحسانية
aao: الصحراوي
shu: التشادي

Csisc added a comment. Feb 15 2018, 9:20 AM

If we add labels in Arabic dialects to all Wikidata entities, we will have Wikidata, and consequently all the data in Wikipedia and Wiktionary, translated into Arabic dialects. We would then not have to create Wikipedias or Wiktionaries in every Arabic dialect. Anyone who needs more information about a Wikidata entity can consult the Arabic, French or English Wikipedia.

Csisc added a comment. Feb 15 2018, 9:20 AM

This proposal is one of the outcomes of the presentation about Wikidata at the AICCSA 2017 conference.

With more details, this information can be added to https://github.com/wikimedia/language-data/blob/master/data/langdb.yaml (you can also see the expected format there: you need the language code, writing script, geographical area and autonym). I think several of those listed are already in there. This is required for the ULS language selector to be able to list these languages.

For the Wikidata side, I don't know exactly what they need to do about languages that are not supported by MediaWiki as an interface language, but I think it is a simple code or configuration change.

For a language to be "supported" by MediaWiki in the common sense, it needs at least 13% of the MediaWiki core interface messages translated. For other use cases (which require language code to be "known") it should suffice to add the English names in the CLDR extension to LocalNamesEn.php for languages not yet supported by CLDR.
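
As a rough illustration of the four pieces of information mentioned above (language code, writing script, geographical area and autonym), here is a minimal Python sketch of a completeness check for a proposed entry. The field names, the `missing_fields` helper and the dictionary layout are assumptions made for this example; they are not the actual langdb.yaml schema, which should be checked in the repository itself. `Arab` is the ISO 15924 code for the Arabic script.

```python
from typing import Dict, List

# Hypothetical field names for this sketch; not the real langdb.yaml layout.
REQUIRED_FIELDS = ("code", "script", "regions", "autonym")


def missing_fields(entry: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or empty in a proposed entry."""
    return [name for name in REQUIRED_FIELDS if not entry.get(name)]


# A proposal that still lacks its geographical area.
incomplete = {"code": "apc", "script": "Arab", "regions": [], "autonym": "الشامي الشمالي"}
# The same proposal once the area has been supplied.
complete = dict(incomplete, regions=["Syria", "Lebanon"])

print(missing_fields(incomplete))
print(missing_fields(complete))
```

A check like this only tells a requester what is still missing before a patch can be prepared; it says nothing about whether the language will be accepted.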

The script here is the Arabic script. This is evident, as we are talking about Arabic dialects.

I know that Tunisian, Algerian, Moroccan and Egyptian already exist. But we are talking here about all Arabic dialects, not just four.

As for adding the English names to LocalNamesEn.php in the CLDR extension for languages not yet supported by CLDR, I think this is an excellent idea if it makes it possible to add labels in all Arabic dialects to Wikidata entities.

Csisc triaged this task as Unbreak Now! priority. Feb 16 2018, 6:33 PM

We are only waiting for the support to begin our work.

Restricted Application added subscribers: Liuxinyu970226, Jay8g, TerraCodes. · Feb 16 2018, 6:33 PM

Hello
@Csisc On what basis were the codes and dialects above suggested?

Hoi,
I do object to calling them dialects. They have a language code and as far
as standards are concerned they are languages.
Thanks,

GerardM
Csisc added a comment. Feb 16 2018, 6:59 PM

@alanajjar I only proposed these dialects because they have Wikimedia language codes.

Hoi,
That is an invalid reason to add a language to Wikidata. Wikidata is not a
stamp collection.
Thanks,

GerardM
Csisc added a comment. Feb 16 2018, 7:15 PM

@GerardM I know that. However, this is not what I meant.

Bassem added a subscriber: Bassem. Feb 16 2018, 7:19 PM

That's nonsense. These are all Arabic accents, or at most dialects. They are not used in literary works or scientific research (whenever it is published in Arabic). The only time such accents are used in any type of work is in some local pop poetry. Even if they have what is called a "language code", the general view throughout the Arab world is that these are just accents. That's totally unacceptable.

Csisc added a comment. Feb 16 2018, 7:19 PM

@GerardM I know that. However, this is not what I meant.

In fact, we created bilingual English-Dialectal Arabic dictionaries and we would like to integrate them into Wikidata.

Csisc added a comment. Feb 16 2018, 7:23 PM

@Bassem I do not agree. Arabic dialects are not the same as Modern Standard Arabic. There are many false friends between them, and each of them has its own linguistic particularities, as many research works have shown.

My native language is Arabic and I think this would be a huge mistake. Arabic dialects use the same letters and words in reading and writing; the difference is in speech only. Arabic users therefore don't need this, and the request would simply create a large number of duplicated labels without any benefit. I totally object to this suggestion.

Helmoony added a comment. Edited · Feb 16 2018, 8:06 PM

I think that there is a misunderstanding. What @Csisc is really requesting is adding Arabic dialects/Arabic languages (that's not the point) for monolingual text values in Wikidata.
From what I understood here, you need to request it separately for each dialect/language. See T184783, for example, for the request I made for the Shawiya language/Shawiya Berber dialect. As with en-us or en-gb, there is no need to repeat information that already exists in the fallback language en; the variant code is used only when there is a specific term for that dialect/language.
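
The fallback behaviour described for en-us and en-gb can be sketched as follows. This is only a toy model in Python: the `FALLBACKS` table and the `resolve_label` helper are assumptions made for the example, not MediaWiki's or Wikibase's real implementation or configuration.

```python
from typing import Dict, List, Optional

# Hypothetical fallback chains for this sketch; the real chains are defined by MediaWiki.
FALLBACKS: Dict[str, List[str]] = {
    "en-gb": ["en"],
    "en-us": ["en"],
}


def resolve_label(labels: Dict[str, str], lang: str) -> Optional[str]:
    """Return the label for lang, or the first label found along its fallback chain."""
    for code in [lang, *FALLBACKS.get(lang, [])]:
        if code in labels:
            return labels[code]
    return None


# Only the variant that really differs needs its own label.
labels = {"en": "color", "en-gb": "colour"}

print(resolve_label(labels, "en-gb"))  # uses the specific en-gb term
print(resolve_label(labels, "en-us"))  # falls back to en
```

Under this model, a dialect label would only need to be stored where the dialect term differs from the fallback language's term.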

Csisc added a comment. Edited · Feb 16 2018, 8:25 PM

@Bassem @Ibrahim.ID: Can you understand this? It is written in Tunisian.

قيدت آش باش نشري في الكرني و مشيت للعطار ياخي ما لقيتش اللي نلوج عليه

I expect that you could not work out the meaning, although the structures are commonly used in Tunisian and there is no code-switching in the example I gave. This shows that Arabic dialects/languages differ from Modern Standard Arabic and are not just accents of Arabic, as you have claimed.

The translation to MSA: لقد سجلت ما سأشتريه في الكنش و ذهبت إلى السوق و لكنني لم أجد ما أبحث عنه.

Csisc renamed this task from Add support to all Arabic dialects in Mediawiki to Add monolingual language codes for Arabic dialects. Feb 16 2018, 9:29 PM
Csisc added a subscriber: Mbch331. Feb 16 2018, 9:40 PM

As priority reflects reality and does not cause it (specifically, Unbreak Now! means that something is broken and needs to be fixed immediately), do you plan to fix this problem, or have Language-Team members confirmed that this task is indeed more urgent? Please do not change the priority if the change does not conform to Setting Task Priorities. Resources of teams are limited when it comes to working on requests. We want to be realistic about communicating what is being worked on, to maximize the impact of changes. Practically, this often unfortunately means assigning a low priority to many tasks.

If the priority was increased because you plan to work on this task, please 1. claim the task by setting yourself as assignee, and 2. submit a Gerrit patch. Both are required if you want to raise the priority again. Thank you for your help!

If you do not plan to work on this task yourself but feel that this task is urgent but being ignored by those with the actual power to put the task on their agenda, please discuss with the responsible developers, product managers and budget holders. Further contact information can be found on the corresponding team wiki page. Thanks for your understanding!

Mbch331 lowered the priority of this task from Unbreak Now! to Needs Triage. Feb 17 2018, 9:41 AM

As long as there is no approval by LangCom for these language codes to be added as monolingual codes, there is no need to create a patch. Missing monolingual codes don't break Wikidata, so I am lowering the priority to Needs Triage so that the Wikidata devs can set it. The request also needs to follow the rules at https://www.wikidata.org/wiki/Help:Monolingual_text_languages, which it currently doesn't.

Csisc added a comment. Edited · Feb 17 2018, 11:24 AM

@Mbch331 Well, I discussed that with the Wikidata community. I do not think that anyone will be against this task.

Csisc reassigned this task from Amire80 to GerardM. Feb 18 2018, 10:07 AM

I would like to know whether LangCom is for or against adding all Arabic dialects/languages to Wikidata.

Csisc renamed this task from Add monolingual language codes for Arabic dialects to Add monolingual language codes for Arabic languages. Feb 18 2018, 10:16 AM

If all Arabic languages are added to Wikidata, I will try to add labels to all Wikidata entities in all Arabic languages very soon. If we then translate the MediaWiki messages and the names of Wikidata entities into all Arabic languages, we will have Wikidata, and with it the sum of all human knowledge, translated into all Arabic languages. There will then be no need to create stub Wikipedias and Wiktionaries in these Arabic languages.

Do you want it for labels and descriptions, or for properties of the type monolingual text? For the first, the dialects/languages need to be added to ULS; for the second, a patch to the Wikidata settings is needed.

For adding labels, descriptions and aliases to Wikidata entities and properties.

Mbch331 renamed this task from Add monolingual language codes for Arabic languages to Add language codes for Arabic languages for use in labels, aliasses and descriptions to Wikidata. Feb 18 2018, 11:11 AM
Mbch331 removed GerardM as the assignee of this task.
Mbch331 edited projects, added I18n; removed patch-welcome, good first bug.

Thanks for clearing that up. I've edited the task to make it clearer.

Hoi,
In principle the language committee has given permission for monolingual
texts. This is NOT given for labels, descriptions and aliases. That
requires involvement of native speakers. It requires an agreement of the
language committee.

What makes you think you can add labels and descriptions correctly?
Thanks,

GerardM

Thank you for your answer. First, I used public domain dictionaries as references. These dictionaries were written more than 50 years ago by native or proficient speakers of Arabic languages and are currently used as references in Arabic linguistics. Second, when I participated in the AICCSA 2017 conference, I convinced several scientists from across the Arab world to add their datasets to Wikidata under CC-0. These datasets are precise, as they were built using NLP tools and verified by a panel of native speakers.

Csisc renamed this task from Add language codes for Arabic languages for use in labels, aliasses and descriptions to Wikidata to Add language codes for Arabic languages for use in labels, aliases and descriptions to Wikidata. Feb 18 2018, 12:00 PM

Hoi,
I understand how this could work for the dictionary part of Wikidata. I
understand how it could work for labels and aliases. Why do you think this
will work for descriptions?
Thanks,

GerardM
Csisc added a comment. Edited · Feb 18 2018, 12:43 PM

This is the second part of the work, and here we need native speakers to collaborate. The principle is simple: we will use Descriptioner, which adds the same description to entities that have the same links to other Wikidata entities. In this way, we can add descriptions to all Wikidata entities with only a limited number of native speakers.
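
The principle of reusing one description for entities with the same links can be sketched in Python as below. This is not Descriptioner's actual code; the `propagate_descriptions` helper, the item identifiers and the toy data are assumptions made for the example, and the description is in English only for readability. P31 (instance of) and P17 (country) are real Wikidata property IDs.

```python
from typing import Dict, FrozenSet, Tuple

# An item's "signature" is its set of (property, target) links to other entities.
Links = FrozenSet[Tuple[str, str]]


def propagate_descriptions(
    links: Dict[str, Links],
    descriptions: Dict[str, str],
) -> Dict[str, str]:
    """Copy a known description to every undescribed item with identical links."""
    by_signature: Dict[Links, str] = {}
    for item, text in descriptions.items():
        if item in links:
            # Keep the first description seen for each signature.
            by_signature.setdefault(links[item], text)

    result = dict(descriptions)
    for item, signature in links.items():
        if item not in result and signature in by_signature:
            result[item] = by_signature[signature]
    return result


# Placeholder item IDs; only Q_a has been described by a native speaker.
links = {
    "Q_a": frozenset({("P31", "Q_city"), ("P17", "Q_tunisia")}),
    "Q_b": frozenset({("P31", "Q_city"), ("P17", "Q_tunisia")}),
    "Q_c": frozenset({("P31", "Q_city"), ("P17", "Q_egypt")}),
}
descriptions = {"Q_a": "city in Tunisia"}

print(propagate_descriptions(links, descriptions))
```

In this toy run Q_b inherits the description of Q_a because both have the same links, while Q_c is left untouched and would still need a native speaker.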

Csisc assigned this task to Baba_Tabita. Feb 19 2018, 5:56 PM
Csisc triaged this task as High priority.
Mbch331 removed Baba_Tabita as the assignee of this task. Feb 19 2018, 6:04 PM
Mbch331 raised the priority of this task from High to Needs Triage.

@Csisc Since there's no approval by the langcom yet, nobody can be assigned to the ticket, as there is nothing to code yet. Second, without langcom approval there is no need to set the priority to high. This task needs approval by the langcom as a whole, not by one member.
So please don't assign this to individual langcom members and don't set the priority.
Only developers can set the priority when working on the task.

Csisc added a comment. Feb 19 2018, 6:07 PM

@Mbch331 I will not raise the priority again. However, please try to work on it.

I don't know this part of the code base, so I can't make a patch for this. Second, as I mentioned before, there is no approval by the langcom, so there is no reason to start working on a patch.
So step 1: Get langcom to approve your request.
Step 2: Find someone who knows this part of the code base and can make the appropriate patch(es).

thiemowmde triaged this task as Low priority. Feb 20 2018, 2:16 PM
thiemowmde added a subscriber: thiemowmde.

As for the code, it is currently not possible and not planned to support additional languages for labels and descriptions that are not supported by MediaWiki core. Adding code for this is certainly possible, but I cannot predict how much work this is going to be.

Setting priority to "low" for the moment as long as there is no approval by the committee.

As has been said, for use in labels and descriptions, a language has to be added to the ULS. Afaik, any language to be added to the ULS has to have a certain proportion of translations done at translatewiki.net - see https://translatewiki.net/wiki/FAQ#How_to_add_a_new_language
I wanted Rangi [lag] added to the ULS (once upon a time) and am still working on it. Without professional translators who know what they're doing, i.e. have the technical vocabulary, and are also native speakers of the target language, I don't see how this can happen.

Csisc added a comment. Feb 22 2018, 4:28 PM

@Baba_Tabita That's clear. I will see what I can do.

Csisc added a comment. Feb 22 2018, 4:29 PM

@Baba_Tabita We will begin working on Tunisian, Algerian, Moroccan and Egyptian as they are already supported by ULS.

Afaik, any language to be added to the ULS has to have a certain proportion of translations done at translatewiki.net

That is not correct. ULS (or nowadays language-data, where this data lives) takes any valid language code.

That requirement only applies to adding languages to Names.php in MediaWiki core.

Csisc added a comment. Feb 23 2018, 9:46 AM

@Nikerabbit So, could you add these Arabic languages to ULS?

You haven't provided all the required information; see T187344#3974892. Also, if you want me to make a pull request to language-data, there will be additional delays, as I cannot review my own patches.

Csisc added a comment. Edited · Feb 23 2018, 11:15 AM

@Nikerabbit

  • Their script is Arabic script.
  • Their language codes and English names are: North Levantine Arabic (apc), South Levantine Arabic (ajp), Gulf Arabic (afb), Hejazi Arabic (acw), Najdi Arabic (ars), Hadhrami Arabic (ayh), Sanaani Arabic (ayn), Ta'izzi-Adeni Arabic (acq), Mesopotamian Arabic (acm), Cypriot Arabic (acy), Egyptian Arabic (arz), Northwest Arabian Arabic (avl), Sudanese Arabic (apd), Bahrani Arabic (abv), Libyan Arabic (ayl), Tunisian Arabic (aeb-arab), Algerian Arabic (arq), Moroccan Arabic (ary), Hassaniya Arabic (mea), Saharan Arabic (aao), and Chadian Arabic (shu).
  • Native names:

apc: الشامي الشمالي
ajp: الشامي الجنوبي
afb: الخليجي
acw: الحجازي
ars: النجدي
ayh: الحضرمي
ayn: الصنعاني
acq: التعزي
acm: العراقي
acy: القبرصي
arz: المصري
avl: الرقاوي, الشاوي, or البدوي
apd: السوداني
abv: البحراني
ayl: الليبي
aeb-arab: التونسي
arq: الجزايرية
ary: المغربي
mea: الحسانية
aao: الصحراوي
shu: التشادي

Csisc added a comment. Edited · Feb 23 2018, 11:28 AM

@Nikerabbit

  • Geographical areas:

apc: Syria, Lebanon
ajp: Palestine, Jordan
afb: Saudi Arabia, Bahrain, UAE, Kuwait, Oman, and Qatar
acw: Saudi Arabia
ars: Saudi Arabia
ayh: Yemen, Saudi Arabia, Oman, United Arab Emirates, Qatar, Singapore, Somalia, Eritrea, Ethiopia, Kenya, Tanzania, Sudan, Indonesia, Malaysia
ayn: Yemen
acq: Yemen, Djibouti, and Somalia
acm: Iraq, Syria, Iran, Turkey, Cyprus, and Armenia
acy: Cyprus
arz: Egypt
avl: Egypt, Jordan, Palestine, Saudi Arabia, and Syria
apd: Sudan
abv: Bahrain, Oman, and Saudi Arabia
ayl: Libya
aeb-arab: Tunisia
arq: Algeria
ary: Morocco
mea: Mauritania, Senegal, Mali, Morocco, and Algeria
aao: Algeria, Western Sahara, and Niger
shu: Chad, Cameroon, Nigeria, Niger, and Sudan

@Nikerabbit If you need any other information, please let me know.
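
The information supplied across the comments above (English names, native names, geographical areas, and the shared Arabic script) could be merged per language code along these lines. This is a hedged Python sketch: the `build_records` helper and the record layout are assumptions made for the example and do not reproduce the langdb.yaml format. Only three of the listed codes are copied here; the others would follow the same pattern.

```python
from typing import Dict, List

# Three rows copied from the lists in this task.
english_names: Dict[str, str] = {
    "apc": "North Levantine Arabic",
    "arz": "Egyptian Arabic",
    "shu": "Chadian Arabic",
}
autonyms: Dict[str, str] = {
    "apc": "الشامي الشمالي",
    "arz": "المصري",
    "shu": "التشادي",
}
areas: Dict[str, List[str]] = {
    "apc": ["Syria", "Lebanon"],
    "arz": ["Egypt"],
    "shu": ["Chad", "Cameroon", "Nigeria", "Niger", "Sudan"],
}


def build_records(
    names: Dict[str, str],
    native: Dict[str, str],
    regions: Dict[str, List[str]],
    script: str = "Arab",  # ISO 15924 code for the Arabic script
) -> Dict[str, dict]:
    """Join the three per-code tables; fail loudly if a code is missing anywhere."""
    tables = (("English name", names), ("autonym", native), ("areas", regions))
    records: Dict[str, dict] = {}
    for code in sorted(set(names) | set(native) | set(regions)):
        missing = [label for label, table in tables if code not in table]
        if missing:
            raise ValueError(f"{code}: missing {', '.join(missing)}")
        records[code] = {
            "name": names[code],
            "script": script,
            "areas": regions[code],
            "autonym": native[code],
        }
    return records


records = build_records(english_names, autonyms, areas)
for code, record in records.items():
    print(code, record)
```

Raising an error for incomplete codes mirrors the point made earlier in the thread that all required information has to be present before anyone can prepare a patch.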

I totally disagree with that. Arabic dialects are used neither in the education system nor in formal daily transactions, from the simplest birth certificate to the most complex governmental transactions. I see no sense in using dialects in Wikidata labels.

Csisc added a comment. Edited · Feb 23 2018, 2:05 PM

@1339861mzb This is not always accurate. Moroccan Arabic is used in education in Morocco. There are also efforts in this direction in Algeria, Tunisia, and Lebanon. Furthermore, many resources have been written in Arabic dialects/languages since the early 20th century. Scientists in Tunisia have even found school lessons, mystic poems, religious speeches, and administrative information written in Tunisian during the 18th and 19th centuries. We will publish a paper about that. However, this is not the most important reason for adding labels in Arabic dialects to Wikidata. In fact, there are many false friends between Arabic dialects and Modern Standard Arabic. Adding labels in Arabic dialects to Wikidata will prevent users from misunderstanding the articles of the Arabic Wikipedia. Furthermore, many Arab people are not proficient enough to read and fully understand Modern Standard Arabic, owing to the poor literacy rates in some Arab countries. Having the knowledge provided by Wikidata available in their own Arabic dialect will help them better understand the information they are searching for in Wikipedia editions.

Not true. I'm Lebanese, and these so-called "efforts" in Lebanon were made by some Christian militias during the civil war because they believed that being "Arab" = being "Muslim". These ideas were abandoned after the civil war ended in 1990, and the only people who kept talking about them were radicals. The official language of Lebanon and the Lebanese is Arabic, as the constitution says. You are talking as if literary Arabic is not understood in Morocco, Algeria, and Tunisia, as if we are writing in what? Classical Medieval Arabic! That's nonsense. The forms of Arabic in the Maghreb are accents and dialects. If some or even all Arabs feel they are harder to understand, it's not as if they'll have to "learn" them; give them some time and they will be understandable. As for the Maghrebi users, you make it sound like they'll never understand or adapt to the Arabic Wikipedia, as if they were learning an alien language. I'm totally against this; what you are saying is not convincing at all.

Csisc added a comment. Feb 23 2018, 3:00 PM

@Bassem You can describe the situation of Lebanese better than I can, as you are one of its native speakers. However, from what I see, there are still some channels, like MTV Lebanon, that cover the efforts of Said Akl and still use Said Akl's Arabic script writing system.

Csisc added a comment. Edited · Feb 23 2018, 3:08 PM

@Bassem This is an article about Said Akl's experiment, published in the Egyptian newspaper Al Youm Al Saba' in 2017.
http://m.youm7.com/story/2017/11/28/تعرف-على-حكاية-اللغة-اللبنانية-التى-اخترعها-سعيد-عقل/3530454
If the project was already finished 20 years ago, I wonder why people are still discussing it.

Csisc added a comment. Feb 23 2018, 3:17 PM

@Bassem Concerning the intelligibility of Modern Standard Arabic, I did not say that Arab people do not understand Modern Standard Arabic at all. What I said is: "Due to the existence of false friends between Modern Standard Arabic and Arabic dialects, and due to the existence of several morphological particularities within the Arabic dialects, some Arabs can misunderstand some words within a text in Modern Standard Arabic. Adding labels to Wikidata entities will prevent such misunderstandings."

Examples:

  • بندق means pine nut in Tunisian and hazelnut in Modern Standard Arabic
  • بطيخ means melon in Tunisian and watermelon in Modern Standard Arabic
Csisc added a comment. Feb 23 2018, 3:35 PM

@Bassem @Ibrahim.ID @1339861mzb
As you have seen, there is no political reason behind the proposal. The reason is linguistic. However, if you would like to discuss this further with me, please keep in mind that Phabricator is a website for Wikimedia technical discussions. That is why I invite you to continue our discussion at https://meta.wikimedia.org/wiki/Talk:Wikimedia_Tunisie/WikiLingua_Maghreb.

1339861mzb added a comment. Edited · Feb 23 2018, 4:14 PM

As you said, Phabricator is for technical discussion, so this task must be stopped, as it is controversial, until agreement is reached on the relevant talk page.

Csisc changed the task status from Open to Stalled. Feb 23 2018, 4:57 PM
Helmoony removed a subscriber: Helmoony. Feb 23 2018, 5:28 PM

Said Akl was a radical Christian who supported the ideas of the radical Christian militias (you could say the ISIS of the Arab Christians back then), which said that Lebanese people were not Arabs and that they spoke "Lebanese", a view strongly rejected by the general public. Those who still believed in this were the most extreme members of these former Christian militias.

@Bassem I see. I did not know that, as I am not Lebanese; I am a Tunisian citizen :). Thank you for the information. Our proposal is not related to Said Akl's ideas. It is purely for linguistic purposes and to avoid spending time on creating tens of Wikipedias in Arabic dialects.

Amire80 removed a subscriber: Jsahleen.