
Add language codes for Arabic languages for use in labels, aliases and descriptions to Wikidata
Open, Stalled, Low, Public

Description

In order to add labels to Wikidata entities in Arabic dialects using QuickStatements, the following language codes need to be supported by MediaWiki:

  1. North Levantine Arabic (apc)
  2. South Levantine Arabic (ajp)
  3. Gulf Arabic (afb)
  4. Hejazi Arabic (acw)
  5. Najdi Arabic (ars)
  6. Hadhrami Arabic (ayh)
  7. Sanaani Arabic (ayn)
  8. Ta'izzi-Adeni Arabic (acq)
  9. Mesopotamian Arabic (acm)
  10. Cypriot Arabic (acy)
  11. Egyptian Arabic (arz)
  12. Northwest Arabian Arabic (avl)
  13. Sudanese Arabic (apd)
  14. Bahrani Arabic (abv)
  15. Libyan Arabic (ayl)
  16. Tunisian Arabic (aeb-arab)
  17. Algerian Arabic (arq)
  18. Moroccan Arabic (ary)
  19. Hassaniya Arabic (mea)
  20. Saharan Arabic (aao)
  21. Chadian Arabic (shu)

Please add MediaWiki support for them.
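
For illustration, the kind of QuickStatements input this request would enable can be sketched as follows. This is a minimal sketch, assuming the QuickStatements V1 tab-separated syntax in which `L` followed by a language code sets a label. The helper name `label_command`, the placeholder label text, and the choice of the Wikidata sandbox item Q4115189 are assumptions made for the example and are not part of the request.

```python
# Sketch only: assumes QuickStatements V1 syntax (tab-separated columns,
# "L" + language code to set a label). label_command is a hypothetical helper.

def label_command(qid: str, lang: str, label: str) -> str:
    """Return one QuickStatements V1 line that sets a label on an item."""
    if '"' in label or "\t" in label or "\n" in label:
        raise ValueError("label must not contain quotes, tabs, or newlines")
    return f'{qid}\tL{lang}\t"{label}"'


# Placeholder labels keyed by codes from the list above; real labels would
# come from native speakers or vetted datasets.
placeholder_labels = {
    "arz": "placeholder label (Egyptian Arabic)",
    "ary": "placeholder label (Moroccan Arabic)",
}

# Q4115189 is assumed here to be the Wikidata sandbox item.
batch = "\n".join(
    label_command("Q4115189", lang, text)
    for lang, text in placeholder_labels.items()
)
print(batch)
```

Such a batch would presumably be rejected for any code that the label/description/alias language list does not yet accept, which is the gap this task describes.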


Review in June 2021:

  • Name in language: see T187344#3974830
  • Geographical area where used, see T187344#3995915
  • In the meantime, some may already be supported by MediaWiki and/or parts of Wikidata. This includes all languages with a Wikipedia (e.g. Moroccan Arabic Wikipedia) or with the interface messages translated.
  • If one or the other language code is needed (for labels/descriptions/aliases) or just monolingual string or lexemes, it might be worth creating a separate request. See Language codes for steps.

Event Timeline

I think that there is a misunderstanding. What @Csisc is really requesting is adding Arabic dialects/Arabic languages (that's not the point) for monolingual text values in Wikidata.
From what I understood here, you need to request it separately for each dialect/language. See T184783, for example, which is the request I made for the Shawiya language/Shawiya Berber dialect. As with en-us or en-gb, there is no need to repeat information that already exists in the fallback language en; the code is used only when there is a specific term for that dialect/language.

@Bassem @Ibrahim.ID: I ask if you can understand this. This is written in Tunisian.

قيدت آش باش نشري في الكرني و مشيت للعطار ياخي ما لقيتش اللي نلوج عليه

I know that you did not find the answer although the structures are commonly used in Tunisian and although there is no code-switching in the example I gave. This proves that Arabic dialects/languages are different from Modern Standard Arabic and they are not just accents of Arabic as you have claimed.

The translation to MSA: لقد سجلت ما سأشتريه في الكنش و ذهبت إلى السوق و لكنني لم أجد ما أبحث عنه.

Csisc renamed this task from Add support to all Arabic dialects in Mediawiki to Add monolingual language codes for Arabic dialects. Feb 16 2018, 9:29 PM

We are only waiting for the support to begin our work.

As priority reflects reality and does not cause it (specifically, Unbreak Now! means that something is broken and needs to be fixed immediately), do you plan to fix this problem, or have Language-Team members confirmed that this task is indeed more urgent? Please do not change the priority if the change does not conform to Setting Task Priorities. Resources of teams are limited when it comes to working on requests. We want to be realistic about communicating what is being worked on, to maximize the impact of changes. Practically, this often unfortunately means assigning a low priority to many tasks.

If the priority was increased because you plan to work on this task, please 1. claim the task by setting yourself as assignee, and 2. submit a Gerrit patch; both are required if you want to raise the priority again. Thank you for your help!

If you do not plan to work on this task yourself but feel that this task is urgent but being ignored by those with the actual power to put the task on their agenda, please discuss with the responsible developers, product managers and budget holders. Further contact information can be found on the corresponding team wiki page. Thanks for your understanding!

Mbch331 lowered the priority of this task from Unbreak Now! to Needs Triage. Feb 17 2018, 9:41 AM

As long as there is no approval by the LangCom for these language codes to be added as monolingual codes, there is no need to create a patch. Missing monolingual codes don't break Wikidata, so I am lowering the priority to Needs Triage so the Wikidata devs can set the priority. The request also needs to follow the rules on https://www.wikidata.org/wiki/Help:Monolingual_text_languages, and currently it doesn't follow those rules.

@Mbch331 Well, I discussed that with the Wikidata community. I do not think that anyone will be against this task.

I ask if LangCom is for or against adding all Arabic dialects/languages to Wikidata.

Csisc renamed this task from Add monolingual language codes for Arabic dialects to Add monolingual language codes for Arabic languages. Feb 18 2018, 10:16 AM

If all Arabic languages are added to Wikidata, I will try to add labels to all Wikidata entities in all Arabic languages very soon. If we then translate the MediaWiki messages and the names of Wikidata entities into all Arabic languages, we will have Wikidata, and consequently the sum of all human knowledge, translated into all Arabic languages. There will then be no need to create stub Wikipedias and Wiktionaries in these Arabic languages.

Do you want it for labels and descriptions, or for properties of the type monolingual text? If you want the first, then the dialects/languages need to be added to ULS; for the second, a patch to the Wikidata settings is needed.

For adding labels, descriptions and aliases to Wikidata entities and properties.

Mbch331 renamed this task from Add monolingual language codes for Arabic languages to Add language codes for Arabic languages for use in labels, aliasses and descriptions to Wikidata. Feb 18 2018, 11:11 AM
Mbch331 removed GerardM as the assignee of this task.
Mbch331 edited projects, added I18n; removed patch-welcome, good first task.

Thanks for clearing that up. I've edited the task to make it more clear.

Hoi,
In principle the language committee has given permission for monolingual
texts. This is NOT given for labels, descriptions and aliases. That
requires involvement of native speakers. It requires an agreement of the
language committee.

What makes you think you can add labels and descriptions correctly?
Thanks,

GerardM

I thank you for your answer. First, I used public domain dictionaries as references. These dictionaries were written more than 50 years ago by native or proficient speakers of Arabic languages, and they are currently used as references in Arabic linguistics. Second, when I participated in the AICCSA 2017 conference, I convinced several scientists from across the Arab world to add their datasets to Wikidata under CC-0. These datasets are precise, as they were made using NLP tools and verified by a panel of native speakers.

Csisc renamed this task from Add language codes for Arabic languages for use in labels, aliasses and descriptions to Wikidata to Add language codes for Arabic languages for use in labels, aliases and descriptions to Wikidata. Feb 18 2018, 12:00 PM

Hoi,
I understand how this could work for the dictionary part of Wikidata. I
understand how it could work for labels and aliases. Why do you think this
will work for descriptions?
Thanks,

GerardM

This is the second part of the work. Here, we need native speakers to collaborate. The principle is simple: we will use Descriptioner, which adds the same description to entities that have the same links to other Wikidata entities. This way, we can add descriptions to all Wikidata entities even with a limited number of native speakers.

Csisc triaged this task as High priority.
Mbch331 raised the priority of this task from High to Needs Triage.

@Csisc Since there's no approval by the langcom yet, there can't be anybody assigned to the ticket as there is nothing to code yet. And second without langcom approval there is no need to set the priority to high. And for this task approval by the langcom as a whole is needed and not approval by 1 member.
So please don't assign this to individual langcom members and don't set the priority.
Only developers can set the priority when working on the task.

@Mbch331 I will not raise the priority again. However, please try to work on it.

I don't know this part of the code base. So I can't make a patch for this. Second as I mentioned before: there is no approval by the langcom. So there is no reason to start working on a patch.
So step 1: Get langcom to approve your request.
Step 2: find someone that knows this part of the code base and can make the appropriate patch(es).

thiemowmde subscribed.

As for the code, it is currently not possible, and not planned, to support additional languages for labels and descriptions that are not supported by MediaWiki core. Adding code for this is certainly possible, but I cannot predict how much work it is going to be.

Setting priority to "low" for the moment as long as there is no approval by the committee.

As has been said, for use in labels and descriptions, a language has to be added to the ULS. Afaik, any language to be added to the ULS has to have a certain proportion of translations done at translatewiki.net - see https://translatewiki.net/wiki/FAQ#How_to_add_a_new_language
I wanted Rangi [lag] added to the ULS (once upon a time) and am still working on it. Without professional translators who know what they're doing, i.e. have the technical vocabulary, and are also native speakers of the target language, I don't see how this can happen.

@Baba_Tabita That's clear. I will see what I can do.

@Baba_Tabita We will begin working on Tunisian, Algerian, Moroccan and Egyptian as they are already supported by ULS.

Afaik, any language to be added to the ULS has to have a certain proportion of translations done at translatewiki.net

That is not correct. ULS (or nowadays language-data, where this data lives) takes any valid language code.

That requirement only applies for adding languages to Names.php in MediaWiki core.

@Nikerabbit So, I ask if you can add these Arabic languages to ULS.

You haven't provided all required information, see T187344#3974892 and if you want me to make a pull request to language-data there will be additional delays as I cannot review my own patches.

@Nikerabbit

  • Their script is Arabic script.
  • Their language codes and English names are: North Levantine Arabic (apc), South Levantine Arabic (ajp), Gulf Arabic (afb), Hejazi Arabic (acw), Najdi Arabic (ars), Hadhrami Arabic (ayh), Sanaani Arabic (ayn), Ta'izzi-Adeni Arabic (acq), Mesopotamian Arabic (acm), Cypriot Arabic (acy), Egyptian Arabic (arz), Northwest Arabian Arabic (avl), Sudanese Arabic (apd), Bahrani Arabic (abv), Libyan Arabic (ayl), Tunisian Arabic (aeb-arab), Algerian Arabic (arq), Moroccan Arabic (ary), Hassaniya Arabic (mea), Saharan Arabic (aao), and Chadian Arabic (shu).
  • Native names:

apc: الشامي الشمالي
ajp: الشامي الجنوبي
afb: الخليجي
acw: الحجازي
ars: النجدي
ayh: الحضرمي
ayn: الصنعاني
acq: التعزي
acm: العراقي
acy: القبرصي
arz: المصري
avl: الرقاوي, الشاوي, or البدوي
apd: السوداني
abv: البحراني
ayl: الليبي
aeb-arab: التونسي
arq: الجزايرية
ary: المغربي
mea: الحسانية
aao: الصحراوي
shu: التشادي

@Nikerabbit

  • Geographical areas:

apc: Syria, Lebanon
ajp: Palestine, Jordan
afb: Saudi Arabia, Bahrain, UAE, Kuwait, Oman, and Qatar
acw: Saudi Arabia
ars: Saudi Arabia
ayh: Yemen, Saudi Arabia, Oman, United Arab Emirates, Qatar, Singapore, Somalia, Eritrea, Ethiopia, Kenya, Tanzania, Sudan, Indonesia, Malaysia
ayn: Yemen
acq: Yemen, Djibouti, and Somalia
acm: Iraq, Syria, Iran, Turkey, Cyprus, and Armenia
acy: Cyprus
arz: Egypt
avl: Egypt, Jordan, Palestine, Saudi Arabia, and Syria
apd: Sudan
abv: Bahrain, Oman, and Saudi Arabia
ayl: Libya
aeb-arab: Tunisia
arq: Algeria
ary: Morocco
mea: Mauritania, Senegal, Mali, Morocco, and Algeria
aao: Algeria, Western Sahara, and Niger
shu: Chad, Cameroon, Nigeria, Niger, and Sudan
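
The information gathered in these lists (code, English name, native name, script, and geographical area) can be kept together per language and checked for gaps before it is submitted. The following is only a sketch of such a check: the `LanguageRequest` record and the `missing_fields` helper are hypothetical names, and the layout is not the format used by language-data or MediaWiki. The two sample entries copy values from the lists above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LanguageRequest:
    """Hypothetical record of the details requested for one language code."""
    code: str
    english_name: str
    autonym: str
    script: str
    regions: tuple


def missing_fields(request: LanguageRequest) -> list:
    """Return the names of fields that are empty for this request."""
    return [f.name for f in fields(request) if not getattr(request, f.name)]


# Sample entries taken from the lists in this comment thread.
requests = [
    LanguageRequest("arz", "Egyptian Arabic", "المصري", "Arabic", ("Egypt",)),
    LanguageRequest("ary", "Moroccan Arabic", "المغربي", "Arabic", ("Morocco",)),
]

for request in requests:
    gaps = missing_fields(request)
    status = "complete" if not gaps else "missing: " + ", ".join(gaps)
    print(f"{request.code}: {status}")
```

An entry with an empty native name or no regions would be reported as incomplete, which mirrors the kind of follow-up question raised earlier in this thread.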

@Nikerabbit If you need any other information, please let me know.

I totally disagree with that. Arabic dialects are used neither in the education system nor in formal daily transactions, from the simplest birth certificate to the most complex governmental transactions. I see no sense in using dialects in Wikidata labels.

@1339861mzb This is not always accurate. Moroccan Arabic is used in education in Morocco. There are also efforts in this direction in Algeria, Tunisia, and Lebanon. Furthermore, many resources have been written in Arabic dialects/languages since the early 20th century. Scientists in Tunisia have even found school lessons, mystic poems, religious speeches, and administrative information written in Tunisian during the 18th and 19th centuries. We will publish a paper about that. However, this is not the most important reason for adding labels in Arabic dialects to Wikidata. In fact, there are many false friends between Arabic dialects and Modern Standard Arabic. Adding labels in Arabic dialects to Wikidata will prevent users from misunderstanding the articles of the Arabic Wikipedia. Furthermore, many Arab people are not proficient enough to read and fully understand Modern Standard Arabic, due to the poor literacy rates in some Arab countries. Having all human knowledge, as provided by Wikidata, converted into their Arabic dialect will help them to better understand the information they are searching for in Wikipedia editions.

Not true. I'm Lebanese, and these so-called "efforts" in Lebanon were made by some Christian militias during the civil war because they believed that being "Arab" = being "Muslim". These ideas were abandoned after the civil war ended in 1990, and the only people who kept talking about them were radicals. The official language of Lebanon and the Lebanese is Arabic, as the constitution says. You are talking as if literary Arabic is not understood in Morocco, Algeria, and Tunisia, as if we were writing in what? Classical medieval Arabic! That's nonsense. The forms of Arabic in the Maghreb are accents and dialects; if some or even all Arabs feel they are harder to understand, it's not as if they have to "learn" them: give them some time and they will be understandable. As for the Maghrebi users, you make it sound like they'll never understand or adapt to the Arabic Wikipedia, as if they were learning an alien language. I'm totally against this; what you are saying is not convincing at all.

@Bassem You can describe the situation of Lebanese better than I can, as you are one of its native speakers. However, from what I see, there are still some channels, like MTV Lebanon, that cover the efforts of Said Akl and that still use Said Akl's writing system.

@Bassem This is an article about Said Akl's experiment, published in the Egyptian newspaper Al Youm Al Saba' in 2017.
http://m.youm7.com/story/2017/11/28/تعرف-على-حكاية-اللغة-اللبنانية-التى-اخترعها-سعيد-عقل/3530454
If the project ended 20 years ago, I wonder why people are still discussing it.

@Bassem Concerning the intelligibility of Modern Standard Arabic, I did not say that Arab people do not understand Modern Standard Arabic at all. What I said is "Due to the existence of False friends between Modern Standard Arabic and Arabic dialects and due to the existence of several morphological particularities within the Arabic dialects, some Arabs can misunderstand some words within a text in Modern Standard Arabic. Adding labels to Wikidata entities will prevent such misunderstandings".

Examples:

  • بندق means pine nut in Tunisian and hazelnut in Modern Standard Arabic
  • بطيخ means melon in Tunisian and watermelon in Modern Standard Arabic

@Bassem @Ibrahim.ID @1339861mzb
As you have seen, there is no political reason behind the proposal. The reason is linguistic. However, if you would like to further discuss with me about that, please keep in mind that phabricator is a website for Wikimedia technical discussions. That is why I invite you to continue our discussion in https://meta.wikimedia.org/wiki/Talk:Wikimedia_Tunisie/WikiLingua_Maghreb.

As you said, it is for technical discussion, so this task must be stopped, because it is controversial, until agreement is reached on the relevant talk page.

Said Akl was a radical Christian who supported the ideas of radical Christian militias (you could say the ISIS of the Arab Christians back then), which claimed that Lebanese people were not Arabs and that they spoke "Lebanese"; this was widely rejected by the general public. Those who still believed in this were the most extreme members of those former Christian militias.

@Bassem I see. I did not know that, as I am not Lebanese. Well, I am a Tunisian citizen :). Thank you for the information. Our proposal is not related to Said Akl's thoughts. It is just for linguistic purposes and to avoid spending time on creating tens of Wikipedias in Arabic dialects.

@Csisc: You need to add that information in T187344#3974892, not here.

@Csisc: What was the outcome of that discussion? Asking as tasks should not remain stalled forever.

@Aklapper I thank you for your answer. As promised, I discussed the issue with all users, and I presented the outcomes of that discussion at the Wikimania 2019 conference. You can find my presentation at https://commons.wikimedia.org/wiki/File:Languages_lightning_talks.webm and https://commons.wikimedia.org/wiki/File:Wikimania_2019_-_Arabic_Linguistics.pdf. The topic seems to be controversial. Middle Eastern users are resisting the idea despite all my attempts to convince them of the importance of this action. If all Arabic languages/dialects become recognized and supported, we will be able to enrich Wikidata with information about these languages. Please go through my Wikimania presentation and decide on the best solution to the matter.

Aklapper changed the task status from Stalled to Open. Jun 6 2020, 8:44 AM

Reopening per last comment

Esc3300 changed the task status from Open to Stalled. Jun 11 2021, 10:12 AM
Esc3300 subscribed.

This request seems somewhat incomplete. If one or the other language code is needed, it might be worth creating a separate request.

Would it be possible to support other Arabic varieties as well?

For instance ajp (South Levantine Arabic) and apc (North Levantine Arabic)