WikiEditor localization for Arabic Wikipedia
Closed, Resolved, Public

Description

Hi. I would like some information about the Editbar and how to customize it for Arabic Wikipedia. So far it has only been translated, and we still need to work on localization: namely, reorganizing the Special characters section and replacing icons that use Latin symbols (B for bold, etc.) with Arabic ones.


Version: unspecified
Severity: enhancement

Details

Reference
bz30611

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:57 PM
bzimport added projects: WikiEditor, I18n.
bzimport set Reference to bz30611.

For a guide to creating localized toolbar icons, see https://secure.wikimedia.org/wikipedia/usability/wiki/Text_format_icons . Once you've created some, post a link here and we'll put them in.

As for reorganizing the special characters section -- why would you want to do that? I don't believe any other language community asked us to.

Thanks.

Apart from that, I'm requesting that this section be reprioritized, because any ar-WP editor would evidently prefer the Arabic part, not the Latin part, to be at the top of the section. I would also like us to add the diacritic symbols (harakat), which are more likely to be used than the five additional letters, i.e. p, ch, zh, g, and ng respectively.

Adding additional characters to the special characters section is easy, just tell me which ones you want added (I'll need you to give me the literal characters). I'm not sure about reprioritization, I'd have to look at the code.

I suppose that by "p, ch, zh, g, and ng", you refer to letters which are not used in Arabic itself, but in Persian.

Currently, these are the last letters in the Arabic section of jquery.wikiEditor.toolbar.config.js: "\u067e", "\u0686", "\u0698", "\u06af", "\u06ad".

I see no reason to put them in front of the regular Arabic letters. Rather, a new section can be created for them. The precedent for it is that there are three groups for Latin:

  • "Latin", which includes the most common special characters of European languages;
  • "Latin extended", which includes the more exotic characters for languages such as Vietnamese;
  • the super-exotic "IPA".

There are many more special Arabic characters than "p, ch, zh, g, and ng". There are special characters for Urdu, Pashto, Sindhi etc. All of them can go to an "Arabic Extended" section, which would be useful for people who have a regular Arabic keyboard and need to type names in Urdu, Persian etc.

@Amir
I didn't say we should put them in front; I said diacritics are used more, which is why we should add them. I like your idea of a new Arabic extended section, and I hope to see it working sometime soon.

@Roan
Characters with Hex. NCR as follows:
َ - &#x064E;
ُ - &#x064F;
ِ - &#x0650;
ً - &#x064B;
ٌ - &#x064C;
ٍ - &#x064D;
ْ - &#x0652;
ّ - &#x0651;

Please consider this more appropriate ordering:
ابتثجحخدذرزسشصضطظعغفقكلمنهوي ءآأؤإئىة َ ُ ِ ً ٌ ٍ ْ ّ ،؛؟ پچژڤڭگ
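
For anyone turning an ordering like this into the "\uXXXX" escapes used in jquery.wikiEditor.toolbar.config.js, a small helper along the following lines may save some typing. This is only a sketch: `toUnicodeEscapes` is a hypothetical name, not part of WikiEditor, and it assumes every character lies in the Basic Multilingual Plane, which holds for the Arabic block.

```javascript
// Hypothetical helper, not part of WikiEditor: turns a string of characters
// into the "\uXXXX" escape notation used in the toolbar config.
// Assumes every character is in the Basic Multilingual Plane.
function toUnicodeEscapes( text ) {
	var escapes = [];
	for ( var i = 0; i < text.length; i++ ) {
		var ch = text.charAt( i );
		if ( ch === ' ' ) {
			continue; // spaces only separate groups in the proposed ordering
		}
		var hex = ch.charCodeAt( 0 ).toString( 16 );
		while ( hex.length < 4 ) {
			hex = '0' + hex; // pad to four hex digits
		}
		escapes.push( '\\u' + hex );
	}
	return escapes;
}

// The eight harakat, written as escapes here to avoid any ambiguity.
var harakat = '\u064e \u064f \u0650 \u064b \u064c \u064d \u0652 \u0651';
console.log( toUnicodeEscapes( harakat ).join( ', ' ) );
```

The returned strings can be pasted into a character array between double quotes.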

(In reply to comment #6)

Amir, since you've inserted yourself into this discussion (thank you for reading my mind), could you write up a patch for this?

Created attachment 8987
reordered Arabic and added Arabic extended

Split the Arabic section in jquery.wikiEditor.toolbar.config.js into Arabic and Arabic extended. In Arabic i put the core 28-letter alphabet, special letters for the Arabic language, vowels, punctuation and digits. In "Arabic extended" i put most of the other letters and signs that are used by languages such as Arabic, Urdu, Balochi etc.

I added the message 'wikieditor-toolbar-characters-page-arabicextended'.

In the character arrays i added comments that group characters by similarity to a basic Arabic letter, to make maintenance easier. I hope that it's OK.
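
As a rough illustration of the split described above (not the contents of attachment 8987), the two pages might look something like this. The key names (`labelMsg`, `layout`, `characters`) and the message key for the plain Arabic page are assumptions modeled on how the other character pages appear to be defined; only 'wikieditor-toolbar-characters-page-arabicextended' is taken from this thread, and the character lists are heavily abbreviated.

```javascript
// Sketch only: key names and the 'arabic' message key are assumptions,
// and the character lists are abbreviated, not those from the attachment.
var characterPages = {
	'arabic': {
		'labelMsg': 'wikieditor-toolbar-characters-page-arabic', // assumed key
		'layout': 'characters',
		'characters': [
			// Core alphabet (first three letters only): alef, beh, teh
			'\u0627', '\u0628', '\u062a',
			// Punctuation: Arabic comma, semicolon, question mark
			'\u060c', '\u061b', '\u061f'
		]
	},
	'arabicextended': {
		'labelMsg': 'wikieditor-toolbar-characters-page-arabicextended',
		'layout': 'characters',
		'characters': [
			// Similar to beh: peh
			'\u067e',
			// Similar to jeem: tcheh
			'\u0686',
			// Similar to reh: jeh
			'\u0698',
			// Similar to kaf: gaf, ng
			'\u06af', '\u06ad'
		]
	}
};
```

Grouping each run of characters under a comment naming its base letter keeps related additions together when the list grows.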

That's great, Amir. Thank you both very much. I'll see how it goes and keep you posted.

Attachment 8987 from comment 8 was applied in r95790. You can test it at https://translatewiki.net where it will go live in a few seconds. Please let us know if this is not done as expected. I will tag the revision for backporting to 1.18 and 1.17wmf.

Created attachment 8988
Arabic screenshot, English UI

I've updated the code and taken a look at the result. What I see are a few empty character cells. If I click them, something *is* inserted in my edit window. Is this a font issue on my side, or a more general issue?

OS: OSX 10.7, Firefox 6.

Attached:

Schermafbeelding_2011-08-30_om_18.40.27.png (434×1 px, 129 KB)

Created attachment 8989
Arabic extended screenshot, English UI

Attached:

Schermafbeelding_2011-08-30_om_18.40.48.png (437×1 px, 138 KB)

Empty cells must be diacritics. If they aren't visible, try using the Dotted Circle character (◌) as a carrier; it worked for Hebrew niqqud.

List of Ext. Arabic: http://people.w3.org/rishida/scripts/pickers/arabic-block/

Created attachment 8994
an idea for a function with a dotted circle

Zack, thank you for making me notice the dotted circle use in Hebrew. In fact, it's not implemented correctly for Hebrew. The current code for Hebrew says [ "\u05b0\u25cc", "\u05b0" ], where \u05b0 is a vowel diacritic ("niqqud") and \u25cc is the dotted circle. The dotted circle is supposed to come before the diacritic sign, not after it.

The dotted circle is used a lot in the Hebrew section (incorrectly). It is also used (correctly) in several sections for Indian languages, such as Sinhala and Gujarati. And it will be useful for Arabic and more languages. Instead of repeating it all the time, maybe it can be factored out to a function?

I wrote this proof of concept function and i am attaching it as a patch. It's only for testing, not for committing. I'm not much of a JS guru - i didn't know what would be the best place to put it, so i just put it in the beginning of the file. It works for me, but i've got a hunch that there's a better location for it.

Its logic can also be more clever - for example, it can take an array of characters and return all the needed diacritics at once.
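
A minimal sketch of the idea, for illustration only and not the attached patch: it assumes each toolbar entry is a [ label, inserted text ] pair, as in the Hebrew example quoted above, and puts the dotted circle before the diacritic in the label only. The names `withDottedCircle` and `withDottedCircles` are hypothetical.

```javascript
var DOTTED_CIRCLE = '\u25cc';

// Hypothetical helper: builds a [ label, inserted text ] pair for one
// combining mark. The label shows the mark on a dotted circle (circle first,
// then the mark); only the bare mark is inserted into the edit box.
function withDottedCircle( diacritic ) {
	return [ DOTTED_CIRCLE + diacritic, diacritic ];
}

// The "more clever" variant: takes an array of marks and returns an array
// of pairs, so the call does not have to be repeated for every mark.
function withDottedCircles( diacritics ) {
	var entries = [];
	for ( var i = 0; i < diacritics.length; i++ ) {
		entries.push( withDottedCircle( diacritics[ i ] ) );
	}
	return entries;
}

// Example: Hebrew sheva, then Arabic fatha and damma.
var sheva = withDottedCircle( '\u05b0' );
var arabicVowels = withDottedCircles( [ '\u064e', '\u064f' ] );
```

Here `sheva` is the pair [ "\u25cc\u05b0", "\u05b0" ], which is the corrected order for the Hebrew entry discussed above.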

Re comment 14: Amir, the Arabic Extended option has already been added. I think your patch may be based on another version than trunk?

Some quick notes of review I requested on IRC:
RoanKattouw:
+ // The core 28-letter alphabet, special letters for the Arabic language,
[5:15p] RoanKattouw: Use tabs not spaces for indentation
[5:15p] RoanKattouw: "\u0627", "\u0628", "\u062a",
[5:15p] Krinkle: I'd recommend using [diacritic, dottedCircle(diacritic)]
[5:15p] RoanKattouw: Random tab in the middle of a line
[5:15p] Krinkle: eh, the other way around of course
[5:15p] Krinkle: ie. not let it return an array
[5:16p] Krinkle: To be more flexible. Otherwise rename the function
[5:16p] RoanKattouw: siebrand: Patch looks fine otherwise

Created attachment 9000
better implementation of the dottedCircleWithDiacritic function

(See comment 14 for the general description.)

Description:

  1. Created dottedCircleWithDiacritic function in the closure.
  2. Changed arabic, arabic extended and hebrew sections to work with the function.

Other comments:

  1. If this is fine, other sections that use \u25cc should use this function, too. Currently it's sinhala and gujarati, and it's useful for more languages.
  2. Is there a nice way to program this function to accept an array of characters and return an array of sequences, so it won't have to be repeated so many times?

(In reply to comment #4)

I'm not sure about reprioritization, I'd have to look at the code.

Any good news? Or else, is it possible to display the Arabic section by default, similarly to the use of # in wikilinks?

(In reply to comment #17)

Is there a nice way to program this function to accept an array of characters and return an array of sequences, so it won't have to be repeated so many times?

Yes, but as far as I know it is not possible to do [1, 2, bar(3, 4), 5] and have [1, 2, 3x, 4x, 5] as the output instead of [1, 2, [3x, 4x], 5]. Perhaps some kind of special marker and then post processing the table?
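
One possible way around the nesting, offered as a sketch rather than as what was eventually committed, is to build the list with `Array.prototype.concat`, which splices array arguments in one level deep. The `bar` function below is a hypothetical stand-in for a helper that returns several entries at once.

```javascript
// Hypothetical stand-in for a helper that returns several entries at once.
function bar() {
	var out = [];
	for ( var i = 0; i < arguments.length; i++ ) {
		out.push( arguments[ i ] + 'x' );
	}
	return out;
}

// concat() flattens its array arguments by exactly one level, so the
// helper's results land next to the plain entries instead of being nested.
var characters = [ 1, 2 ].concat( bar( 3, 4 ), [ 5 ] );
console.log( characters );
```

Because only one level is flattened, a helper that returns [ label, inserted text ] pairs would still have each pair arrive intact as a single entry.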

(In reply to comment #17)

Some characters appear as some sort of indistinguishable dashed circles (e.g. ؠ) while they should be rendered as hex code rectangular boxes (e.g. ݹ).

Hi,

What remains to be done here?

  1. The broken characters can currently be fixed by installing a good font (for example http://www.amirifont.org/ ) and using a browser that supports fallback fonts well. We are already working on making this font automatically available as a web font, but until we fix some issues, it must be installed manually.
  2. The vowels are inserted correctly according to my tests.
  3. If you want to customize the icons such as bold, italic, etc., let us know which icons you want to use, as described in Roan's comment #1.

Anything else?

sumanah wrote:

marking patch reviewed.

Well, in the Arabic extended section, Windows renders characters in different fonts; I have characters appearing in Courier New and Amiri.

Another thing, is it possible to include a zero-width joiner for example, in the '''bold'''/''italic'' notation in order to join Arabic characters in bold/italic with those in normal font weight?

(In reply to comment #23)

Well, in the Arabic extended section, Windows renders characters in different fonts; I have characters appearing in Courier New and Amiri.

Yes, currently fonts may get mixed up even if you have a good font installed. We are working on it separately.

Another thing, is it possible to include a zero-width joiner for example, in the '''bold'''/''italic'' notation in order to join Arabic characters in bold/italic with those in normal font weight?

It's possible to add ZWJ and ZWNJ, although i didn't understand how this is related to bold/italic.

Added ZWJ and ZWNJ in https://gerrit.wikimedia.org/r/#change,3681 .

Anything else, or can i close this?

(The fonts problem will be solved separately.)