The Jawi script is a modified Arabic script used for writing the Malay language (ms_Arab). It is an abjad-alphabet hybrid: it keeps the abjad base of Arabic but applies some alphabetic rules, such as writing certain vowels with letters. It also modifies or repurposes some letters for different uses, and adds new variants of existing letters.
One of the main features of Jawi is its use of the letter hamza, which serves as a utility letter rather than simply as a glottal stop letter as in Arabic. The most notable change is the addition of a new variant of the letter, known in Malay as "hamzah tiga suku/همزة تيݢ سوکو," which translates literally as "three-quarter hamza." In this document, the letter is referred to simply as the "Jawi hamza."[1]
The Malay name describes the letter's appearance: it takes the form of a standard stand-alone hamza, but is raised so that its baseline sits between half and three-quarters of the height of alef, depending on the writing style.[2] The letter appears frequently in the Malay lexicon because it serves multiple purposes, including as a glottal stop letter (similar to Arabic, but used differently) and as a separator between vowel letters. It has been used in Malay writing for centuries and remains in use today.
Despite being so common, the character has not been encoded in the Unicode Standard. In September 2021, the annotation for U+0674 ARABIC LETTER HIGH HAMZA (the Kazakh high hamza) was amended to mention "Jawi" alongside "Kazakh." The problem is that the Kazakh high hamza and the Jawi hamza share neither the same visual appearance nor the same attributes.
First, as seen in Kazakh writing,[3] in the Unicode code charts,[4] and in most fonts (for example Times New Roman, Noto Sans Arabic, Segoe UI, SF Arabic, and IBM Plex Sans Arabic),[5] the Kazakh high hamza is smaller than the regular hamza, closer in size to the hamza above and hamza below (U+0654 and U+0655).
Its glyph is also commonly placed higher than the Jawi hamza, sometimes as high as the hamza above (U+0654).[5]
The Unicode Standard also describes U+0674 as a character that "forms digraphs," which the Jawi hamza does not do. The Jawi hamza does not depend on any other letter: it can appear before, between, or after letters without requiring any particular letter before or after it. Nor does it merge with other letters, as some fonts depict it. Some old manuscripts may appear to show it merging with certain letters, but this is only a calligraphic effect, in which letters are stacked on top of each other to save space or to look cleaner.
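The digraph-oriented design of U+0674 is also visible in the Unicode Character Database, where several precomposed letters have a compatibility decomposition into a base letter followed by U+0674. The following is a minimal sketch using Python's standard unicodedata module to list them. It assumes that scanning U+0600 to U+06FF is enough to find such letters, and the output depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

HIGH_HAMZA = "\u0674"  # U+0674 ARABIC LETTER HIGH HAMZA (the Kazakh high hamza)


def precomposed_with_high_hamza(start: int = 0x0600, end: int = 0x0700):
    """Return (code point, name, decomposition) for every character in
    [start, end) whose compatibility decomposition ends with U+0674,
    i.e. letters defined as a base letter followed by the high hamza."""
    found = []
    for cp in range(start, end):
        # Unassigned or undecomposable characters return an empty string.
        decomp = unicodedata.decomposition(chr(cp))
        if decomp.startswith("<compat>") and decomp.split()[-1] == "0674":
            found.append((f"U+{cp:04X}", unicodedata.name(chr(cp)), decomp))
    return found


for code, name, decomp in precomposed_with_high_hamza():
    print(code, name, decomp)

# NFKC normalization splits such a letter back into base letter + U+0674.
print(unicodedata.normalize("NFKC", "\u0675") == "\u0627" + HIGH_HAMZA)
```

On a current Python build this should list U+0675 to U+0678, each defined as a base letter (alef, waw, U, yeh) plus U+0674. The Jawi hamza has no such relationship to a preceding letter.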
In Jawi, the Jawi hamza must be written at the same size as a regular hamza, with its baseline between half and three-quarters of the height of alef.[2]
Efforts to propose the character to the Unicode Technical Committee (UTC) began in August 2008 at the Internationalized Domain Name Forum, when linguists discussing its inclusion in the Jawi coded character set standard found that it was missing from the Unicode Standard. In 2009, it was recommended that the character be proposed to the UTC; however, it is unclear whether a proposal was ever submitted.
After the "Jawi" annotation was added to U+0674 in 2021, Karim et al. (2022) submitted a proposal to encode the Jawi hamza, describing how it differs from the Kazakh high hamza. However, the Script Ad Hoc group (SAH) responded that font designers are expected either to employ a language tag or to create a Jawi-specific font (Anderson et al., 2022).
We believe this is an inappropriate solution, if it can be considered a solution at all. It is akin to "employing a language tag or creating a Jawi-specific font" to display "Ǎ" as "Ă" when writing Romanian, simply because a caron and a breve look nearly identical, or to using "²" to display "2" because the former is just a smaller version of the same shape. This solution would also make it impossible to display the Jawi hamza in plain-text environments that use a single font, such as text messaging applications. Quoting Kazakh and Jawi in the same sentence would be even more problematic, since the font would have to be switched for individual words. This defeats the very purpose for which the Unicode Standard was created.
Current workarounds in digital applications all alter the appearance of an existing character. One is a custom font that draws an unused character as the Jawi hamza;[6] however, when the text is copied or the font fails to load, the original character is shown in its place. Another is a font that displays the regular Arabic hamza as the Jawi hamza in certain positions or when triggered by a zero-width character, or that displays the Kazakh high hamza as the Jawi hamza (Airaha, 2023); without that font, the character appears as a regular hamza or a Kazakh high hamza. On websites, the regular Arabic hamza may be wrapped in a span element styled to raise it, which keeps its position even if a font fails to load;[7] however, the styling is lost when the text is copied to another application. Other workarounds exist, but all rely on customised formatting, whether through fonts or other features, and therefore fail on platforms that do not support such formatting.
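The copy-and-paste problem with the span-based workaround can be illustrated in a few lines. The sketch below assumes that pasting as plain text behaves roughly like discarding all tags and keeping only character data, and uses Python's standard html.parser to strip a hypothetical span (the class name "jawi-hamza" is illustrative only, not the one any real site uses). What remains is identical to the same name written with the regular hamza U+0621.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keep character data and drop tags and attributes, approximating
    what survives when styled text is pasted as plain text."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def to_plain_text(markup: str) -> str:
    extractor = TextExtractor()
    extractor.feed(markup)
    extractor.close()  # flush any buffered trailing text
    return "".join(extractor.parts)


# "Tok Masjid" with its hamza (U+0621) wrapped in a hypothetical styling span.
styled = '\u062A\u0648<span class="jawi-hamza">\u0621</span> \u0645\u0633\u062C\u062F'
# The same name written with the regular hamza and no markup.
regular = "\u062A\u0648\u0621 \u0645\u0633\u062C\u062F"

plain = to_plain_text(styled)
print(plain == regular)  # True: only the code points survive, the styling is gone
print(" ".join(f"U+{ord(c):04X}" for c in plain))
```

A font-based workaround fails in the same way: the copied text still contains whichever existing code point the font restyled, so the distinction can only survive in plain text if the Jawi hamza has a code point of its own.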
Therefore, we believe that the Jawi hamza deserves its own code point in the Unicode Standard, disunified from U+0674 (the Kazakh high hamza) and distinct from U+0621 (the regular Arabic hamza).
Note:
- [1] In English, the character has been referred to by several names. Khalid (2009) used "Jawi Letter Hamzah Three Quarter," and Karim et al. (2022) used "Arabic Letter Three-Quarter High Hamza." We propose either "Arabic Letter Jawi High Hamza" or "Arabic Letter Jawi Hamza," following the pattern of U+06C5 and U+06CC, which are named "Arabic Letter Kirghiz Oe" and "Arabic Letter Farsi Yeh" respectively.
- [2] Refer to Figure 1.
- [3] Refer to Figure 2.
- [4] Refer to Figure 3.
- [5] Refer to Figures 4.1 & 4.2.
- [6] Refer to Figure 5.
- [7] Refer to Figure 6.
Figures:
Figure 1 - Excerpt from Ahmad (2015), which reads: "The hamza letter, as above, must be written at the level of the middle or three-quarters of the height of the letter alef. Its size must be big."
Figure 2 - Excerpts from Kazakh texts showing the Kazakh high hamza in use. The top image is from a Kazakh-language edition of the Best Chinese short stories of 1978; the bottom is from an article in the Kazakh edition of People's Daily.
Figure 3 - The Unicode 16.0 code chart, which depicts U+0674 as smaller than U+0621 and therefore makes it unusable as the Jawi hamza. The red lines mark the heights of U+0674 and U+0654, which are the same; the blue lines mark the height of U+0621, which should also be the height of the Jawi hamza.
Figure 4.1 - The Kazakh high hamza as currently rendered in most fonts, which makes it unusable as the Jawi hamza.
Figure 4.2 - How the Jawi hamza should look in these fonts.
Figure 5 - The implementation of the Jawi hamza on the web version of Utusan Melayu, a defunct section of the Utusan Malaysia newspaper. Note the use of an unused character (U+FBB6) in place of the Jawi hamza, as seen in the source code.
Figure 6 - The implementation of the Jawi hamza on Wikipedia, where the regular hamza is wrapped in a span element styled to position it higher.
Figure 7 - Dahaman & Ahmad (1988), explaining the use of both the Jawi hamza and the regular hamza. This shows that the two characters are not interchangeable and differ clearly in function and purpose.
Figure 8 - Kasim (2019), a Jawi textbook, showing the five different types of hamza in Jawi. From right to left, the third is the Jawi hamza and the fourth is the regular hamza (alef is written for height reference). Note that they are listed separately.
Figure 9 - Kasim (2019), a Jawi textbook, showing some of the unique uses of the Jawi hamza.
Figure 10 - Kasim (2019), a Jawi textbook, instructing students to transliterate two excerpts from Latin script to Jawi. The exercise requires distinguishing between the regular hamza and the Jawi hamza, as it uses both.
Figure 11 - An excerpt from Tarmizi (2019), a news article in Utusan Malaysia, showing recent use of the letter in print media.
Figure 12 - An old family tree of the Tok Masjid family, estimated to date from around the 18th to 19th century. The family name in Jawi (توء مسجد) requires the Jawi hamza, but can currently be written only with the regular hamza.
References:
- Ahmad, A. (2015). Kaedah Pembelajaran Jawi: Peringkat Asas [Jawi Learning Method: Basic Level]. Kuala Lumpur (Malaysia): Perpustakaan Negara Malaysia. ISBN 9789679312607.
- Airaha, N. (2023). ڤاسوان جاوي [Jawi Fonts]. https://www.behance.net/gallery/175362647/Pasuan-Jawi
- Anderson, D. et al. (2022). Recommendations to UTC #171 April 2022 on Script Proposals. https://www.unicode.org/L2/L2022/22068-script-adhoc-rept.pdf
- Dahaman, I. & Ahmad, M. (1988). Daftar Ejaan Rumi-Jawi [Rumi-Jawi Spelling Directory]. Kuala Lumpur (Malaysia): Dewan Bahasa dan Pustaka. ISBN 9789836200273.
- Karim, A. A. K. et al. (2022). Proposal to Encode ARABIC LETTER THREE QUARTER HIGH HAMZA for Jawi. https://www.unicode.org/L2/L2022/22051-jawi-hamza.pdf
- Kasim, H. (2019). بوکو تيکس جاءيس جاوي تاهون 4 [Year 4 Jawi JAIS Text Book]. Seri Kembangan (Malaysia): Jabatan Agama Islam Selangor. ISBN 9789671650004.
- Khalid, N. H. (2009). Report for Malaysia’s Internationalized Domain Name: Jawi Language Issues. MYNIC Berhad. https://github.com/EmpAhmadK/jawi/blob/main/%D9%84%D8%A7%DA%A4%D9%88%D8%B1%D9%86%20%D8%A7%D8%A1%D9%8A%20%D8%AF%D9%8A%20%D8%B9%D9%8A%D9%86.pdf
- Tarmizi, Z. W. A. (2019). برسندر باکت [Dependent on Talent]. Utusan Malaysia.
- Unicode Consortium (2024). The Unicode Standard, Version 16.0. https://www.unicode.org/Public/16.0.0/charts/CodeCharts.pdf