
missing unicode normalization
Closed, Declined | Public | Feature

Description

The "completion" feature, and probably some other input fields, are missing Unicode normalization.

For example, while using the interface in French, when adding a Definition:

  • select the blank Language field
  • start typing "franc"; "français" is offered as a valid language
  • add a combining mark to get "franç"; no language is offered anymore

"ç" and "ç" are equivalent, so the expected result should be the same for either.
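
A minimal sketch of the expected behaviour, assuming the completion is a simple prefix match. The language list and the `complete` helper are hypothetical and are not taken from the actual code base; the idea is only that both the typed prefix and the candidates are normalized to NFC before comparison.

```python
import unicodedata

# Hypothetical language list, for illustration only.
LANGUAGES = ["fran\u00e7ais", "fran\u00e7ais canadien", "francoproven\u00e7al", "deutsch"]

def nfc(text):
    """Return the NFC (canonically composed) form of text."""
    return unicodedata.normalize("NFC", text)

def complete(prefix, candidates=LANGUAGES):
    """Return the candidates whose NFC form starts with the NFC form of prefix."""
    wanted = nfc(prefix)
    return [name for name in candidates if nfc(name).startswith(wanted)]

precomposed = "fran\u00e7"   # c with cedilla as a single code point
decomposed = "franc\u0327"   # c followed by COMBINING CEDILLA

# The two inputs are different code point sequences...
assert precomposed != decomposed
# ...but after normalization they offer the same languages.
assert complete(precomposed) == complete(decomposed)
```

Normalizing to NFD on both sides would work equally well; what matters is that both sides of the comparison use the same form.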


Version: unspecified
Severity: enhancement

Details

Reference
bz10099

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:53 PM
bzimport set Reference to bz10099.
bzimport added a subscriber: Unknown Object (MLST).

kipmaster wrote:

Now both "c" and "ç" cause français to be displayed in the combobox, so it has apparently been solved.

No, this hasn't been fixed.

Try these three:
(ASCII) franc <U+0066 U+0072 U+0061 U+006E U+0063>
(NFC) franç <U+0066 U+0072 U+0061 U+006E U+00E7>
(NFD) franç <U+0066 U+0072 U+0061 U+006E U+0063 U+0327>

ASCII and NFC will give the expected result: français, français canadien, français de Belgique, français de France, français de Suisse, francoprovençal

NFD does not give any language in the list.

NFC and NFD should return the same result. As far as Unicode is concerned, NFC and NFD represent the same string.
What is new is that ASCII and NFC now give the same result.
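
A sketch of one common way to make the ASCII, NFC and NFD inputs all match, assuming accent-insensitive matching is what is wanted. This thread does not say how the site implements it; the `fold` helper is hypothetical. It decomposes to NFD, drops the combining marks and case-folds the rest.

```python
import unicodedata

def fold(text):
    """Decompose to NFD, drop combining marks, and case-fold the result."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

ascii_input = "franc"
nfc_input = "fran\u00e7"
nfd_input = "franc\u0327"

# All three inputs fold to the same key.
assert fold(ascii_input) == fold(nfc_input) == fold(nfd_input) == "franc"
# A candidate is matched by comparing folded forms on both sides.
assert fold("fran\u00e7ais de Belgique").startswith(fold(nfd_input))
```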

Bugzilla normalizes to NFC.
So in my example NFD and NFC are both saved and displayed as NFC.
Use the codepoints if you want to actually use NFD.
In HTML code: franc&#x0327;
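
To check which form a string is really in after copy and paste, the code points can be printed. A small sketch using only the Python standard library; the `codepoints` helper is made up for this example.

```python
import html
import unicodedata

def codepoints(text):
    """Format text as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

# Decode the HTML numeric character reference to get a real NFD string.
nfd = html.unescape("franc&#x0327;")
nfc = unicodedata.normalize("NFC", nfd)

print(codepoints(nfd))  # U+0066 U+0072 U+0061 U+006E U+0063 U+0327
print(codepoints(nfc))  # U+0066 U+0072 U+0061 U+006E U+00E7
```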

fiable wrote:

For those who don't know, these acronyms are explained here:
http://www.unicode.org/reports/tr15/
The problem also appears with the ellipsis character versus three dots.
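
A note on the ellipsis case, assuming it refers to U+2026 HORIZONTAL ELLIPSIS versus three full stops: these are only compatibility equivalent, not canonically equivalent, so NFC or NFD alone will not make them match. A short sketch:

```python
import unicodedata

ellipsis = "\u2026"   # HORIZONTAL ELLIPSIS, a single code point
three_dots = "..."    # three FULL STOP characters

# Canonical normalization leaves the ellipsis untouched.
assert unicodedata.normalize("NFC", ellipsis) == ellipsis
assert unicodedata.normalize("NFD", ellipsis) == ellipsis

# Compatibility normalization (NFKC or NFKD) maps it to three dots.
assert unicodedata.normalize("NFKC", ellipsis) == three_dots
assert unicodedata.normalize("NFKD", ellipsis) == three_dots
```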

kipmaster wrote:

Could you check again with the ç now? It seems to almost work, except that the function that puts the searched string in bold does not work with NFD (if I understood correctly this time).

Cool. It seems to have been fixed.
Now "francais", "fran&ccedil;ais" and "franc&#x0327;ais" give the same results.
Thanks, Kip!

But yes, when using combining characters (NFD), the match is not bolded.
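
A possible approach to the bolding, sketched under the assumption that the match is a folded prefix match and that the bold markup is plain `<b>` tags. The `highlight_prefix` and `fold` helpers are hypothetical and not the site's actual function. The candidate is normalized to NFC and the match length is counted in folded characters, so the typed form (ASCII, NFC or NFD) no longer matters.

```python
import unicodedata

def fold(text):
    """Decompose to NFD, drop combining marks, and case-fold the result."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()

def highlight_prefix(candidate, query, open_tag="<b>", close_tag="</b>"):
    """Wrap the part of candidate that matches query in bold tags."""
    candidate = unicodedata.normalize("NFC", candidate)
    target = fold(query)
    if not target or not fold(candidate).startswith(target):
        return candidate
    matched = ""
    end = 0
    # Consume candidate characters until their folded form covers the query.
    for index, ch in enumerate(candidate):
        if len(matched) >= len(target):
            break
        matched += fold(ch)
        end = index + 1
    # Keep any remaining combining marks attached to their base letter.
    while end < len(candidate) and unicodedata.combining(candidate[end]):
        end += 1
    return open_tag + candidate[:end] + close_tag + candidate[end:]

print(highlight_prefix("fran\u00e7ais", "franc\u0327"))
```

With this sketch, "franc", "fran&ccedil;" and "franc&#x0327;" all bold the same five leading characters of the candidate.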

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:01 AM