Cannot use U+180E (MONGOLIAN VOWEL SEPARATOR) in page title
Open, Needs Triage | Public | BUG REPORT

Description

In Mongolian script, U+180E (MONGOLIAN VOWEL SEPARATOR) is used within a word and does not mark a word boundary, so converting it to a space is incorrect. According to English Wikipedia:

A separated final form of vowels a or e (᠎ᠠ ‑a/‑e) is common, and can appear at the end of a word stem, or suffix. This form requires a final-shaped preceding letter, and a word-internal gap in between. This gap can be transliterated with a hyphen. ... The presence or lack of a separated a or e can also indicate differences in meaning between different words (compare ᠬᠠᠷ᠎ᠠ qar‑a 'black' with ᠬᠠᠷᠠ qara 'to look').

Steps to replicate the issue (include links if applicable):

  • Go to ᠬᠠᠷ᠎ᠠ in English Wiktionary
  • It redirects to ᠬᠠᠷ_ᠠ (note the underscore, i.e. a space, where U+180E used to be)

Also,

  • Create a page with U+180E in title
  • The character in title will be converted to _ (space)
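The behaviour in the steps above can be reproduced with a minimal sketch. It assumes the title normalizer collapses a fixed set of Unicode whitespace characters into underscores and that U+180E is a member of that set. The character classes and the normalize_title helper below are hypothetical, written for illustration, and are not taken from the MediaWiki source.

```python
import re

# Hypothetical whitespace class that still treats U+180E as a space.
LEGACY_SPACES = re.compile(
    "[ _\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]+"
)

# The same class with U+180E removed (the behaviour this report asks for).
FIXED_SPACES = re.compile(
    "[ _\u00A0\u1680\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]+"
)


def normalize_title(title: str, spaces: re.Pattern) -> str:
    """Collapse each run of 'space' characters to one underscore and trim the ends."""
    return spaces.sub("_", title).strip("_")


# qar-a 'black': QA, A, RA, MONGOLIAN VOWEL SEPARATOR, A
word = "\u182C\u1820\u1837\u180E\u1820"

legacy = normalize_title(word, LEGACY_SPACES)
fixed = normalize_title(word, FIXED_SPACES)

print("_" in legacy)   # True: the separator was replaced by an underscore
print(fixed == word)   # True: the title is left unchanged
```

With the legacy class the word is split into two parts joined by an underscore, which matches the redirect described above; with U+180E removed from the class the title survives intact.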

Event Timeline

U+180E MONGOLIAN VOWEL SEPARATOR has not been classified as a space character (general category Zs) since Unicode 6.3.0 (2013-09-30), when it was reclassified as a format character (Cf).
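The classification can be checked with a short sketch using Python's standard unicodedata module. The result depends on the Unicode version bundled with the interpreter; the is_space_separator helper is a hypothetical name used only for this illustration.

```python
import unicodedata

MVS = "\u180E"  # MONGOLIAN VOWEL SEPARATOR


def is_space_separator(ch: str) -> bool:
    """Return True if the character's general category is Zs (space separator)."""
    return unicodedata.category(ch) == "Zs"


# Version of the Unicode database this interpreter ships with.
print(unicodedata.unidata_version)

# With Unicode data 6.3.0 or later, U+180E is a format character, not a space.
print(unicodedata.category(MVS))
print(is_space_separator(MVS))
print(is_space_separator("\u3000"))  # IDEOGRAPHIC SPACE, still in Zs
```

A title normalizer that derives its whitespace set from the current Zs category, rather than from a hard-coded list, would therefore no longer touch U+180E.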

Change #1285518 had a related patch set uploaded (by 沈澄心; author: 沈澄心):

[mediawiki/core@master] Don't convert U+180E to underscore in page Title

https://gerrit.wikimedia.org/r/1285518