
VisualEditor: Implement script-specific cursoring for Devanagari if native cursoring is insufficient
Closed, Resolved | Public | 8 Estimated Story Points


  • In the Devanagari script (used for Marathi), the anuswara diacritic is ं
  • When the anuswara diacritic ं is typed after any Devanagari letter, it is usually placed above that letter. For example, क is a Marathi/Devanagari letter; if I type क + ं I get कं
  • If I place the cursor before the cluster कं and press Delete, the whole cluster is deleted. That is the right behaviour and it needs to be retained (without the base letter there is no need for the anuswara diacritic ं).
  • Problem: In VisualEditor, if I place the cursor after the cluster (e.g. after कं) and press Backspace, the whole cluster is currently deleted. Example: कं + Backspace key = whole cluster deleted.
  • Expected behaviour: cluster with anuswara diacritic (कं) + Backspace key = only the anuswara diacritic ं is deleted and the rest of the cluster (क) is retained.
  • Reason: During spelling correction we often need to retain the rest of the cluster and change only the diacritic. For the anuswara diacritic ं such changes are required frequently, for reasons such as singular/plural changes. Retyping the whole cluster every time is cumbersome. Traditional source editing behaves as expected; the problem occurs only in VisualEditor.
  • For example, please see this edit difference: कृष्ण_श्रीनिवास_अर्जुनवाडकर&diff=1198833&oldid=1198826 where a user had to change this diacritic in several places.

Since users would be reluctant to use VE without the correct behaviour, I would prefer that this be treated as a bug rather than just an enhancement, and given a fair importance level.

Version: unspecified
Severity: normal



Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 1:49 AM
bzimport set Reference to bz53754.

Per Bug 51472#c4, grapheme cluster handling for backspace is to be done on a per-script basis. So this should be treated as the bug specifically for Devanagari.

Also note that I am confirming the bug for Hindi.

To further clarify the original report: Devanagari has various diacritics which can be applied to base Unicode characters. It also has a combining character, halant (viram) ् (U+094D).

Currently, pressing Backspace after a grapheme cluster containing one or more base characters with one or more diacritics and/or combining characters deletes the entire grapheme cluster. This is not the desired behaviour. Pressing Delete before a cluster deletes the entire cluster. This is the desired behaviour.
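To illustrate the desired Delete behaviour (whole cluster removed when the cursor is in front of it), here is a minimal sketch. The names `cluster_end` and `delete_forward` are hypothetical, and the cluster rule is a deliberate simplification (base character, then any Mn/Mc marks, with halant + letter joining the next consonant). It is not VisualEditor's code and not a full Unicode text segmentation implementation.

```python
import unicodedata

VIRAMA = "\u094d"  # halant (viram)

def cluster_end(text, start):
    """Return the index just past the cluster that begins at `start`.

    Simplified assumption: a cluster is one character followed by any number
    of combining marks (Mn/Mc), and a letter directly after a halant is
    treated as part of the same cluster (conjunct).
    """
    if start >= len(text):
        return start
    end = start + 1
    while end < len(text):
        category = unicodedata.category(text[end])
        if category in ("Mn", "Mc"):
            end += 1
        elif text[end - 1] == VIRAMA and category == "Lo":
            end += 1
        else:
            break
    return end

def delete_forward(text, cursor):
    """Model of the Delete key: remove the whole cluster after the cursor."""
    return text[:cursor] + text[cluster_end(text, cursor):]

# श्रिं followed by क: Delete at position 0 removes all five code points of श्रिं.
print(delete_forward("\u0936\u094d\u0930\u093f\u0902\u0915", 0))
```

Under this model the forward direction stays cluster-based, while Backspace is the only key whose behaviour this report asks to change.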

Examples of diacritics: ँ (Chandrabindu, U+0901), ं (Bindu, U+0902), etc.

Examples of grapheme clusters:

One base character with one diacritic: कं ( क + ं ), कँ ( क + ँ ), कः ( क + ः )

One base character with multiple diacritics: किं ( क + ि + ं )

Multiple base characters with halant: श्र ( श + ् + र ), क्ष ( क + ् + ष ), प्र ( प + ् + र )

Multiple base characters with halant followed by diacritics: श्रिं (श + ् + र + ि + ं), क्षि ( क + ् + ष + ि ), प्रे ( प + ् + र + े )
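For anyone who wants to inspect these clusters without a Devanagari input method, a small sketch using Python's standard `unicodedata` module lists the code points, general categories and Unicode names of each cluster. The helper name `describe` is just illustrative, and the clusters are written as escapes so the code point order is unambiguous.

```python
import unicodedata

def describe(text):
    """Return (code point, general category, Unicode name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]

clusters = [
    "\u0915\u0902",                    # कं
    "\u0915\u093f\u0902",              # किं
    "\u0936\u094d\u0930",              # श्र
    "\u0936\u094d\u0930\u093f\u0902",  # श्रिं
]

for cluster in clusters:
    print(cluster)
    for code, category, name in describe(cluster):
        print(f"  {code} {category} {name}")
```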

System environment:
Win7 X64
Google Chrome 29.0.1547.62 m
Page used for testing: [[:w:hi:User:Siddhartha Ghai/sandbox]]

Expected behaviour:
Only one diacritic (the last one in the grapheme), i.e. one Unicode character, is to be deleted. The rest of the grapheme cluster is to stay intact.

Examples used (not exhaustive):
Grapheme -> Grapheme after pressing backspace
कं -> क
कँ -> क
कः -> क
क् -> क
किं -> कि
श्र -> श्
क्ष -> क्
प्र -> प्
श्रिं -> श्रि
क्षि -> क्ष
प्रे -> प्र
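The expected results above amount to "Backspace removes exactly one code point". A minimal sketch of that rule follows, checked against the table. The function name `backspace` is hypothetical and this is a model of the requested behaviour, not VisualEditor code.

```python
def backspace(text, cursor):
    """Delete exactly one code point before the cursor.

    Returns (new_text, new_cursor). Python strings are indexed by code point,
    so no surrogate handling is needed here.
    """
    if cursor <= 0:
        return text, 0
    return text[:cursor - 1] + text[cursor:], cursor - 1

# The expected-behaviour table, written as escapes to pin the code point order.
EXPECTED = {
    "\u0915\u0902": "\u0915",                                    # कं -> क
    "\u0915\u0901": "\u0915",                                    # कँ -> क
    "\u0915\u0903": "\u0915",                                    # कः -> क
    "\u0915\u094d": "\u0915",                                    # क् -> क
    "\u0915\u093f\u0902": "\u0915\u093f",                        # किं -> कि
    "\u0936\u094d\u0930": "\u0936\u094d",                        # श्र -> श्
    "\u0915\u094d\u0937": "\u0915\u094d",                        # क्ष -> क्
    "\u092a\u094d\u0930": "\u092a\u094d",                        # प्र -> प्
    "\u0936\u094d\u0930\u093f\u0902": "\u0936\u094d\u0930\u093f",  # श्रिं -> श्रि
    "\u0915\u094d\u0937\u093f": "\u0915\u094d\u0937",            # क्षि -> क्ष
    "\u092a\u094d\u0930\u0947": "\u092a\u094d\u0930",            # प्रे -> प्र
}

for before, after in EXPECTED.items():
    result, _ = backspace(before, len(before))
    assert result == after, (before, result, after)
```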

Current behaviour (blank indicates entire grapheme cluster was removed) (these results should be verified on other browser/OS combinations):
कं ->
कँ ->
कः ->
क् ->
किं ->
श्र -> श् (Working correctly)
क्ष -> क् (Working correctly)
प्र -> प् (Working correctly)
श्रिं -> श् (Deletes र + ि + ं , i.e. three Unicode characters instead of one)
क्षि -> क् (Deletes ष + ि , i.e. two Unicode characters instead of one)
प्रे -> प् (Deletes र + े , i.e. two Unicode characters instead of one)

Points to note:
Some IMEs may provide non-normalized input for characters such as फ़ (U+095E) in place of फ (U+092B) + ़ (U+093C), ढ़ (U+095D) in place of ढ (U+0922) + ़ (U+093C), etc. In such cases, the user may expect that pressing Backspace will remove only the diacritic, not the entire grapheme. So VE may have to handle normalization in such cases.
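One possible way to handle the precomposed case is to canonically decompose the character in front of the cursor before removing a code point. This is a sketch under that assumption only; `backspace_normalized` is a hypothetical name and nothing here reflects how VE actually treats normalization. It relies on the canonical decompositions of U+095E and U+095D in Python's `unicodedata`.

```python
import unicodedata

def backspace_normalized(text, cursor):
    """Backspace that decomposes (NFD) the character before the cursor first.

    Only that one character is normalized, so the rest of the text is left
    exactly as typed. Returns (new_text, new_cursor).
    """
    if cursor <= 0:
        return text, 0
    decomposed = unicodedata.normalize("NFD", text[cursor - 1])
    kept = decomposed[:-1]  # drop the last code point, e.g. the nukta
    return text[:cursor - 1] + kept + text[cursor:], cursor - 1 + len(kept)

# फ़ (U+095E) decomposes to फ (U+092B) + ़ (U+093C); Backspace then leaves फ.
print(backspace_normalized("\u095e", 1))
```

Whether users typing a precomposed letter really expect this is a judgment call; the sketch only shows that the decomposed and precomposed inputs can be made to behave the same.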

Results seem to indicate that halant is partially correctly handled. letter + halant + letter + backspace gives letter + halant correctly. But letter + halant + backspace, instead of giving the letter, deletes the entire grapheme.

The remaining diacritics as of Unicode 3.0 come under Nonspacing mark (Mn) and Spacing combining mark (Mc). (Note: This does not include Devanagari Extended added in Unicode 6.0 and Vedic Extensions added in Unicode 6.1.)
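The Mn/Mc split can be checked with a short sketch that walks the main Devanagari block (U+0900 to U+097F) and groups characters by general category. Note the assumption: the result reflects whatever Unicode version ships with the Python interpreter, not Unicode 3.0 specifically, and `devanagari_marks` is an illustrative name.

```python
import unicodedata

def devanagari_marks():
    """Group characters of the Devanagari block by category Mn or Mc."""
    marks = {"Mn": [], "Mc": []}
    for code_point in range(0x0900, 0x0980):
        ch = chr(code_point)
        category = unicodedata.category(ch)
        if category in marks:
            marks[category].append(ch)
    return marks

for category, chars in devanagari_marks().items():
    codes = " ".join(f"U+{ord(ch):04X}" for ch in chars)
    print(category, codes)
```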

Retitled for clarity; we're switching back to native cursoring and backspacing, which hopefully will fix this, but keeping this open and distinct in case not.

Aklapper set Security to None.
Aklapper added a subscriber: Aklapper.
In T55754#549797 in October 2013, @Jdforrester-WMF wrote:

we're switching back to native cursoring and backspacing, which hopefully will fix this

@Mahitgar: Do you know if this is still a problem in Devanagari? (I'm afraid I lack the input method skills to test this myself.)

Jdforrester-WMF claimed this task.

I'm provisionally deeming this fixed. We changed this behaviour significantly since this was filed. Please re-open if you find issues.