
Regression: Using Unicode Combining diacritics duplicates/replaces/adds unwanted characters
Closed, Resolved (Public)

Description

This is similar to T53472.

Some languages or writing systems require combining diacritics.
In theory, most characters with diacritics can be input with combining diacritics; for example, é (e + combining acute, U+0065 U+0301) is equivalent to é (precomposed e-acute, U+00E9). For some combinations in some languages, only combining diacritics are possible, for example ɔ́ (open o + combining acute, U+0254 U+0301).
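
The equivalence described above can be checked with Unicode normalization. The snippet below is a minimal illustrative sketch using Python's standard `unicodedata` module; the variable names are chosen for this example and are not taken from VisualEditor.

```python
import unicodedata

decomposed = "e\u0301"   # e + U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# The two strings differ as code point sequences...
print(len(decomposed), len(precomposed))  # prints: 2 1

# ...but are canonically equivalent: NFC composes, NFD decomposes.
nfc_e = unicodedata.normalize("NFC", decomposed)
nfd_e = unicodedata.normalize("NFD", precomposed)
print(nfc_e == precomposed, nfd_e == decomposed)  # prints: True True

# Open o + combining acute has no precomposed code point,
# so NFC leaves it as two code points.
open_o_acute = "\u0254\u0301"
nfc_open_o = unicodedata.normalize("NFC", open_o_acute)
print(len(nfc_open_o))  # prints: 2
```

This is why an editor cannot rely on normalizing input to precomposed forms: sequences such as open o + combining acute always reach it as a base character followed by a combining mark.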

When typing or pasting text with combining diacritics the following occurs:

  • The first character after a combining diacritic is duplicated on a new line below the cursor.
  • Pressing backspace pushes the characters after the first combining diacritic to that line below; the cursor jumps to the end of the new line, and the duplicated first character jumps behind the cursor as well.
  • Backspace gets stuck at the combining diacritic: it erases the duplicated character last, after the characters that jumped and the line break, but does not erase the combining diacritic or its base character.

For example, typing

té

is fine.
But typing further characters produces:

tél
l

then

télévision
l

Pressing backspace with the cursor at the end of the first line:

té
élvisio

with the cursor before the o; the next backspace, however, deletes the o and moves the cursor to before the i.

The expected result should be the word "télévision" and pressing backspace should delete the character before the cursor or the character cluster before the cursor.
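
The expected backspace behaviour can be sketched as follows. This is a simplified illustration, not VisualEditor's implementation: the function name `backspace` is hypothetical, and a cluster is approximated as a base character followed by code points with a non-zero canonical combining class. Full grapheme cluster segmentation (Unicode UAX #29) covers more cases. The sketch takes the "delete the whole cluster" option; deleting only the last code point would be the other acceptable behaviour.

```python
import unicodedata
from typing import Tuple


def backspace(text: str, cursor: int) -> Tuple[str, int]:
    """Delete the cluster ending at `cursor`; return (new_text, new_cursor).

    Assumption for this sketch: a cluster is one base character plus any
    directly following combining marks (canonical combining class != 0).
    """
    if cursor <= 0:
        return text, 0
    start = cursor - 1
    # Walk left over combining marks until the base character is reached.
    while start > 0 and unicodedata.combining(text[start]) != 0:
        start -= 1
    return text[:start] + text[cursor:], start


# "télévision" typed with combining acute accents (12 code points).
word = "te\u0301le\u0301vision"

# Backspace at the end removes only the final "n".
after_n, pos_n = backspace(word, len(word))

# Backspace right after the second "é" (index 6) removes "e" + U+0301 together.
after_e, pos_e = backspace(word, 6)
```

With this approach the text stays on one line and the cursor lands where the deleted cluster started, rather than getting stuck at the combining mark.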

Event Timeline

Moyogo raised the priority of this task from to Needs Triage.
Moyogo updated the task description.
Moyogo added a project: VisualEditor.
Moyogo changed Security from none to None.
Moyogo subscribed.

I could reproduce the issue for diacritic characters that are entered with the Shift key (both in production and on betalabs).

For example, using the 'US Extended' keyboard layout:

  1. Type t, then type e.
  2. Use Option+Shift+e to apply an accent mark to the e.
  3. Type l (the letter 'el').

tél
l

The letter l is then duplicated and placed on the next line. Deleting also misbehaves, as described in the task description.

Note: Entering so-called precomposed diacritic characters does not seem to present an issue. Additional info at:
http://superuser.com/questions/339359/insert-a-character-with-multiple-accents-on-an-extended-mac-keyboard-layout

Moyogo updated the task description.

Change 451669 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[VisualEditor/VisualEditor@master] [WIP] ve.dm.Document: Remove incorrect handling for combining characters in #fixupInsertion

https://gerrit.wikimedia.org/r/451669

If you want to test this and can't easily insert combining characters from the keyboard, here is a U+0301 Combining Acute Accent for your copy-paste convenience:

́

Tasks T182404 and T198719 may be duplicates of this bug (this needs to be tested).

Change 451669 merged by jenkins-bot:
[VisualEditor/VisualEditor@master] ve.dm.Document: Remove incorrect handling for combining characters in #fixupInsertion

https://gerrit.wikimedia.org/r/451669

matmarex claimed this task.
matmarex removed a project: Patch-For-Review.

Change 456507 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (8097e44c1)

https://gerrit.wikimedia.org/r/456507

Change 456509 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (8097e44c1)

https://gerrit.wikimedia.org/r/456509

Change 456509 abandoned by Jforrester:
Update VE core submodule to master (8097e44c1)

Reason:
Bartosz pushed Ieec61133d.

https://gerrit.wikimedia.org/r/456509

Change 456507 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (8097e44c1)

https://gerrit.wikimedia.org/r/456507