Combined characters appear visually separated on macOS
Open, NormalPublic

Description

Forked from T187936#3993677.

@ssastry To see the difference you may need to look closely. I've increased the font size so it's easier to see. This is with Chrome. Note the difference in spacing where the diacritics are:

Here's the Desktop version (characters around diacritics are close together):

Parsoid version:

Happens on macOS High Sierra (10.13.3) in both Chrome and Safari.
Linux and the Firefox browser don't seem to be affected.


=> Upstream: https://bugs.webkit.org/show_bug.cgi?id=6148

Event Timeline

bearND created this task.Feb 23 2018, 6:36 PM
ssastry triaged this task as Normal priority.Feb 26 2018, 4:17 PM
ssastry moved this task from Backlog to Read Views on the Parsoid board.

Update: in 1.3.2 the /page/summary endpoint changed to flatten anchor and span elements into text nodes more comprehensively, where possible. I think this issue should be significantly reduced now. We might consider applying the same strategy to the reading content versions.
I'm not closing this, though, because the issue would still exist for direct users of Parsoid.
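
For illustration only, here is a rough sketch of what "flattening" span and anchor elements into text could look like. It is not the actual /page/summary code: the `Flattener` class and `flatten` function are hypothetical names, and it assumes that dropping the tags while keeping their text content is acceptable for the fragment in question.

```python
from html import escape
from html.parser import HTMLParser


class Flattener(HTMLParser):
    """Hypothetical helper: drops <a> and <span> tags but keeps their text."""

    FLATTEN = {"a", "span"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &#769; into characters.
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.FLATTEN:
            # Re-emit other start tags exactly as they appeared in the source.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (e.g. <br/>) are re-emitted once, unchanged.
        if tag not in self.FLATTEN:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.FLATTEN:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape &, < and > so the output is still valid markup.
        self.out.append(escape(data, quote=False))


def flatten(fragment: str) -> str:
    parser = Flattener()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

With the wrapper gone, the combining character ends up directly next to its base character in the same text run, which is presumably why the summary output is less affected.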

cscott added a subscriber: cscott.EditedMar 6 2018, 4:46 PM

The problem seems to arise when (at least) one of the characters is written as an HTML entity, e.g.:

&#nnnn;o

Parsoid will wrap a <span> tag around the entity in order to support roundtripping, and that span tag will "separate" the combining character from the character it should apply to, in (broken versions of) webkit.

Two possible workarounds, plus the option of doing nothing:

  • drop the span if the wrapped entity is a combining character. That will make a dirty diff on roundtripping because the entity will be expanded in the wikitext, but selser should cover most of these cases in practice.
  • expand the span to cover more than one character, roughly the same way we "expand the range" of templates.
  • do nothing and wait for webkit to fix their bug: https://bugs.webkit.org/show_bug.cgi?id=6148

My recommendation is the first, which is probably one conditional in Parsoid plus importing a (hefty) npm library containing the Unicode character class database to identify combining characters.
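
A minimal sketch of that conditional, written in Python rather than Parsoid's own code so that the standard `unicodedata` module can stand in for the character class database. The function names are hypothetical, the `typeof="mw:Entity"` wrapper is an assumption about the span Parsoid emits, and "combining character" is taken here to mean any character in a Unicode Mark category (Mn, Mc, Me).

```python
import html
import unicodedata


def is_combining(ch: str) -> bool:
    """True if ch belongs to a Unicode Mark category (Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


def render_entity(entity_src: str) -> str:
    """Hypothetical serializer step: decide whether to wrap an entity in a span.

    entity_src is the entity as written in the wikitext, e.g. "&#769;".
    """
    decoded = html.unescape(entity_src)
    if len(decoded) == 1 and is_combining(decoded):
        # Emit the bare character so it stays adjacent to its base character.
        # The original entity spelling is lost, hence the dirty diff on roundtrip.
        return decoded
    # Assumed wrapper markup; everything that is not a combining mark keeps it.
    return f'<span typeof="mw:Entity">{html.escape(decoded)}</span>'
```

For example, `render_entity("&#769;")` returns the bare U+0301 combining acute accent, while `render_entity("&amp;")` still comes back wrapped in the span.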

Krinkle added a subscriber: Krinkle.
Krinkle updated the task description.
Krinkle moved this task from Backlog to Reported Upstream on the Upstream board.
Krinkle moved this task from Backlog to Summary on the Page Content Service board.Jun 6 2018, 2:11 AM
Reedy edited projects, added Parsoid-Read-Views; removed Parsoid.Sep 17 2018, 7:25 PM
phuedx removed a subscriber: phuedx.Sep 20 2018, 3:38 PM