Reply link is misplaced when a comment contains U+2066 on Arabic Wikipedia
Closed, Resolved, Public

Description

On Arabic Wikipedia, and possibly on all RTL pages (see this talk page), the [Reply] link is not placed at the end of a comment when that comment contains a hidden U+2066 character.

Event Timeline

Restricted Application added subscribers: alaa, Aklapper.

Thank you for reporting this, @Dyolf77_WMF. We suspect the issue you are experiencing is caused by the way the content on this page was written. We will follow up with more details.

For context, U+2066 is a "Left-to-Right Isolate" character, and it indicates that text from that point, until the next U+2069 "Pop Directional Isolate" character, should be rendered left-to-right and without affecting the directionality-neutral characters in surrounding text (https://www.unicode.org/reports/tr9/#Explicit_Directional_Isolates).

I copy-pasted the comment into https://www.fontspace.com/unicode/analyzer to see what's going on, and discovered that there are actually two of those U+2066 characters there, but only one U+2069.
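The same check can be done locally instead of in an online analyzer. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `list_format_characters` and the `sample` string are made up for illustration (the sample is not the actual comment from the talk page, it only reproduces the "two U+2066, one U+2069" pattern).

```python
import unicodedata


def list_format_characters(text):
    """Return (index, code point, name) for every invisible format
    character (Unicode general category "Cf") found in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


# Hypothetical sample: Arabic word, two U+2066, the emoticon, one U+2069,
# then another Arabic word. This is NOT the original comment text.
sample = (
    "\u0645\u0631\u062d\u0628\u0627 "
    "\u2066\u2066^_^\u2069 "
    "\u0634\u0643\u0631\u0627"
)

for entry in list_format_characters(sample):
    print(entry)
```

For the sample above this lists two LEFT-TO-RIGHT ISOLATE entries and a single POP DIRECTIONAL ISOLATE entry, which is the imbalance described in this comment.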

image.png (5×1 px, 530 KB)

Effectively, this indicates that all remaining text in that paragraph, from the unclosed U+2066 onward, should be laid out left-to-right. (It looks like the intent was to isolate the "^_^" emoticon, but the invisible characters were entered incorrectly.)

However, as the text uses the Arabic script, the actual text runs are rendered right-to-left as they should be, and the left-to-right directionality only applies at "boundaries" between text.
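To make the distinction concrete: the Unicode Bidirectional Algorithm assigns every character a bidi class, and only "strong" characters (such as Arabic letters) carry their own direction. The sketch below groups the classes reported by Python's `unicodedata.bidirectional` into the strong / weak / neutral / explicit categories of UAX #9. The function name `bidi_strength` is hypothetical, and this only inspects character classes; it does not implement the algorithm itself.

```python
import unicodedata

# Grouping of bidi classes into categories, following UAX #9.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}
EXPLICIT = {"LRE", "LRO", "RLE", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def bidi_strength(ch):
    """Classify one character by the category of its bidi class."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in STRONG:
        return "strong"
    if bidi_class in WEAK:
        return "weak"
    if bidi_class in NEUTRAL:
        return "neutral"
    if bidi_class in EXPLICIT:
        return "explicit"
    return "unknown"


# An Arabic letter, a Latin letter, a digit, a caret, and U+2066.
for ch in ["\u0645", "a", "1", "^", "\u2066"]:
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), bidi_strength(ch))
```

The Arabic letter is strong, so it is laid out right-to-left inside its own run regardless of the surrounding isolate, while characters such as "^" take their direction from context.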

And that's how we end up with (roughly… I eyeballed this, didn't try to follow the algorithm exactly) the following runs of text:

image.png (268×2 px, 52 KB)

  • 1. Right-to-left: Arabic text
  • 2. Left-to-right: Started by the first U+2066, never ended
  • 2.1. Left-to-right: Started by the second U+2066, ended by U+2069
  • 2.2. Right-to-left: Arabic text (with some directionality-neutral numbers and punctuation)
  • 2.3. Right-to-left: Arabic text (with some directionality-neutral punctuation) – this is a separate run from 2.2 because of the workaround for T260072

This might look weird but it's what the U+2066 characters demand and what the Unicode Bidirectional Algorithm specifies. Removing one of them will switch the direction in which run 2 is rendered, and will give the expected results.
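If someone wanted to clean up such comments, one possible approach is to find isolate initiators that never get a matching U+2069 and remove them. The sketch below is an illustration under assumptions, not anything DiscussionTools does: the function names and the `sample` string are hypothetical, the input is assumed to be a single paragraph, and the matching is a simple stack over isolate initiators (LRI, RLI, FSI) and PDI.

```python
import unicodedata

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def unmatched_isolate_initiators(paragraph):
    """Return the indexes of isolate initiators with no matching PDI.

    Assumes `paragraph` is one paragraph. A PDI that has no open
    initiator to close is simply skipped.
    """
    open_positions = []
    for i, ch in enumerate(paragraph):
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ISOLATE_INITIATORS:
            open_positions.append(i)
        elif bidi_class == "PDI" and open_positions:
            open_positions.pop()
    return open_positions


def strip_unmatched_isolate_initiators(paragraph):
    """Return the paragraph without its unmatched isolate initiators."""
    unmatched = set(unmatched_isolate_initiators(paragraph))
    return "".join(ch for i, ch in enumerate(paragraph) if i not in unmatched)


# Hypothetical sample (not the original comment): two U+2066, one U+2069.
sample = (
    "\u0645\u0631\u062d\u0628\u0627 "
    "\u2066\u2066^_^\u2069 "
    "\u0634\u0643\u0631\u0627"
)

print(unmatched_isolate_initiators(sample))
cleaned = strip_unmatched_isolate_initiators(sample)
print(cleaned.count("\u2066"), cleaned.count("\u2069"))
```

On this sample the first U+2066 is reported as unmatched, and after stripping it the emoticon is wrapped in exactly one U+2066 / U+2069 pair.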

I don't think there's a DiscussionTools bug here. If this occurs often, then maybe we should consider different ways to insert reply links that keep them from being affected by text directionality (although to be honest, all the ideas for doing this that come to mind are somewhat crazy). But I don't think DiscussionTools should be expected to deal with this.

@matmarex and @ppelberg, thank you! That is very interesting to know. I will keep an eye on whether this occurs often and let you know.
@alaa, I hope this answers your question.

(Closing this ticket for now)