Detect LTR/RTL directionality on a per-post basis when it's saved
Open, Normal, Public

Description

This allows mixing directionalities on the same page, which should be useful for embassies and sites like Commons with no official language (at least for content).

Suggested by @Amire80 at http://ee-flow.wmflabs.org/w/index.php?title=Topic:Sc660dyvwhbjemww&topic_showPostId=sc9tzzoy7xxtbn28#flow-post-sc9tzzoy7xxtbn28 .

Mattflaschen-WMF updated the task description.
Mattflaschen-WMF raised the priority of this task from to Needs Triage.
Restricted Application added a subscriber: Aklapper. Feb 24 2015, 3:28 AM
Mattflaschen-WMF set Security to None.

Just in case that page is scrapped, I'll copy it here:

Facebook, Twitter and Google+ have some automatic logic to give text correct directionality. It's not a big concern for Wikimedia on most pages in sites that are focused on one language. This is different from Twitter, where I post in English and Hebrew, and follow people who write in different languages. Twitter even has some MIT-licensed code to do it: https://github.com/twitter/RTLtextarea .

Is it important for Flow? Can be very nice for Village Pumps or embassies, where other languages are sometimes used, but not super-essential.

EBernhardson triaged this task as Normal priority. Feb 24 2015, 6:49 PM
EBernhardson added a subscriber: EBernhardson.

This is something that shouldn't be too difficult and isn't very intrusive. I can see it being used on a variety of wikis, like Commons, that might have multiple languages at once.

Elitre added a subscriber: Elitre. Feb 26 2015, 9:51 PM

It does matter on mediawiki.org, if we want to have multilingual VE feedback over there.

Restricted Application added a project: Collaboration-Team-Triage. Aug 12 2015, 3:04 PM

There is https://github.com/jprante/elasticsearch-langdetect which is now being used at least on MediaWiki-Vagrant. I think this would probably have to be simpler and 100% client-side, but it might be a place to look.

The Discovery team is also now working on T121538: Convert TextCat to PHP Library for Language Identification in Cirrus Search. They're porting a library to PHP, which could make hooking up to the MediaWiki API more straightforward than if it were an ElasticSearch plugin.

In T90523#1595070, @Mattflaschen wrote:

I think this would probably have to be simpler and 100% client-side, but it might be a place to look.

I'm not sure why I said that it had to be client-side. An API request after the user has typed a bit (to set up the textarea properly for their directionality and language) should be enough. Then, if it's a PHP library, we can also easily check the final text server-side before saving.

I remember we were looking into something like this for VisualEditor's language detection, but then ran into a few problems. I am not sure if the issues were purely performance-related (running that RTL/LTR test on every keystroke might be expensive?) or, as I suspect, if it had some issues with recognizing some Unicode types, especially when it came to some Asian languages.

I will leave it to @dchan to pitch in. I'm sure it's possible, but I remember there were issues, and am not sure where it stands right now.

dchan added a comment. Feb 2 2016, 8:10 AM

All heuristic directionality tests are inherently fallible, but the Unicode BIDI algorithm ( http://unicode.org/reports/tr9/ ) uses the comparatively simple/fast test of guessing the directionality of the paragraph as the directionality of the first strong directional character.

We added {{bidi:}} embedding syntax to mediawiki core / jquery.i18n that applies this heuristic to embed inline strings in a bidi-safe way. See https://gerrit.wikimedia.org/r/221774/ - it has both client-side and server-side code to apply the heuristic.
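For illustration, the first-strong heuristic described above can be sketched roughly as follows. This is not the code from the linked change; it is a simplified sketch that assumes Python's standard `unicodedata` module, and the function name and the `default` parameter are hypothetical. It also ignores the part of the Unicode rule that skips characters inside directional isolates.

```python
import unicodedata

# Strong bidi classes in UAX #9: L is left-to-right, R and AL are right-to-left.
_STRONG_LTR = {"L"}
_STRONG_RTL = {"R", "AL"}


def first_strong_direction(text, default="ltr"):
    """Guess "ltr" or "rtl" from the first strong directional character.

    `default` is a hypothetical fallback (for example the wiki's content
    direction) used when the text contains no strong character at all.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in _STRONG_LTR:
            return "ltr"
        if bidi_class in _STRONG_RTL:
            return "rtl"
        # Digits, punctuation and whitespace are weak or neutral: keep scanning.
    return default
```

The scan stops at the first strong character, so it rarely has to read the whole post. It is also fallible in the way noted above: a Hebrew reply that starts with a Latin-script mention such as "@Amire80" would be guessed as LTR.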

dchan added a comment. Feb 2 2016, 8:13 AM

Correction: the client-side code was committed here: https://github.com/wikimedia/jquery.i18n/pull/76 .

Algorithms to find text direction are not perfect and they make mistakes. I think sticking with the language of the wiki (at least for non-multilingual wikis) is the most efficient way, because it would have the same false positive rate as those algorithms. Most importantly, I think it's vital to have an option in the topic options (alongside "edit title", "history", etc.) for changing direction. That's only my opinion. Tell me if I'm wrong.
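One way to read the suggestion above is as an order of precedence: a manual choice wins, automatic detection is only trusted on multilingual wikis, and the wiki's own direction is the fallback. A minimal sketch of that reading, where every name and parameter is hypothetical and nothing here reflects existing Flow code:

```python
def resolve_post_direction(detected, wiki_direction, wiki_is_multilingual,
                           manual_override=None):
    """Pick the direction to store for a post.

    detected:             "ltr", "rtl" or None, from some heuristic (assumed input)
    wiki_direction:       "ltr" or "rtl", the wiki's content direction
    wiki_is_multilingual: whether the wiki mixes content languages (assumed flag)
    manual_override:      "ltr", "rtl" or None, set from the topic options menu
    """
    valid = ("ltr", "rtl")
    # 1. An explicit choice by the user always wins.
    if manual_override in valid:
        return manual_override
    # 2. Only trust the heuristic where mixed directions are expected.
    if wiki_is_multilingual and detected in valid:
        return detected
    # 3. Otherwise stick with the language of the wiki.
    return wiki_direction
```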

Restricted Application added a project: I18n. Mar 2 2016, 8:51 PM
Amire80 moved this task from Untriaged to RTL on the I18n board. Feb 27 2018, 7:23 AM
Restricted Application added a project: Growth-Team. Jul 18 2018, 7:00 PM
SBisson moved this task from Inbox to Triaged but Future on the Growth-Team board. Jul 20 2018, 5:52 PM