
Detect LTR/RTL directionality on a per-post basis when it's saved
Open, Medium, Public


This allows mixing directionalities on the same page, which should be useful for embassies and sites like Commons with no official language (at least for content).

Suggested by @Amire80 at .

Event Timeline

Mattflaschen-WMF raised the priority of this task from to Needs Triage.
Mattflaschen-WMF updated the task description. (Show Details)

Just in case that page is scrapped, I'll copy it here:

Facebook, Twitter and Google+ have some automatic logic to give text correct directionality. It's not a big concern for Wikimedia on most pages in sites that are focused on one language. This is different from Twitter, where I post in English and Hebrew, and follow people who write in different languages. Twitter even has some MIT-licensed code to do it: .

Is it important for Flow? Can be very nice for Village Pumps or embassies, where other languages are sometimes used, but not super-essential.

EBernhardson subscribed.

This is something that shouldn't be too difficult and isn't very intrusive. I can see it being used on a variety of wikis, like Commons, that might have multiple languages at once.

It does matter on, if we want to have multilingual VE feedback over there.

There is which is now being used, at least on MediaWiki-Vagrant. I think this would probably have to be simpler and 100% client-side, but it might be a place to look.

The Discovery team is also now working on T121538: Convert TextCat to PHP Library for Language Identification in Cirrus Search. They're porting a library to PHP, which could make hooking up to the MediaWiki API more straightforward than if it were an ElasticSearch plugin.

In T90523#1595070, @Mattflaschen wrote:

I think this would probably have to be simpler and 100% client-side, but it might be a place to look.

I'm not sure why I said that it had to be client-side. An API request after the user has typed a bit (to set up the textarea properly for their directionality and language) should be enough. Then, if it's a PHP library, we can also easily check the final text server-side before saving.

I remember we were looking into something like this for VisualEditor's language detection, but then ran into a few problems. I am not sure if the issues were purely performance-related (running that RTL/LTR test on every keystroke might be expensive?) or, as I suspect, if it had trouble recognizing some Unicode character types, especially for some Asian languages.

I will leave it to @dchan to pitch in. I'm sure it's possible, but I remember there were issues, and am not sure where it stands right now.

All heuristic directionality tests are inherently fallible, but the Unicode BIDI algorithm uses the comparatively simple and fast test of guessing the directionality of the paragraph as the directionality of the first strong directional character.
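For illustration, here is a minimal Python sketch of that first-strong heuristic. It is not the code from any of the projects mentioned in this task. The function name `first_strong_direction` and the `default` parameter are made up for this example, and the sketch ignores the isolate-skipping details of the full Unicode algorithm.

```python
import unicodedata


def first_strong_direction(text, default=None):
    """Guess paragraph direction from the first strong directional character.

    Hypothetical helper for illustration only. Returns "ltr", "rtl", or
    `default` when the text has no strong character (digits, spaces and
    punctuation are weak or neutral and are skipped).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic, ...)
            return "rtl"
    return default


print(first_strong_direction("Hello שלום"))   # ltr
print(first_strong_direction("שלום Hello"))   # rtl
```

The loop stops at the first strong character, so for typical posts it only inspects a handful of characters. A post that starts with a word in the "other" script is guessed wrong, which is the fallibility mentioned above.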

We added {{bidi:}} embedding syntax to mediawiki core / jquery.i18n that applies this heuristic to embed inline strings in a bidi-safe way. See - it has both client-side and server-side code to apply the heuristic.
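As a rough sketch of what such bidi-safe embedding can look like (an assumption about the general technique, not a copy of the MediaWiki or jquery.i18n code), the string can be wrapped in Unicode directional embedding characters chosen by the first strong character. The name `embed_bidi` below is hypothetical.

```python
import unicodedata

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def embed_bidi(text):
    """Wrap `text` in an embedding that matches its first strong character.

    Hypothetical helper for illustration. Text with no strong directional
    character is returned unchanged.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return LRE + text + PDF
        if bidi_class in ("R", "AL"):
            return RLE + text + PDF
    return text


# Example: embed a user-supplied title inside a surrounding message.
message = "Topic " + embed_bidi("שלום (draft)") + " was renamed"
```

Wrapping keeps the embedded string's own ordering from leaking into the surrounding sentence, whatever direction that sentence has.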

Algorithms for detecting text direction are not perfect and make mistakes. I think sticking with the language of the wiki (at least for non-multilingual wikis) is the most efficient approach, because it would have the same false positive rate as those algorithms. Most importantly, I think it's vital to have an option in the topic options (alongside "edit title", "history", etc.) for changing the direction. That's only my opinion. Tell me if I'm wrong.