
Need more appropriate link trail for Farsi language
Closed, Resolved | Public

Description

In an English MediaWiki environment (such as the English Wikipedia), one can create a link to an article titled "example" with the link text "examples" using this code:

[[example]]s

In a Farsi environment, though, this doesn't work correctly. The following code:

[[فلان]]ی

will create a link to an article titled "فلان", but the "ی" is not shown as part of the link text; instead it is shown as plain text next to the link (the link text remains "فلان").

I guess the parser uses a regular expression (I haven't checked the code, though) to test whether the characters immediately after the closing brackets (]]) are in a-z, A-Z, or 0-9. If they are not, it treats them like a space and does not include them in the link text. I'm not sure about this assumption, though.

Your help is appreciated.


Version: unspecified
Severity: enhancement

Details

Reference
bz10130

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 9:49 PM)
bzimport set Reference to bz10130.
bzimport added a subscriber: Unknown Object (MLST).

ayg wrote:

Your guess is basically correct, except that this *is* localized. The regex for English is:

/^([a-z]+)(.*)$/sD

which is the 'linktrail' message in MessagesEn.php. Try providing a correct variant of that (amending the a-z character class) for Farsi. You probably want to use the u modifier, i.e.,

/^([a-zlots of farsi characters]+)(.*)$/sDu
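To make the mechanism concrete, here is a minimal sketch in Python (not MediaWiki's actual PHP code) of how such a link trail regex splits the text that follows `]]`. The Farsi character class below is an illustrative assumption, not the real 'linktrail' message: it takes the basic Arabic letters U+0621..U+064A and adds the Persian letters پ چ ژ ک گ ی. The helper name `split_trail` is hypothetical.

```python
import re

# English link trail from the thread, ported to Python: PHP's `s` modifier
# corresponds to re.DOTALL, and `D` (dollar matches only at the very end)
# corresponds to \Z. Python 3 strings are Unicode, so no `u` flag is needed.
ENGLISH_TRAIL = re.compile(r"^([a-z]+)(.*)\Z", re.DOTALL)

# Illustrative Farsi variant (an ASSUMPTION, not the shipped message):
# basic Arabic letters U+0621..U+064A plus the Persian letters
# U+067E, U+0686, U+0698, U+06A9, U+06AF, U+06CC.
FARSI_TRAIL = re.compile(
    "^([a-z\u0621-\u064A\u067E\u0686\u0698\u06A9\u06AF\u06CC]+)(.*)\\Z",
    re.DOTALL,
)


def split_trail(pattern, text_after_link):
    """Return (trail, rest): the part absorbed into the link, and the remainder."""
    m = pattern.match(text_after_link)
    if m is None:
        # No trail characters directly after "]]": nothing joins the link.
        return "", text_after_link
    return m.group(1), m.group(2)


# "[[example]]s are handy": the "s" joins the link text.
english = split_trail(ENGLISH_TRAIL, "s are handy")

# "[[فلان]]ی است": the English pattern absorbs nothing ...
farsi_with_english_rule = split_trail(ENGLISH_TRAIL, "\u06CC \u0627\u0633\u062A")

# ... while the Farsi-aware pattern absorbs the "ی".
farsi = split_trail(FARSI_TRAIL, "\u06CC \u0627\u0633\u062A")

print(english, farsi_with_english_rule, farsi)
```

With the English class the Farsi suffix fails to match at all, which is the behaviour reported above; widening the character class is enough to pull the suffix into the link.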

What is wrong with:

/^(\S+)(.*)$/sDu

I think this method should work with all non-space characters.

ayg wrote:

\s matches only ASCII space characters, I believe, so \S will match things like non-breaking space, zero-width space, etc. Some languages also don't use spaces, so if you embed a link in a block of Chinese text (which may happen even on fa-wiki or wherever on user talk pages or something) it will link the whole block. Also, languages with a lot of agglutination may frequently want to link only part of a word. Then you get into weirdness with control characters and such.

A more general solution would definitely be nice, but for now best to just do something Farsi-specific.
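The over-matching described above can be reproduced with a short Python sketch. Python's `\S` is Unicode-aware by default, so `re.ASCII` is used here to mimic the ASCII-only `\s` behaviour described in this comment; that mapping is an assumption about PCRE taken from the comment, not something verified against MediaWiki. The helper name `trail` is hypothetical.

```python
import re

# Catch-all trail proposed in the thread. re.ASCII restricts \s (and so \S)
# to ASCII whitespace, imitating the behaviour described in the comment.
CATCH_ALL = re.compile(r"^(\S+)(.*)\Z", re.DOTALL | re.ASCII)


def trail(text_after_link):
    """Return the text that would be absorbed into the link."""
    m = CATCH_ALL.match(text_after_link)
    return m.group(1) if m else ""


# A non-breaking space (U+00A0) does not end the trail, so the next word
# is swallowed as well.
nbsp_case = trail("s\u00A0next word")

# Chinese text has no spaces, so the whole run becomes link text.
chinese_case = trail("的例子很多。")

# Punctuation directly after the link is absorbed too.
punct_case = trail("s, and more")

print(repr(nbsp_case), chinese_case, punct_case)
```

An explicit per-language character class avoids all three cases, at the cost of having to be maintained for each language.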

*** This bug has been marked as a duplicate of bug 11813 ***