
BiDi issues related to the "In other language" section should be fixed in Names.php
Closed, Resolved (Public)

Description

Author: gangleri

Description:
Hello!

See how your browser's standard BiDi rendering handles the
characters "(" and ")" in the language names *Norsk (nynorsk)* and *Norsk
(bokmål)* at
http://ar.wikipedia.org/wiki/%D8%A5%D8%B3%D8%A8%D8%B1%D8%A7%D9%86%D8%AA%D9%88_%28%D9%84%D8%BA%D8%A9%29
http://fa.wikipedia.org/wiki/%D8%A7%D8%B3%D9%BE%D8%B1%D8%A7%D9%86%D8%AA%D9%88
http://he.wikipedia.org/wiki/%D7%90%D7%A1%D7%A4%D7%A8%D7%A0%D7%98%D7%95

Please try the following in *Names.php*:

'nb' => 'Norsk ‭(‬bokmål‭)‬', # Norwegian (Bokmal)
'nn' => 'Norsk ‭(‬nynorsk‭)‬' , # Norwegian (Nynorsk)
'no' => 'Norsk ‭(‬bokmål‭)‬', # Norwegian

and similar fixes for:
'za' => '(Cuengh)', # Zhuang
'zh' => '中文', # (Zhōng Wén) - Chinese
'zh-cfr' => '閩南語', # Min-nan alias (site is at minnan)
'zh-cn' => '中文(简体)', # Simplified
'zh-hk' => '中文(繁體)', # Traditional (Hong Kong)
'zh-min-nan' => 'Bân-lâm-gú', # Min-nan
'zh-sg' => '中文(简体)', # Simplified (Singapore)
'zh-tw' => '中文(繁體)', # Traditiona

References:

Unicode Character LEFT-TO-RIGHT OVERRIDE - U+202D
http://www.fileformat.info/info/unicode/char/202D/index.htm
and
Unicode Character POP DIRECTIONAL FORMATTING - U+202C
http://www.fileformat.info/info/unicode/char/202C/index.htm
and similar BiDi
bug 3922: enhancement: adjust BiDi mess in category list

Please note that if the current or a future Names.php includes other characters
affected by BiDi, such as ",", ".", "!", "?" and ":", whose rendering depends on
the RTL / LTR context, these should be handled in a similar way:

If the name has to be written as LTR text, then use:
&#8237;<BiDi character>&#8236;
If the name has to be written as RTL text, then use:
&#8238;<BiDi character>&#8236;
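
The rule above can be sketched in code. This is a minimal illustration in Python, not the PHP of Names.php; the helper name `wrap_neutral` and the exact set of characters it treats as direction-neutral are assumptions taken from the characters named in this report. The code points are U+202D (&#8237;), U+202E (&#8238;) and U+202C (&#8236;).

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE, written &#8237; above
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE, written &#8238; above
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING, written &#8236; above

# Assumption: only the characters mentioned in this report are wrapped.
NEUTRAL = set("(),.!?:")


def wrap_neutral(name: str, rtl: bool = False) -> str:
    """Wrap each neutral character in an override/pop pair (hypothetical helper)."""
    override = RLO if rtl else LRO
    return "".join(override + ch + PDF if ch in NEUTRAL else ch for ch in name)


if __name__ == "__main__":
    # repr() makes the otherwise invisible formatting characters visible.
    print(repr(wrap_neutral("Norsk (bokmål)")))
```

Whether wrapping every single character is preferable to one embedding around the whole name is left open here.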

Thanks in advance for your efforts.

best regards reinhardt [[user:gangleri]]


Version: unspecified
Severity: enhancement
URL: http://yi.wiktionary.org/w/index.php?title=project:bugzilla/03953&oldid=6089

Details

Reference
bz3953

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 8:54 PM
bzimport set Reference to bz3953.
bzimport added a subscriber: Unknown Object (MLST).

gangleri wrote:

unified patch for bug 03953; caution: file contains characters coded as &#nnnn;

Thanks to Borgx / Stanley for the help.

Please test the patch in a wiki farm, on both an LTR and an RTL wiki.

regards reinhardt [[user:gangleri]]

Attached:

gangleri wrote:

the line
'ks' => 'कश्मीरी - (كشميري)', # Kashmiri
should be changed as well to

'ks' => '&#8234;कश्मीरी &#8237;-&#8236; (كشميري)&#8236;', # Kashmiri

compare with
http://yi.wiktionary.org/w/index.php?title=project:bugzilla/03953&oldid=6536#Kashmiri_2
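
To check which formatting characters the numeric references in such a line stand for, they can be decoded and named. This is a minimal sketch in Python using only the standard library; the helper name `describe_refs` is hypothetical and not part of MediaWiki.

```python
import html
import unicodedata


def describe_refs(text: str) -> list[str]:
    """Decode &#nnnn; / &#xnnnn; references and name the format characters found."""
    decoded = html.unescape(text)
    # Category "Cf" covers invisible format characters such as the BiDi controls.
    return [unicodedata.name(ch) for ch in decoded if unicodedata.category(ch) == "Cf"]


if __name__ == "__main__":
    proposed = "&#8234;कश्मीरी &#8237;-&#8236; (كشميري)&#8236;"
    for name in describe_refs(proposed):
        print(name)
```

For the proposed line this lists one embedding, one override and two pops, in that order.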

gangleri wrote:

removed "need-review, patch"

Problems with the rendering of the General Punctuation characters in various
browsers, such as Konqueror, have been reported.

Please hold off on implementing this until an optimal solution is found.

Test cases are available at [[wiktionary:yi:project:bugzilla/03953]]. If you use
browsers other than those tested so far, please add a comment about their behavior.
Thanks in advance!

plugwash wrote:

isn't it considered better practice to use html for text direction control on
the web?

gangleri wrote:

(In reply to comment #5)
> isn't it considered better practice to use html for text direction control on
> the web?

Thanks Peter for the question. Please see
Bug 2453: [[Special:Allmessages]] should provide information about the type of
the MediaWiki message

Not all MediaWiki messages render HTML; some do *not* even render UTF-8
characters coded in &#nnnn; or &#xnnnn; notation.

Hashar will check what is allowed to be used inside the names at Names.php.

Added some hints to the problematic cases for the Bidi algorithm in r20081.

Manual \xxxx codes seem fragile and ugly here. Might it be nicer to slap these
on programmatically?
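
One way to picture "programmatically": add the hints when the name list is loaded instead of typing them into each entry. This is a minimal sketch in Python and makes no claim about what r20081 does; the helper names `base_direction` and `add_bidi_hints`, and the choice to wrap only names containing parentheses, are assumptions for illustration.

```python
import unicodedata

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def base_direction(text: str):
    """Return 'ltr' or 'rtl' from the first strongly directional character, else None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def add_bidi_hints(names: dict) -> dict:
    """Wrap names that contain parentheses in an embedding matching their own direction."""
    hinted = {}
    for code, name in names.items():
        direction = base_direction(name)
        if direction is not None and any(ch in "()" for ch in name):
            name = (LRE if direction == "ltr" else RLE) + name + PDF
        hinted[code] = name
    return hinted


if __name__ == "__main__":
    print(add_bidi_hints({"nb": "Norsk (bokmål)", "zh": "中文"}))
```

Entries without parentheses are returned unchanged, so the stored list stays free of hand-typed control characters.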

All I know is that I'm using w3m to browse, and I see these funny
unprintable LEFT-TO-RIGHT EMBEDDING (U+202A) etc. characters on the
Norse link.

$ wwwoffle -o http://en.wikipedia.org/wiki/Wikipedia:NOT |
perl -nlwe '/(>...Norsk.*)</&&print $1' |uni2ascii -wa P

U+202A Norsk (bokmU+00E5 l)U+202C </a>

The other links on that page have no such extra wrapping characters.
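
The same kind of check can be made without uni2ascii. This is a minimal sketch in Python, assuming a hypothetical helper `show_codepoints`; it does not try to reproduce uni2ascii's exact spacing.

```python
def show_codepoints(text: str) -> str:
    """Replace every non-ASCII character with its U+XXXX notation."""
    return "".join(ch if ord(ch) < 128 else f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    # A link label wrapped in LEFT-TO-RIGHT EMBEDDING ... POP DIRECTIONAL FORMATTING.
    print(show_codepoints("\u202aNorsk (bokm\u00e5l)\u202c"))
```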

OK, I suppose Norse must have special needs, but it looks weird.