
Enable Unicode normalization for Malayalam on Wikimedia Wikis
Open, Medium, Public

Description

Please see Bug 22371.

Normalization is enabled on all Malayalam-language wikis. But Malayalam content has now grown beyond those wikis (mainly to Commons, Wikidata, etc.), and the old de facto characters are not supported by many applications, including various WebKit browsers (Chrome, Chromium) and many mobile applications. Please enable normalization on all Wikimedia wikis.

(Arabic probably needs this as well.)


Version: wmf-deployment
Severity: enhancement

Details

Reference
bz45476

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:30 AM
bzimport set Reference to bz45476.
bzimport added a subscriber: Unknown Object (MLST).

What?

And where is the consensus that this should be done?

Longer version of last comment:
For any configuration change, we require a local consensus. As this request is cross-wiki, this requires discussing the matter on Meta, probably https://meta.wikimedia.org/w/index.php?title=Wikimedia_Forum (somebody correct me if this is the wrong place), in order to confirm that this change is wanted by the community.

For more information about how to request these kinds of changes, please see https://meta.wikimedia.org/wiki/Requesting_wiki_configuration_changes . Thanks!

Here is the consensus for the original bug: http://ml.wikipedia.org/wiki/WP:Panchayath_(Technical)/Unicode_5.1.0

But at that time the fix was limited to Malayalam-language wikis because of some performance issues (if I remember correctly). And as you can see in MediaWiki, if your wiki's content language is Malayalam, normalization is enabled by default.

Also, joiner-based combinations are always problematic (Bug 45111).
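To make the "old de facto characters" concrete, here is a minimal Python sketch of the kind of rewrite being asked for. It assumes the characters in question are the legacy Malayalam chillu sequences (consonant + virama + zero-width joiner), which Unicode 5.1 replaced with atomic chillu code points. The table and the function name `normalize_chillus` are illustrative only and are not taken from MediaWiki's own normalization data.

```python
# Illustrative sketch only: not MediaWiki's actual normalization table.
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Legacy sequence (consonant + virama + ZWJ) -> atomic chillu (Unicode 5.1).
LEGACY_TO_ATOMIC = {
    "\u0D23" + VIRAMA + ZWJ: "\u0D7A",  # CHILLU NN
    "\u0D28" + VIRAMA + ZWJ: "\u0D7B",  # CHILLU N
    "\u0D30" + VIRAMA + ZWJ: "\u0D7C",  # CHILLU RR
    "\u0D32" + VIRAMA + ZWJ: "\u0D7D",  # CHILLU L
    "\u0D33" + VIRAMA + ZWJ: "\u0D7E",  # CHILLU LL
}


def normalize_chillus(text: str) -> str:
    """Replace each legacy joiner-based chillu sequence with its atomic form."""
    for legacy, atomic in LEGACY_TO_ATOMIC.items():
        text = text.replace(legacy, atomic)
    return text
```

Standard NFC normalization does not perform this rewrite, because the atomic chillus have no canonical decomposition; that is why a language-specific fix is needed at all.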

Normalization works only when the content language is ml or ar. It is not triggered by the user interface language. The normalizeUnicode method in includes/WebRequest.php calls normalize on $wgContLang.

However, normalization in non-ml/non-ar wikis is also possible. Check translatewiki.net.

$wgAllUnicodeFixes = true; is the setting required to get this normalization irrespective of the content language.

http://www.mediawiki.org/wiki/Manual:$wgAllUnicodeFixes

Enabling this means that every normalize call will apply the Unicode normalization fixes for Malayalam and Arabic. The documentation hints at a performance impact, but I don't know how large it is.
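The gating described in this comment can be modelled roughly as follows. This is a Python sketch of the behaviour as stated here, not MediaWiki's PHP code; the function and parameter names are hypothetical, with `all_unicode_fixes` standing in for $wgAllUnicodeFixes.

```python
# Hypothetical model of the behaviour described in this thread.
# Content languages that get the extra fixes by default, per the comment above.
LANGS_WITH_FIXES = {"ml", "ar"}


def should_apply_fixes(content_lang: str, all_unicode_fixes: bool = False) -> bool:
    """Return True if the language-specific Unicode fixes would run.

    The decision depends on the wiki's content language, never on the
    user interface language, unless the global override is switched on.
    """
    return all_unicode_fixes or content_lang in LANGS_WITH_FIXES
```

Under this model, a wiki with content language `en` (such as Commons) only gets the fixes once the override is enabled, which is what this task requests.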

meta request: [[meta:General_requests#Unicode_normalization_for_Malayalam_text_in_all_Wikimedia_projects]]

local notification: [[:ml:വിക്കിപീഡിയ:പഞ്ചായത്ത് (സാങ്കേതികം)#നോർമലൈസേഷൻ]]

Andre, Sam, Ori, what's going on with this issue?

tomasz set Security to None.

Per @siebrand's comment, it seems this task has the support of i18n. The performance aspect still needs to be determined.

@Gilles, is there a performance cost associated with this change?