
having $wgFixMalayalamUnicode = true in DefaultSettings.php breaks some titles and links in ml wikis on upgrade
Closed, InvalidPublic

Description

On upgrade, people should get a notice about this parameter if their language code is "ml". Recently some sites were upgraded automatically and then broke. This probably affects Arabic as well.

Please see: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/61282


Version: 1.16.x
Severity: normal

Details

Reference
bz24670

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 11:01 PM
bzimport set Reference to bz24670.
bzimport added a subscriber: Unknown Object (MLST).

What were the problems caused by this setting?

External links stop working because they use the old Unicode style (https://bugzilla.wikimedia.org/show_bug.cgi?id=25623).

Many institutions use MediaWiki in Malayalam. Not every page gets edited at the same moment, so after upgrading this change can break internal links as well as external links. It is therefore important to inform MediaWiki users.

Sites like "http://smc.org.in" broke because templates stored with the old-style encoding stopped loading after each edit. The admins were not aware of this setting, and they didn't know that each template also needs at least one edit to convert it to the new encoding. So they changed the default language to English.

I know the update.php script is meant for DB updates, but it'd be nice for that script to call the cleanupTitles.php maintenance script if it's updating from a version where Unicode normalization changed.

> Site like "http://smc.org.in" collapsed because they found that templates with
> old style encoding were not loading after each edit. They were not aware about
> this setting and they didn't know that each template also need atleast one edit
> to convert itself new version encoding. So they changed default language to
> English.

Such sites should run the cleanupTitles.php maintenance script. This should fix the issue of certain page titles that use old Unicode sequences becoming inaccessible.
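To illustrate why a title stored in the old form becomes unreachable, here is a rough Python sketch. It assumes the "old Unicode sequences" are the ZWJ-based Malayalam chillu sequences that later got atomic code points; the thread does not list the actual mapping, so the single pair below, the `normalize_title` helper and the sample title are all hypothetical. This is not MediaWiki's code, only a model of the mismatch.

```python
# Assumption: one legacy sequence and its atomic replacement, for illustration only.
OLD_CHILLU_N = "\u0d28\u0d4d\u200d"  # NA + VIRAMA + ZERO WIDTH JOINER
NEW_CHILLU_N = "\u0d7b"              # MALAYALAM LETTER CHILLU N


def normalize_title(title: str) -> str:
    """Rewrite the legacy sequence to the atomic code point (hypothetical helper)."""
    return title.replace(OLD_CHILLU_N, NEW_CHILLU_N)


# A title saved before the upgrade, still stored in the old form.
stored_titles = {"Template:\u0d05\u0d35" + OLD_CHILLU_N}

# After the upgrade, a requested title is normalized before the lookup.
requested = normalize_title("Template:\u0d05\u0d35" + OLD_CHILLU_N)

# The two strings look alike on screen but differ by code point,
# so the lookup misses the old-form entry.
print(requested in stored_titles)

# Once the stored titles are converted as well, the lookup matches again.
cleaned_titles = {normalize_title(t) for t in stored_titles}
print(requested in cleaned_titles)
```

In this model, converting the stored titles plays the role that the thread assigns to cleanupTitles.php.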

Just for clarification, the only two issues being reported here after upgrade are:

*Certain page titles containing the old Unicode forms become inaccessible. (Fixable by running maintenance/cleanupTitles.php.)
*External links to sites using these old Unicode sequences stop working because the characters in the link get converted. This can be worked around by percent-encoding the links, which is admittedly a pretty crappy workaround. (bug 25623)
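As a sketch of the percent-encoding workaround for the second issue: once the path is percent-encoded, the link text is plain ASCII, so there are no Malayalam characters left for the normalization to convert. The example URL path and the `percent_encode_path` helper are made up for illustration, and the old-form sequence is the same assumed chillu example (NA + VIRAMA + ZWJ), not something stated in the thread.

```python
from urllib.parse import quote, unquote

# Assumption: an old-style sequence of the kind that gets converted on save.
OLD_CHILLU_N = "\u0d28\u0d4d\u200d"  # NA + VIRAMA + ZERO WIDTH JOINER


def percent_encode_path(path: str) -> str:
    """Percent-encode a URL path as UTF-8, keeping '/' separators (hypothetical helper)."""
    return quote(path, safe="/")


# Hypothetical external link target that still uses the old form.
old_path = "/wiki/\u0d05\u0d35" + OLD_CHILLU_N
encoded = percent_encode_path(old_path)

print(encoded)                       # ASCII-only form to paste into the link
print(unquote(encoded) == old_path)  # decodes back to the original old-form path
```

The encoded link is much harder to read, which is the downside the comment above admits.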

Does anybody know which versions were affected, and from which previous version to which version this happens? Might be just WONTFIX nowadays.

Wondering if this could be related to bug 38250 and bug 20831.

(In reply to comment #4)

> Does anybody know which versions were affected, and from which previous version
> to which version this happens? Might be just WONTFIX nowadays.
>
> Wondering if this could be related to bug 38250 and bug 20831.

Presumably when upgrading from a version earlier than 1.16 (going by the bug's Version field; really, whatever version the feature was introduced in) to something later than 1.16 (including the most recent).

How much we still care is an open question; there are probably not that many pre-1.16 installs with the language code set to ml left around.