
Percent-encode URLs in statements before saving if the Unicode normalisation is different
Open, Needs Triage, Public, Feature

Description

MediaWiki normalises all saved text to Unicode Normalization Form C (NFC). This causes problems when someone tries to save an unencoded URL containing characters that the normalisation changes: the saved URL then differs from the one that was entered, as reported in T206188.
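To make the problem concrete: the URL from T206188 ends in U+09DF BENGALI LETTER YYA (য়), which appears to be the affected character. Unicode lists it as a composition exclusion, so NFC replaces it with U+09AF (YA) followed by U+09BC (NUKTA), silently changing the URL. The sketch below only demonstrates that decomposition; the helper name codePoints is illustrative, not part of any library.

```javascript
// U+09DF BENGALI LETTER YYA is a composition exclusion, so NFC
// decomposes it into U+09AF (YA) + U+09BC (NUKTA) and never recomposes it.
const yya = "\u09DF";
const normalised = yya.normalize("NFC");

// Illustrative helper: list each code point as "U+XXXX".
function codePoints(s) {
  return Array.from(
    s,
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(codePoints(yya).join(" "));        // U+09DF
console.log(codePoints(normalised).join(" ")); // U+09AF U+09BC
console.log(yya === normalised);               // false
```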

Wikibase already uses JavaScript to submit edits, so the problem could be avoided there: JavaScript's built-in normalisation function, String.prototype.normalize() (MDN), can check whether the URL would be changed by normalisation, and if so the URL can be percent-encoded before the API request to save the statement is made.

Simple example using the URL from T206188:

var url = "https://www.ebanglalibrary.com/bangladictionary/অশোক-চট্টোপাধ্যায়/";
// Percent-encode only if NFC normalisation would change the URL
if (url.normalize("NFC") !== url) {
	url = encodeURI(url);
}

This turns the URL into:

https://www.ebanglalibrary.com/bangladictionary/%E0%A6%85%E0%A6%B6%E0%A7%8B%E0%A6%95-%E0%A6%9A%E0%A6%9F%E0%A7%8D%E0%A6%9F%E0%A7%8B%E0%A6%AA%E0%A6%BE%E0%A6%A7%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A7%9F/
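The check in the example above can be wrapped in a function and verified end to end. This is a sketch, not Wikibase code: encodeIfNotNfcStable is a hypothetical name, and the URL is rebuilt with decodeURI from its percent-encoded form so that the exact code points (including U+09DF) survive however this page is stored. It shows why the approach works: the encoded result is pure ASCII, so NFC cannot alter it, and decodeURI recovers the original string exactly.

```javascript
// Hypothetical helper (not part of Wikibase): percent-encode a URL only
// when NFC normalisation would change it; otherwise return it unchanged.
function encodeIfNotNfcStable(url) {
  if (url.normalize("NFC") === url) {
    return url;
  }
  return encodeURI(url);
}

// Rebuild the T206188 URL from its percent-encoded form so the exact
// code points are preserved regardless of how this file is saved.
const encoded =
  "https://www.ebanglalibrary.com/bangladictionary/" +
  "%E0%A6%85%E0%A6%B6%E0%A7%8B%E0%A6%95-" +
  "%E0%A6%9A%E0%A6%9F%E0%A7%8D%E0%A6%9F%E0%A7%8B%E0%A6%AA%E0%A6%BE%E0%A6%A7%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A7%9F/";
const original = decodeURI(encoded);

const saved = encodeIfNotNfcStable(original);

// The saved form is pure ASCII, so NFC leaves it untouched...
console.log(saved.normalize("NFC") === saved); // true
// ...and decoding it gives back the exact original code points.
console.log(decodeURI(saved) === original);    // true
console.log(saved === encoded);                // true
```

One caveat: encodeURI also encodes "%" as "%25", so a URL that is already partly percent-encoded and also contains a character changed by NFC would be double-encoded ("%20" becomes "%2520"). Encoding only the non-ASCII characters might avoid that, at the cost of departing from the simple check above.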

If T358114 is also implemented, the percent-encoded URLs would still be shown decoded (in readable form) when displayed.
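Display-side decoding could look roughly like the sketch below. It assumes nothing about how T358114 is actually implemented; displayUrl is a hypothetical name. The stored value keeps its percent-encoded form and only the displayed text is decoded. decodeURI throws a URIError on malformed sequences, so the sketch falls back to showing the stored value unchanged.

```javascript
// Hypothetical display helper: decode a stored, percent-encoded URL for
// display only. The stored value itself is never modified.
function displayUrl(storedUrl) {
  try {
    return decodeURI(storedUrl);
  } catch (e) {
    // Malformed or truncated escapes: show the stored value as-is.
    if (e instanceof URIError) {
      return storedUrl;
    }
    throw e;
  }
}

console.log(displayUrl("https://example.org/caf%C3%A9")); // https://example.org/café
console.log(displayUrl("https://example.org/%E0%A4"));    // https://example.org/%E0%A4
```

Using decodeURI rather than decodeURIComponent leaves escaped reserved characters such as %2F encoded, which keeps the displayed URL's structure unambiguous.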