
External link not working if character whose decomposition contains nukta is present in link
Open, Medium, Public

Description

This is related to T170779. An external link given in Wikidata, Wikisource etc. does not work when the link contains an Indic character that MediaWiki decomposes into a base character plus nukta (underdot). For example, য় can be decomposed as য + ়. In the Wikidata item Q56864029, the reference for birth and death dates should have been https://www.ebanglalibrary.com/bangladictionary/অশোক-চট্টোপাধ্যায়/, but this does not work. So, it was given as https://www.ebanglalibrary.com/bangladictionary/%E0%A6%85%E0%A6%B6%E0%A7%8B%E0%A6%95-%E0%A6%9A%E0%A6%9F%E0%A7%8D%E0%A6%9F%E0%A7%8B%E0%A6%AA%E0%A6%BE%E0%A6%A7%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A7%9F/. This is very cumbersome. However, a link containing this type of character does work if it points to another wiki site (Wikidata to Wikipedia, Wikisource to another Wikisource, etc.).

A related problem: if you save a sentence in Wikisource containing a character whose decomposition contains nukta, it is automatically decomposed on saving. Even the character set of the Wiki Editor contains only decomposed letters (য় in the Wiki Editor is য + ়).

So, basically, it seems that characters whose decomposition contains nukta are automatically decomposed on saving, on all the wiki sites. That is why external links containing such characters do not work.

Event Timeline

Aklapper changed the task status from Open to Stalled. Oct 4 2018, 6:43 AM

Hi @Hrishikes, thanks for taking the time to report this!
Both https://www.ebanglalibrary.com/bangladictionary/অশোক-চট্টোপাধ্যায়/ and https://www.ebanglalibrary.com/bangladictionary/%E0%A6%85%E0%A6%B6%E0%A7%8B%E0%A6%95-%E0%A6%9A%E0%A6%9F%E0%A7%8D%E0%A6%9F%E0%A7%8B%E0%A6%AA%E0%A6%BE%E0%A6%A7%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A7%9F/ open the very same page in my browser (Firefox 62) when clicking those links in this bug report.

So I do not know how to see the problem with the current information given.

Please add a more complete description to this report: a clear list of specific steps to reproduce the situation (small details sometimes matter, so nobody should need to guess how you performed each step), the actual results and the expected results after performing those steps, and information about your web browser(s).
You can edit the task description by clicking Edit Task.
Ideally, exact and clear steps to reproduce should allow any other person to follow these steps (without having to interpret those steps) and see the same results. Problems that others can reliably reproduce can get fixed faster. Thanks!

Hi @Aklapper, thanks for the response. Instead of this bug report here, please go to the links from Wikidata, e.g., from here: https://www.wikidata.org/w/index.php?title=Q56864029&diff=756586516&oldid=756540941. You will see one link on the left, the other on the right. Please check the link having Bengali script.

Ah, thanks a lot! Yes, confirming!

First non-working link goes to:
অশোক-চট্টোপাধ্যায় / %E0%A6%85%E0%A6%B6%E0%A7%8B%E0%A6%95-%E0%A6%9A%E0%A6%9F%E0%A7%8D%E0%A6%9F%E0%A7%8B%E0%A6%AA%E0%A6%BE%E0%A6%A7%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A6%AF%E0%A6%BC

Second working link goes to:
অশোক-চট্টোপাধ্যায় / %E0%A6%85%E0%A6%B6%E0%A7%8B%E0%A6%95-%E0%A6%9A%E0%A6%9F%E0%A7%8D%E0%A6%9F%E0%A7%8B%E0%A6%AA%E0%A6%BE%E0%A6%A7%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A7%9F
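For reference, the only difference between the two encoded links above is the final letter. A minimal Python sketch (standard library only; the variable names are just for illustration) that reproduces the two trailing percent-encoded sequences:

```python
from urllib.parse import quote

# The single precomposed code point, as the target site expects it.
precomposed = "\u09df"        # BENGALI LETTER YYA
# The two-code-point sequence that ends up in the saved wiki text.
decomposed = "\u09af\u09bc"   # BENGALI LETTER YA + BENGALI SIGN NUKTA

# quote() percent-encodes the UTF-8 bytes of each string.
print(quote(precomposed))  # %E0%A7%9F          (end of the working link)
print(quote(decomposed))   # %E0%A6%AF%E0%A6%BC (end of the non-working link)
```

Both strings render identically, but they are different byte sequences, so a server that does not normalize the request path treats them as different URLs.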

Aklapper changed the task status from Stalled to Open. Oct 4 2018, 8:11 AM

(Copying part of my comment from T144071#2588205, it's funny that it's not only the same underlying issue, but also exactly the same letter:)

MediaWiki transforms all input into NFC before further processing. Though it is not obvious, the NFC of য় (U+09DF) is <য ়> (<U+09AF U+09BC>), because it has the Comp_Ex (Full_Composition_Exclusion) property set, as do some other Bengali characters.
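This can be checked outside MediaWiki. A small sketch using Python's standard `unicodedata` module (this only illustrates the Unicode behaviour, it is not MediaWiki's own normalization code):

```python
import unicodedata

yya = "\u09df"  # BENGALI LETTER YYA, the precomposed letter

# NFC first decomposes, then recomposes; recomposition is skipped for
# characters that are excluded from composition, so the result stays split.
nfc = unicodedata.normalize("NFC", yya)
print([f"U+{ord(c):04X}" for c in nfc])  # ['U+09AF', 'U+09BC']

# The canonical decomposition mapping stored in the Unicode database.
print(unicodedata.decomposition(yya))    # 09AF 09BC

# The decomposed form is already in NFC, so it is left unchanged.
assert unicodedata.normalize("NFC", nfc) == nfc
```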

So the fact that Bengali letters with nukta are replaced with the decomposed form on saving is actually expected. Unfortunately, this means that you can't enter a raw য়, so for external links that require that character, you have to replace it with its URL-encoded form. A slightly more readable variant is to replace only that letter and leave the others (which are not affected) as they are: https://www.ebanglalibrary.com/bangladictionary/অশোক-চট্টোপাধ্যা%E0%A7%9F/
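As a rough sketch of that workaround in Python: percent-encode only the characters that NFC would rewrite, and leave everything else readable. The helper name `encode_nfc_unstable` is hypothetical, and the sketch assumes the input still contains the precomposed letter (that is, it is run on the original URL before it is pasted into a wiki).

```python
import unicodedata
from urllib.parse import quote


def encode_nfc_unstable(url: str) -> str:
    """Percent-encode each character that NFC would change; keep the rest.

    Hypothetical helper: the check is per character, which is enough for
    precomposed letters that NFC always decomposes.
    """
    return "".join(
        quote(ch) if unicodedata.normalize("NFC", ch) != ch else ch
        for ch in url
    )


# Path segment written with escapes so the precomposed letter survives
# copy and paste; the last character is U+09DF.
segment = "\u0985\u09b6\u09cb\u0995-\u09df"
encoded = encode_nfc_unstable(segment)
print(encoded)

# The result no longer changes when NFC is applied to it.
assert unicodedata.normalize("NFC", encoded) == encoded
```

Only the affected letter is turned into `%E0%A7%9F`; the remaining Bengali letters stay as they are.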

This is broken for Punjabi (Gurmukhi) as well.
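To see which letters of a given script are affected, one option is to scan its Unicode block for characters that NFC rewrites. A minimal sketch (the function name is made up for illustration, and the result depends on the Unicode version bundled with your Python):

```python
import unicodedata


def nfc_unstable_chars(start: int, end: int) -> list[str]:
    """Return the characters in [start, end] that NFC turns into something else."""
    return [
        chr(cp)
        for cp in range(start, end + 1)
        if unicodedata.normalize("NFC", chr(cp)) != chr(cp)
    ]


# Gurmukhi block: U+0A00..U+0A7F
gurmukhi = nfc_unstable_chars(0x0A00, 0x0A7F)
for ch in gurmukhi:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Bengali block: U+0980..U+09FF
bengali = nfc_unstable_chars(0x0980, 0x09FF)
```

Any letter listed this way would need the same percent-encoding treatment in external links.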

Nokib_Sarkar subscribed.