Page MenuHomePhabricator

IABot 2.0beta14 encodes cyrillic characters in URLs, making links incomprehensible for humans
Open, Needs TriagePublic

Description

While running in Ukrainian Wikipedia, the bot replaces Cyrillic letters in URLs with their Percent-encoding analogues like https://uk.wikipedia.org/w/index.php?title=Богомолець_Ольга_Вадимівна&diff=prev&oldid=25257294 or https://uk.wikipedia.org/w/index.php?title=Балка_Кровецька&type=revision&diff=25239890&oldid=25223720. All modern browsers handle URLs with cyrillic letters just as well as URLs with percent encoding. But URLs with percent encoding are much longer and less clear. They make a paragraph of source wiki-code, they are included in, absolutely unreadable, it is not clear what the link is about, and sometimes they can even break the page layout.

For example,

https://uk.wikipedia.org/wiki/Український_науково-дослідний_проектно-конструкторський_та_технологічний_інститут_вибухозахищеного_та_рудникового_електрообладнання

looks way better than

<https://uk.wikipedia.org/wiki/%D0%A3%D0%BA%D1%80%D0%B0%D1%97%D0%BD%D1%81%D1%8C%D0%BA%D0%B8%D0%B9_%D0%BD%D0%B0%D1%83%D0%BA%D0%BE%D0%B2%D0%BE-%D0%B4%D0%BE%D1%81%D0%BB%D1%96%D0%B4%D0%BD%D0%B8%D0%B9_%D0%BF%D1%80%D0%BE%D0%B5%D0%BA%D1%82%D0%BD%D0%BE-%D0%BA%D0%BE%D0%BD%D1%81%D1%82%D1%80%D1%83%D0%BA%D1%82%D0%BE%D1%80%D1%81%D1%8C%D0%BA%D0%B8%D0%B9_%D1%82%D0%B0_%D1%82%D0%B5%D1%85%D0%BD%D0%BE%D0%BB%D0%BE%D0%B3%D1%96%D1%87%D0%BD%D0%B8%D0%B9_%D1%96%D0%BD%D1%81%D1%82%D0%B8%D1%82%D1%83%D1%82_%D0%B2%D0%B8%D0%B1%D1%83%D1%85%D0%BE%D0%B7%D0%B0%D1%85%D0%B8%D1%89%D0%B5%D0%BD%D0%BE%D0%B3%D0%BE_%D1%82%D0%B0_%D1%80%D1%83%D0%B4%D0%BD%D0%B8%D0%BA%D0%BE%D0%B2%D0%BE%D0%B3%D0%BE_%D0%B5%D0%BB%D0%B5%D0%BA%D1%82%D1%80%D0%BE%D0%BE%D0%B1%D0%BB%D0%B0%D0%B4%D0%BD%D0%B0%D0%BD%D0%BD%D1%8F

Event Timeline

Tohaomg created this task.May 30 2019, 6:49 PM
Restricted Application assigned this task to Cyberpower678. · View Herald TranscriptMay 30 2019, 6:49 PM
Restricted Application added subscribers: Cyberpower678, Base. · View Herald Transcript
Cirdan renamed this task from Cyrillic in InternetArchiveBot to IABot 2.0beta14 encodes cyrillic characters in URLs, making links incomprehensible for humans.May 31 2019, 10:49 AM
Cirdan edited projects, added InternetArchiveBot (v2.0); removed InternetArchiveBot.
Cirdan added a subscriber: Cirdan.

I can certainly look into this. However this is a technical limitation at the moment as IABot needs to encode the URL to actually query it. So it saves the URL internally in it's encoded form.

Any progress?