Encoded, unreadable reference names in 3 MediaWiki messages in non-ASCII languages
Closed, Invalid (Public)

Description

There are three problematic MediaWiki messages, "cite error references duplicate key", "cite error references missing key" and "cite error references no text", that become unreadable in non-ASCII languages. For example, the text

Invalid <ref> tag; name "թ" defined multiple times with different content

is shown as

Invalid <ref> tag; name ".D5.A9" defined multiple times with different content

That is, the name is shown URI-encoded, with "." in place of "%", instead of as the original Unicode characters.
It can be worked around locally on each Wikipedia using a Lua module containing
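For illustration, the mapping from "թ" to ".D5.A9" can be approximated in plain Lua: every byte of the UTF-8 name outside a small safe set is written as "%XX", and each "%" is then replaced with ".". This is a simplified sketch; the function name `encode_id` and the exact set of characters left unescaped are assumptions, not MediaWiki's actual Sanitizer code.

```lua
-- Hypothetical approximation of the legacy HTML id encoding:
-- spaces become underscores, bytes outside [A-Za-z0-9_.:-] are
-- percent-encoded, and then every "%" is replaced with ".".
local function encode_id(name)
  local s = name:gsub(" ", "_")
  s = s:gsub("[^%w_%.:%-]", function(c)
    return string.format("%%%02X", c:byte())
  end)
  return (s:gsub("%%", "."))
end

print(encode_id("թ")) -- .D5.A9
```

"թ" (U+0569) is the two UTF-8 bytes 0xD5 0xA9, which is exactly what appears in the broken message.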

return mw.uri.decode(frame.args[1])

a template ("decode") that invokes it, and replacing $1 in the MediaWiki messages with

{{decode|{{replace|$1|.|%}}}}

but I ask that it be fixed in PHP instead, possibly by adding one more field to the reference data structure: the original name.
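The local workaround (turn "." back into "%", then percent-decode) can be mimicked in plain Lua outside Scribunto. This is only a sketch: `decode_id` is a hypothetical name, and its decoding loop stands in for `mw.uri.decode`, which is available only inside Scribunto.

```lua
-- Hypothetical stand-in for the template workaround:
-- 1. replace every "." with "%" (what {{replace|$1|.|%}} does);
-- 2. percent-decode "%XX" sequences (what mw.uri.decode does).
local function decode_id(id)
  local s = id:gsub("%.", "%%")
  s = s:gsub("%%(%x%x)", function(hex)
    return string.char(tonumber(hex, 16))
  end)
  return s
end

print(decode_id(".D5.A9")) -- թ
```

In this sketch a name that itself contains a literal "." does not round-trip: the dot comes back as "%" or, if followed by two hex digits, as an unrelated byte. That ambiguity is one reason to keep the original name rather than reverse the encoding.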
Thank you very much in advance.

Event Timeline

IKhitron assigned this task to eranroz.
IKhitron raised the priority of this task to Low.
IKhitron updated the task description.
matmarex added a subscriber: MrStradivarius.

The other error messages are usually generated immediately when parsing the <ref> tags, at which point they have access to the original attributes. This one is generated when parsing the <references> tag, at which point the original name for each ref has been encoded into a form suitable for using as an id attribute in HTML. It's of course not impossible to fix it, but it would require some careful refactoring of code I'm not familiar with and that I'm afraid to break :)
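The idea of keeping the original name next to the encoded id can be sketched as follows. This is a hypothetical Lua model, not the Cite extension's PHP: `encode_id`, `add_ref` and the `refs` table are illustrative names, and the encoding is a simplified stand-in. The point is only that a lookup keyed by the encoded id can still report the readable name.

```lua
-- Simplified stand-in for the id encoding ("թ" -> ".D5.A9").
local function encode_id(name)
  local s = name:gsub("[^%w_%-]", function(c)
    return string.format(".%02X", c:byte())
  end)
  return s
end

-- Refs keyed by encoded id; each entry also keeps the original name.
local refs = {}

-- Returns an error message on a conflicting duplicate, else nil.
local function add_ref(name, text)
  local id = encode_id(name)
  local existing = refs[id]
  if existing and existing.text ~= text then
    -- Report the stored original name, not the encoded id.
    return string.format(
      'Invalid <ref> tag; name "%s" defined multiple times with different content',
      existing.name)
  end
  refs[id] = { name = name, id = id, text = text }
  return nil
end

print(add_ref("թ", "first"))
print(add_ref("թ", "second"))
```

The second call reports the name as "թ" rather than ".D5.A9", because the message reads the stored `name` field instead of the key.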

IKhitron added a subscriber: eranroz.
thiemowmde subscribed.

I can't reproduce this any more. It's possible this was fixed as part of the big refactoring of the Cite codebase we did in 2019.

I don't remember the date, but it was indeed fixed.