
URL-encoding of external-id values in Wikidata frontend breaks (some) links
Open, Low, Public

Description

Values of external-id properties link to the external database entry from the Wikidata frontend. In some cases the external link does not work because special characters within the identifier, such as %, &, =, and possibly others, are URL-encoded. An example is https://www.wikidata.org/wiki/Q325887#P3520 with the correct identifier W%D6LLEKLA01. The extra URL-encoding turns it into W%25D6LLEKLA01, which does not work.
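A minimal sketch of the double-encoding effect, assuming the frontend percent-encodes the whole value before substituting it into the formatter URL. The formatter URL `https://example.org/record/$1` and the helper names `link_encoded`/`link_raw` are hypothetical; only the `$1` placeholder convention comes from Wikidata's formatter URLs.

```python
from urllib.parse import quote

# Hypothetical formatter URL; "$1" stands for the identifier value.
FORMATTER = "https://example.org/record/$1"

def link_encoded(formatter: str, ext_id: str) -> str:
    """Percent-encode the whole identifier first (approximates the reported behaviour)."""
    return formatter.replace("$1", quote(ext_id, safe=""))

def link_raw(formatter: str, ext_id: str) -> str:
    """Substitute the identifier verbatim, trusting it is already URL-ready."""
    return formatter.replace("$1", ext_id)

ident = "W%D6LLEKLA01"
print(link_encoded(FORMATTER, ident))  # https://example.org/record/W%25D6LLEKLA01
print(link_raw(FORMATTER, ident))      # https://example.org/record/W%D6LLEKLA01

# "&" and "=" are escaped the same way when safe="" is used.
print(quote("a&b=c", safe=""))         # a%26b%3Dc
```

The `%` of the existing escape `%D6` is itself escaped to `%25`, so the target site receives a different identifier.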

I originally requested an update at wikidata:MediaWiki talk:Gadget-AuthorityControl.js, but I learnt that this gadget is no longer responsible for the linking.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Mar 12 2017, 5:37 PM
Lydia_Pintscher triaged this task as Low priority. · Mar 23 2017, 2:21 PM

Spaces can also cause problems: a space " " gets encoded as a plus "+". This breaks the links generated for Iconclass notation (P1256). For example, "11H(COSMAS & DAMIAN)" produces http://iconclass.org/11H(COSMAS+&+DAMIAN) instead of http://iconclass.org/rkd/11H(COSMAS%20&%20DAMIAN)/ (from this item).
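The difference between the two links comes down to form-style versus path-style encoding of the space. A small sketch using Python's standard library; note that `quote_plus` also escapes the parentheses and ampersand, so it does not reproduce the exact broken URL above, it only isolates the space-to-plus behaviour.

```python
from urllib.parse import quote, quote_plus

notation = "11H(COSMAS & DAMIAN)"

# Form-style encoding (application/x-www-form-urlencoded): space -> "+".
# Inside a URL path, "+" is a literal plus sign, not a space.
form_style = quote_plus(notation)

# Path-style encoding: space -> "%20". "(", ")" and "&" are kept literally
# because RFC 3986 allows them in a path segment.
path_style = quote(notation, safe="()&")

print(form_style)  # 11H%28COSMAS+%26+DAMIAN%29
print(path_style)  # 11H(COSMAS%20&%20DAMIAN)
```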

@Marsupium Have you tried using the wmflabs tool wikidata-externalid-url? I used it for the formatter URL for Twitch game ID (P4467) in September because of the space issue, and it worked without any changes to the actual Toolforge code.


Yes, I found that workaround in the end and then forgot to report it here. Thanks for mentioning it!

Abbe98 added a subscriber: Abbe98. · May 25 2019, 7:47 PM

Does the frontend need to URL-encode external IDs at all? If they are meant to be resolvable, there should be no need to do so in the first place, right?
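One possible middle ground, sketched here as a hypothetical policy rather than anything Wikibase implements: encode only characters that cannot appear literally in a URL (spaces, non-ASCII, `#`), and leave reserved characters and existing `%XX` escapes alone. The helper `minimal_encode` and the chosen safe set are assumptions for illustration.

```python
import re
from urllib.parse import quote

# Kept literally: RFC 3986 sub-delims plus ":", "@", "/", "?" and "%".
# Letters, digits and "-._~" are always safe in urllib.parse.quote.
# "#" is deliberately NOT listed, so an ID cannot accidentally start a fragment.
_SAFE = "!$&'()*+,;=:@/?%"

# A "%" that does not begin a valid %XX escape.
_LONE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def minimal_encode(ext_id: str) -> str:
    """Encode only what cannot appear in a URL; keep reserved chars and %XX escapes."""
    escaped = _LONE_PERCENT.sub("%25", ext_id)  # a stray "%" becomes "%25"
    return quote(escaped, safe=_SAFE)           # non-ASCII is UTF-8 percent-encoded

print(minimal_encode("W%D6LLEKLA01"))          # W%D6LLEKLA01
print(minimal_encode("11H(COSMAS & DAMIAN)"))  # 11H(COSMAS%20&%20DAMIAN)
print(minimal_encode("100%"))                  # 100%25
```

The trade-off is that a value containing a sequence like `%41` is assumed to be an intentional escape; an identifier that literally contains such text would be misread.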

Abbe98 added a comment. · Dec 5 2019, 8:34 PM

It turns out this is also a problem in WDQS. When it comes to IDs and URIs, is there a reason to encode this information in the first place instead of leaving that to applications?

Abbe98 moved this task from Inbox to Watching on the User-Abbe98 board.

https://www.wikidata.org/wiki/Property:P2000 is another property affected by the incorrect encoding of spaces as '+'. Since its values are strings like "Cantigas de Santa Maria", which generates the failing link http://www1.cpdl.org/wiki/index.php/Cantigas+de+Santa+Maria, it probably fails more often than it works. Spaces should be encoded as %20, which decodes to a space in both URL paths and query strings.
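Why "+" fails here while "%20" works can be seen from the decoding side. This sketch assumes the CPDL wiki decodes the article path with plain percent-decoding, under which "+" stays a literal character of the page title.

```python
from urllib.parse import quote, unquote, unquote_plus

title = "Cantigas de Santa Maria"
plus_path = title.replace(" ", "+")  # what the broken link contains

# Plain percent-decoding (paths) keeps "+" as a literal plus sign ...
print(unquote(plus_path))       # Cantigas+de+Santa+Maria
# ... only form decoding turns "+" back into a space.
print(unquote_plus(plus_path))  # Cantigas de Santa Maria

# "%20" decodes to a space under either rule.
pct_path = quote(title)
print(pct_path)                 # Cantigas%20de%20Santa%20Maria
print(unquote(pct_path) == unquote_plus(pct_path) == title)  # True
```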

Capmo added a subscriber: Capmo. · Mar 25 2020, 1:48 PM