URL-encoding of external-id values in Wikidata frontend breaks (some) links
Closed, Resolved (Public)

Description

Values of external-id properties link to the external database entry from the Wikidata frontend. In some cases the external link does not work because special characters within the external identifier, such as %, &, =, and possibly others, get URL-encoded. An example is https://www.wikidata.org/wiki/Q325887#P3520, whose correct identifier is W%D6LLEKLA01. The extra URL-encoding turns it into W%25D6LLEKLA01, which does not work.
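To illustrate the double encoding described above, here is a minimal Python sketch. It is not the Wikibase code: the formatter URL `https://example.org/entry/$1` and the helper `build_link` are hypothetical, and only the identifier comes from this report.

```python
from urllib.parse import quote

# Hypothetical formatter URL; "$1" stands for the external identifier.
FORMATTER_URL = "https://example.org/entry/$1"


def build_link(formatter_url: str, external_id: str, encode: bool = True) -> str:
    """Substitute the identifier into the formatter URL, optionally percent-encoding it."""
    value = quote(external_id, safe="") if encode else external_id
    return formatter_url.replace("$1", value)


# The identifier from the report already contains a percent-escape (%D6).
external_id = "W%D6LLEKLA01"

# Encoding it again turns "%" into "%25", which changes the identifier.
print(build_link(FORMATTER_URL, external_id))
# Inserting it verbatim keeps the identifier as stored.
print(build_link(FORMATTER_URL, external_id, encode=False))
```

The first call yields an URL ending in `W%25D6LLEKLA01`, the second one ending in `W%D6LLEKLA01`.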

I originally requested an update at wikidata:MediaWiki talk:Gadget-AuthorityControl.js, but I learnt that this gadget is no longer responsible for the linking.

Event Timeline

Spaces can also cause problems: apparently a space " " gets encoded as a plus "+". This breaks the links generated for Iconclass notation (P1256). For example, http://iconclass.org/11H(COSMAS+&+DAMIAN) is generated from "11H(COSMAS & DAMIAN)" instead of http://iconclass.org/rkd/11H(COSMAS%20&%20DAMIAN)/ (from this item).

@Marsupium Have you tried using the wmflabs tool wikidata-externalid-url? I used it for the formatter URL for Twitch game ID (P4467) in September because of the space issue, and it worked without any changes to the actual Toolforge code.

Yes, I found that workaround in the end and then forgot to report here. Thanks for mentioning it!

Does the frontend need to URL-encode external IDs? If they are meant to be resolvable, there should be no need to do so in the first place, right?

It turns out this is also a problem within WDQS. When it comes to IDs and URIs, is there a reason for encoding this type of information in the first place instead of leaving that to applications?

https://www.wikidata.org/wiki/Property:P2000 is another one that's affected by the incorrect encoding of spaces as '+'. Given that these values are strings like "Cantigas de Santa Maria", which generates a failing link of http://www1.cpdl.org/wiki/index.php/Cantigas+de+Santa+Maria, it probably fails more often than it works. It should be encoded as %20, which would work everywhere.
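The difference between the two space encodings can be reproduced with Python's standard library. This is only a sketch: the URL prefix is inferred from the failing link above, and nothing here is taken from the frontend's actual implementation.

```python
from urllib.parse import quote, quote_plus

# Prefix inferred from the failing link in the comment above (an assumption).
PREFIX = "http://www1.cpdl.org/wiki/index.php/"
value = "Cantigas de Santa Maria"

# Form-style encoding: spaces become "+", which is only meaningful in query strings.
form_style = PREFIX + quote_plus(value)

# Path-style percent-encoding: spaces become "%20".
path_style = PREFIX + quote(value)

print(form_style)  # http://www1.cpdl.org/wiki/index.php/Cantigas+de+Santa+Maria
print(path_style)  # http://www1.cpdl.org/wiki/index.php/Cantigas%20de%20Santa%20Maria
```

A "+" in the path part of a URL is a literal plus sign, so a server has no reason to read it as a space; "%20" is unambiguous in both the path and the query.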

I'm reporting my issue here: "As reported in https://www.wikidata.org/wiki/Property_talk:P9112#ID_IFLA_link_to_an_error_page, the link generated by https://www.wikidata.org/wiki/Property:P9112 always encodes # as %23 (e.g. https://www.wikidata.org/wiki/Q719309#P9112 links to https://www.iflastandards.info/unimarc/terms/key%23ab), which results in a broken link." Are we sure this is "low" priority? Thanks in advance!
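One way an encoder could leave "#" alone is to treat it as a safe character. The Python sketch below shows the idea with `urllib.parse.quote`; the URL prefix is inferred from the broken link above, and this is not a description of how the issue was eventually fixed.

```python
from urllib.parse import quote

# Prefix inferred from the broken link in the comment above (an assumption).
PREFIX = "https://www.iflastandards.info/unimarc/terms/"
value = "key#ab"

# Encoding every reserved character turns the fragment separator into "%23".
fully_encoded = PREFIX + quote(value, safe="")

# Declaring "#" safe keeps it as a fragment separator.
fragment_kept = PREFIX + quote(value, safe="#")

print(fully_encoded)  # https://www.iflastandards.info/unimarc/terms/key%23ab
print(fragment_kept)  # https://www.iflastandards.info/unimarc/terms/key#ab
```

Which characters should count as safe depends on the target site, so a single global rule cannot be right for every property.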

Lydia_Pintscher raised the priority of this task from Low to Medium. Feb 12 2021, 5:12 PM
Lydia_Pintscher added a subscriber: Lydia_Pintscher.

No. Given the number of reports, let's raise it.

Lydia_Pintscher claimed this task.

I believe the changes in T271126 fixed this. If you still find cases that don't work, please reopen.