
Consider switching to HTTPS for Wikidata query service links
Closed, Declined · Public

Description

The Wikidata query service results at https://query.wikidata.org/#%23Cats%0ASELECT%20%3Fitem%20%3FitemLabel%0AWHERE%0A%7B%0A%09%3Fitem%20wdt%3AP31%20wd%3AQ146%20.%0A%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22%20%7D%0A%7D include URLs such as http://www.wikidata.org/entity/Q25393350. Can these URLs be changed to HTTPS? For example, https://www.wikidata.org/entity/Q25393350.

I looked at the "wikidata-query-gui" and "wikidata-query-rdf" Git repositories, but I'm not sure if changing the URL scheme/protocol is just a simple matter of doing a find and replace or if the specification format is particular about using http://.

Event Timeline


We previously discussed this and the tradition for entity identifiers is to use http. E.g. such commonly known prefixes as:

rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#
xsd: http://www.w3.org/2001/XMLSchema#
schema: http://schema.org/
cc: http://creativecommons.org/ns#
owl: http://www.w3.org/2002/07/owl#

Keeping in tune with this traditional use, our entity identifiers also use http. Note that an entity identifier is not the same as a URL: while we do implement site access using the entity identifier as a URL for some objects (namely entities), not every object is accessible by using its identifier as a URL, and in general it is not guaranteed that an object identifier will produce any specific result, or any result at all, when accessed as a URL. E.g. value identifiers do not produce anything useful: if you go to http://www.wikidata.org/value/294352e4fa24cbfe51f337c80290fb2d you will be redirected to a generic help page.

Thus, I do not think we should change our identifier scheme to diverge from what is used in every other linked data application.

We previously discussed this and the tradition for entity identifiers is to use http.

I searched this installation of Phabricator for an existing task before filing this one, for what it's worth. Perhaps the discussion took place on a mailing list or elsewhere. Links (or URLs or identifiers or whatever you want to call them) welcome.

Keeping in tune with this traditional use, our entity identifiers also use http. Note that an entity identifier is not the same as a URL: while we do implement site access using the entity identifier as a URL for some objects (namely entities), not every object is accessible by using its identifier as a URL, and in general it is not guaranteed that an object identifier will produce any specific result, or any result at all, when accessed as a URL. E.g. value identifiers do not produce anything useful: if you go to http://www.wikidata.org/value/294352e4fa24cbfe51f337c80290fb2d you will be redirected to a generic help page.

Okay, good to know about identifiers generally. I'm not sure I agree with this decision. Why use http:// and www. in the identifier when these so clearly suggest a URL, available over HTTP, and part of the Web? If this is supposed to be a generic identifier of an item/object/entity, why include these parts?

Thus, I do not think we should change our identifier scheme to diverge from what is used in every other linked data application.

A foolish consistency is the hobgoblin of little minds. ;-)

In this specific context, the query service is outputting URLs (yes, URLs, right here in River City) such as this one:

<a title="" href="http://www.wikidata.org/entity/Q25393350" target="_blank">wd:Q25393350</a>

Even if the identifier is going to use http://, the URLs output in the HTML DOM can be changed to https:// in the HTML we serve to users in order to save their clients an extra hop, right?

For the same reasons WMF uses https, shouldn't we make sure that users don't access http?

If people are given sufficient advance notice, I think this should be changed.

@Esc3300 please see above about the difference between entity identifiers and URLs.

@MZMcBride if you mean links that appear in the GUI, I think yes, those can be changed.
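To illustrate the distinction drawn above, here is a minimal Python sketch that leaves the entity identifier untouched and upgrades only the link target shown to the user. The function name `display_href` and the host constant are hypothetical, introduced for illustration; this is not the wikidata-query-gui code.

```python
from urllib.parse import urlsplit

# Assumption: only links on this host are upgraded; all other URIs pass through.
WIKIDATA_HOST = "www.wikidata.org"


def display_href(identifier: str) -> str:
    """Return the link target to show in the GUI for a given identifier.

    The identifier itself (the http:// URI used in RDF) is never modified;
    only the clickable href is switched to https.
    """
    parts = urlsplit(identifier)
    if parts.scheme == "http" and parts.netloc == WIKIDATA_HOST:
        return parts._replace(scheme="https").geturl()
    return identifier


identifier = "http://www.wikidata.org/entity/Q25393350"
print(display_href(identifier))  # https://www.wikidata.org/entity/Q25393350
print(identifier)                # unchanged: http://www.wikidata.org/entity/Q25393350
```

Prefixes from other vocabularies, such as http://schema.org/, are returned as-is under this assumption.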

As Smalyshev mentions, traditionally http may be used, but there isn't really a rule against using https. Traditionally most of the internet used http, not https, but WMF decided against its use.

As people do click on these, it's important that http is NOT used.

FWIW, there was also a bit of discussion about this on PC following the initial WDQS Beta announcement: https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2015/09#Wikidata_Query_Service

It’s also worth noting that, thanks to HSTS, your browser will automatically upgrade the connection to HTTPS without transmitting anything over HTTP, as long as you’ve visited wikidata.org at least once in the last year (31536000 seconds).


2017-08-22 edit: wikidata.org is actually on the HSTS preload list (check here), so AFAIU no modern browser should ever load it over HTTP even if the domain hasn’t been visited in over a year (or ever).

I think it would be good to see some metrics. How many people use these links? Is this already available on grafana?

T153897 seems to illustrate the problem with not using http.

@Esc3300 no, that's a completely different problem, actually a configuration bug having nothing to do with this one :) See the attached patch.

Good point. So the fact that all links on https://ldfclient.wmflabs.org/ point to http:// is a different problem.

Is it advisable to use statements like strafter(str(?item), str(wd:)) to avoid hard-coding URI prefixes within WDQS consumers?

@Ricordisamoa I don't think reparsing object identifiers is a good idea, unless you need it for some specific purpose. Hard-coding URI-prefixes is kind of the point of it - each object has its own globally unique identifier, which is the full URI.

each object has its own globally unique identifier, which is the full URI.

Naïve question: does it necessarily have to be a single identifier? If each object had two globally unique identifiers (http and https), which problems would arise?

@MisterSynergy using two ids means each time you want to query something related to this object, you need to do 2 queries. If two objects are involved, it's 4 queries. You can see where it's going. I don't think it is a good idea.

However, we can (and actually already do) make both the http and https links point to the same place, and we can (and probably will) make the GUI use https links when you click on an object ID. While it's a very good idea to use a single object ID, that does not mean you cannot use multiple URLs to access information about that object. The latter is certainly possible and I see no problem with it.
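The doubling argument made above for two identifiers per object (2 queries for one object, 4 for two) can be sketched in Python as follows. The helper name `identifier_variants` is hypothetical, and the https spelling is the hypothetical second identifier under discussion, not something the query service uses.

```python
from itertools import product


def identifier_variants(entity_ids):
    """Return every http/https spelling combination for the given entity IDs."""
    schemes = ("http", "https")  # assumption: exactly two identifiers per object
    per_entity = [
        [f"{scheme}://www.wikidata.org/entity/{qid}" for scheme in schemes]
        for qid in entity_ids
    ]
    # One tuple per combination that a query would have to cover.
    return list(product(*per_entity))


combos = identifier_variants(["Q146", "Q25393350"])
print(len(combos))  # 4
```

With n objects in a query the number of combinations is 2 to the power n, which is why a single identifier per object keeps queries manageable.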

@Ricordisamoa I don't think reparsing object identifiers is a good idea, unless you need it for some specific purpose. Hard-coding URI-prefixes is kind of the point of it - each object has its own globally unique identifier, which is the full URI.

Should I just do uri.replace('http://www.wikidata.org/entity/', '') and use them as item ids? Please correct me if I'm wrong.

@Ricordisamoa For Wikidata Items, yes, you can do that.
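For consumers written in Python, the prefix stripping confirmed above might look like the sketch below. The function name `item_id` is hypothetical, and accepting the https spelling is an assumption made for robustness rather than something the query service is stated to emit.

```python
ENTITY_PREFIXES = (
    "http://www.wikidata.org/entity/",   # identifier prefix discussed in this task
    "https://www.wikidata.org/entity/",  # assumption: accepted defensively
)


def item_id(uri: str) -> str:
    """Strip the entity prefix from a full URI and return the bare ID."""
    for prefix in ENTITY_PREFIXES:
        if uri.startswith(prefix):
            return uri[len(prefix):]
    # Other identifiers (for example value URIs) are not entity URIs.
    raise ValueError(f"not a Wikidata entity URI: {uri!r}")


print(item_id("http://www.wikidata.org/entity/Q25393350"))  # Q25393350
```

Raising on anything else, instead of silently returning the input as a plain `replace` would, makes it obvious when a non-entity identifier reaches the consumer.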

I don't think it's happening. If we decide otherwise in the future, we can reopen, but for now I don't think it makes sense to keep this open.

Just a note that the same discussion came up among linked library data producers at the SWIB18 conference. There was a breakout session that triggered some follow-up. Should this topic ever come up in the Wikidata community again, we'd love to get in touch!