
Custom WDQS prefixes based on dashboard prefix option
Open, High · Public · Feature

Description

Feature summary (what you would like to be able to do and where):

I want to set a prefix in the dashboard that'll be used to generate and apply local standard WDQS prefixes. This would be pre-populated with a name-based suggestion for new instances, but could be modified afterwards.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

When you want to write queries on Wikibase.cloud currently, you have to specify your own prefixes, or else use a full URI.

Default Wikibase-specific prefixes are from Wikidata; some of these are added when the "Add Standard Prefixes" option is selected in the Prefixes 📌 menu.

Some users have taken to manually redefining standard prefixes such as wdt: or p:. The Ctrl+Space autosuggestion that appears after a prefix is entered also seems to assume that this will be done. However, this is bound to lead to confusion when others are reading or writing queries, or when it comes to federated queries, and is not in line with how other well-known prefixes are treated.

Instead, I suggest creating new prefixes based on a master prefix associated with the instance, to replace 'wd' where it was in a prefix already, or else add to it; e.g. if the site prefix is 'wfd' for WikiFur Data then 'wdt' becomes 'wfdt', while 'p' becomes 'wfdp'.
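As a rough illustration of the renaming rule described above, here is a minimal Python sketch. The function name localize_prefix and the sample list of standard prefix names are made up for this example and are not part of any existing Wikibase.cloud code.

```python
def localize_prefix(standard: str, site_prefix: str) -> str:
    """Map a Wikidata-style prefix name to a site-specific one.

    A leading 'wd' is replaced by the site prefix; any other
    prefix name gets the site prefix put in front of it.
    """
    if standard.startswith("wd"):
        return site_prefix + standard[len("wd"):]
    return site_prefix + standard


# Hypothetical sample of standard prefix names, for illustration only.
STANDARD_NAMES = ["wd", "wdt", "p", "ps", "pq", "pr"]

# Site prefix 'wfd' (WikiFur Data), as in the example above.
local_names = {name: localize_prefix(name, "wfd") for name in STANDARD_NAMES}
```

With the site prefix 'wfd', this maps 'wdt' to 'wfdt' and 'p' to 'wfdp', matching the example in the description.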

These wouldn't necessarily be unique, though it may be worth considering this in order to facilitate future federated queries between Wikibase.cloud instances. (Perhaps just warn that it is shared by X wikis if naming them is a privacy concern.)

Prefixes could be suggested based on capital letters in the name, as well as the first letter after a space or hyphen or the start of the name, in case they don't use capitals.
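One possible reading of this suggestion rule, sketched in Python. The function name suggest_prefix is hypothetical, and lowercasing the result is an assumption made so the output looks like the 'wfd' example.

```python
def suggest_prefix(name: str) -> str:
    """Suggest a site prefix from an instance name.

    Keeps every capital letter, plus the first letter at the start
    of the name or after a space or hyphen, and lowercases the result.
    """
    letters = []
    previous = " "  # treat the start of the name like a word boundary
    for ch in name:
        if ch.isalpha() and (ch.isupper() or previous in " -"):
            letters.append(ch.lower())
        previous = ch
    return "".join(letters)
```

For "WikiFur Data" this yields "wfd"; a name without capitals such as "my little wiki" yields "mlw". The result is only a starting suggestion that the owner could edit.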

This concept could be introduced through the onboarding workflow, and added or reconfigured later in the dashboard for existing Wikibase.cloud instances.

Benefits (why should this be implemented?):

By doing this, there would be no more confusion about what 'wdt:' means on Wikibase.cloud, or what prefixes are required to write a query. By placing it in onboarding, perhaps using an example with the prefix, users will be introduced to the concept of using local prefixes for queries rather than Wikidata ones.

Existing prefixes would continue to work, while new ones can be remembered easily if you know the prefix, or if you add the standard prefixes to a query, since they'd be there in addition to the ones for Wikidata. (Ideally, related issues T296451, T317109 and Cradle's wdt: dependency would be addressed too. See also T211799 for the more general case of prefixes in the Wikibase front-end.)

Event Timeline

In a lot of ways, I'd be completely fine sticking with the Wikidata standard prefixes pointed at a specific wb.c instance's domain/paths. Or perhaps a slight tweak from "wd" to "wb". The big conceptual thing is distinguishing between terms/concepts that are a part of the Wikibase and larger frameworks (owl, rdfs, prov, etc.) and then the specific knowledge graph being queried. All Wikibase SPARQL queries share common characteristics because of the statement/reference/qualifier dynamic, and many of the examples online (and those generated by LLMs) are going to refer to Wikidata, so it's reasonable to keep it a short step between those examples and what the query would look like for a specific Wikibase instance.

On the other hand, shorthand namespaces can also take on useful meaning through time as a given domain is providing unique content and functionality through a given graph. A solution might be to simply use the built-in wd-focused prefixes pointed to the local domain as a default but let the owners of a wb.c instance choose their own adventure. I would make the configuration for that down at the level of each individual prefix, though, as opposed to something like specifying a particular acronym/initials and then stubbing it on. For instance, if I want "http://my-wb.c/prop/statement/" to be referred to as "statementprop:" for some reason, then I should be able to do that.

The other thing that would be nice here is to change the routing to allow for http vs https addresses on resolvers. The behavior in WDQS seems to be to simply display the identifier when the namespace designation for a node uses HTTP and display the full URL/URI when the namespace resolver points to an HTTPS address. Or this could be something to do with the built-in namespaces (I haven't looked under the hood). At any rate, when returning a query result with a bunch of stuff, including a local graph pointing to several different types of nodes, anything we can do to trim up the visual display of the node identifiers would be helpful.

I agree having a one-click setting for the base of a prefix would be useful. At the very least, we should dissuade anyone from using wd:, wdt:, et al. as is done in Wikidata. That's only going to lead to more confusion, as it is highly likely that anyone who is using Wikibase will have Wikidata experience and will be interested in federating with Wikidata. Having "wd:" be ambiguous out of the box seems like bad practice. On more than one occasion, I've had to coach a GLAM organization to not keep the wd: prefix, as we don't have great documentation or elaboration around the pluses and minuses of this decision.

Some thoughts:

  • I would also suggest testing any user-specified prefix choice against a well-known list of prefixes to warn them of any potential collisions: https://prefix.cc/popular/all.sparql
  • Some of the key prefixes you absolutely do not want the user to collide with, like owl, skos, or schema. But for others, you might want to give a soft warning that they are potentially overlapping with an existing service.
  • I would suggest a standard template based on what Structured Data on Commons uses, which is to use "sdc" as the base part of all the prefixes.

@prefix sdc: <https://commons.wikimedia.org/entity/> .
@prefix sdcdata: <https://commons.wikimedia.org/wiki/Special:EntityData/> .
@prefix sdcs: <https://commons.wikimedia.org/entity/statement/> .
@prefix sdcref: <https://commons.wikimedia.org/reference/> .
@prefix sdcv: <https://commons.wikimedia.org/value/> .
@prefix sdct: <https://commons.wikimedia.org/prop/direct/> .
@prefix sdctn: <https://commons.wikimedia.org/prop/direct-normalized/> .
@prefix sdcp: <https://commons.wikimedia.org/prop/> .
@prefix sdcps: <https://commons.wikimedia.org/prop/statement/> .
@prefix sdcpsv: <https://commons.wikimedia.org/prop/statement/value/> .
@prefix sdcpsn: <https://commons.wikimedia.org/prop/statement/value-normalized/> .
@prefix sdcpq: <https://commons.wikimedia.org/prop/qualifier/> .
@prefix sdcpqv: <https://commons.wikimedia.org/prop/qualifier/value/> .
@prefix sdcpqn: <https://commons.wikimedia.org/prop/qualifier/value-normalized/> .
@prefix sdcpr: <https://commons.wikimedia.org/prop/reference/> .
@prefix sdcprv: <https://commons.wikimedia.org/prop/reference/value/> .
@prefix sdcprn: <https://commons.wikimedia.org/prop/reference/value-normalized/> .
@prefix sdcno: <https://commons.wikimedia.org/prop/novalue/> .
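The hard/soft collision warning from the first two bullets could look roughly like the Python sketch below. The two sets are small hand-picked samples for illustration, not the contents of the prefix.cc list, and the function name check_prefix and the "error"/"warning"/"ok" labels are hypothetical. A real check would presumably load the full list instead.

```python
# Prefixes a user should never be allowed to redefine (sample only).
HARD_RESERVED = {"owl", "skos", "schema", "rdf", "rdfs", "xsd", "prov"}

# Well-known prefixes that only merit a soft warning (sample only).
SOFT_KNOWN = {"foaf", "dc", "dcterms", "geo", "wd", "wdt"}


def check_prefix(name: str) -> str:
    """Classify a proposed prefix name as 'error', 'warning' or 'ok'."""
    key = name.lower()
    if key in HARD_RESERVED:
        return "error"
    if key in SOFT_KNOWN:
        return "warning"
    return "ok"


# Check a few proposed names, including ones derived from an 'sdc'-style base.
report = {name: check_prefix(name) for name in ["skos", "foaf", "sdc", "sdcps"]}
```

In this sample, "skos" would be rejected, "foaf" would trigger a soft warning, and the "sdc"-based names would pass.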

Anton.Kokh triaged this task as Medium priority. Tue, Jun 11, 3:27 PM
Anton.Kokh raised the priority of this task from Medium to High. Fri, Jun 14, 3:22 PM