
Lowercase Qs in some statement URLs
Closed, Declined, Public

Description

Some statements created in the early days of Wikidata have a lowercase q in their URL, where only an uppercase Q is expected. Example: https://www.wikidata.org/wiki/Q1545140#q1545140$9193059E-B7E5-4892-9196-87F8CFBB4AB8

Investigation tasks:

  • Estimate how many of these are left
  • How complicated it would be to fix it
  • Should we fix it? If we do, the GUIDs will change, which is not so good for traceability and stability
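For the first investigation task, a count could start from a simple check like the one below. This is a minimal sketch, assuming GUIDs have the `<entity id>$<uuid>` shape seen in the example URL (no format is actually guaranteed) and that the GUIDs come from some source such as a dump. The function names and the sample list are hypothetical.

```python
from typing import Iterable


def has_lowercase_prefix(guid: str) -> bool:
    """Return True if the part of a statement GUID before '$' starts with 'q'.

    Assumes the '<entity id>$<uuid>' shape seen in the example URL; no
    format is actually guaranteed for statement IDs.
    """
    prefix, sep, _ = guid.partition("$")
    return bool(sep) and prefix.startswith("q")


def count_lowercase(guids: Iterable[str]) -> int:
    """Count GUIDs whose prefix starts with a lowercase 'q'."""
    return sum(1 for guid in guids if has_lowercase_prefix(guid))


# Hypothetical sample input; a real estimate would stream GUIDs from a dump.
sample = [
    "q1545140$9193059E-B7E5-4892-9196-87F8CFBB4AB8",
    "Q1545140$9193059E-B7E5-4892-9196-87F8CFBB4AB8",
]
print(count_lowercase(sample))  # 1
```

Because the check only looks at the first character before `$`, it can run over a stream of GUIDs without holding them all in memory.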

Mentioned on WD:DEV in October 2020

Event Timeline

In my opinion, we should not fix these – statement IDs should be stable, and the lowercase doesn’t really hurt anybody – and so I wouldn’t even spend time investigating how many such statement IDs we have.

How complicated it would be to fix it

It is technically possible, and probably not too hard, to "fix" them.

Should we fix it? If we do, the GUIDs will change, which is not so good for traceability and stability

In an ideal world they should remain stable, and moving these old GUIDs from lowercase to uppercase would change them.
IMO, I don't see a real problem with this legacy.
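The sketch below illustrates why such a "fix" is not free. It assumes the `<entity id>$<uuid>` shape from the example URL and models the hypothetical fix as uppercasing the first character of the prefix. The function name is made up for illustration. The point is only that the resulting string differs, so anything that stored the old GUID and compares it exactly would no longer find a match.

```python
def uppercase_prefix(guid: str) -> str:
    """Return the GUID with the first character of its prefix uppercased.

    Hypothetical 'fix': assumes the '<entity id>$<uuid>' shape. GUIDs without
    a '$' separator are returned unchanged.
    """
    prefix, sep, rest = guid.partition("$")
    if not sep:
        return guid
    return prefix[:1].upper() + prefix[1:] + sep + rest


old = "q1545140$9193059E-B7E5-4892-9196-87F8CFBB4AB8"
new = uppercase_prefix(old)
print(new)         # Q1545140$9193059E-B7E5-4892-9196-87F8CFBB4AB8
print(new == old)  # False: anything that stored `old` no longer matches
```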

OK, based on what Adam and Lucas are saying, I'd say let's reject it if it's just the cosmetics that are the problem. If this is causing other issues, please reopen.

https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Full_statements clearly states: "There is no guaranteed format or meaning to the statement id." So why change an ID that should be stable when no specific format is guaranteed? Nobody should rely on any meaning in the IDs, since implementation details such as this should be free to change as the need arises.


Indeed, we could change statement IDs to, for example, long integers or longer strings, and in theory that should not be a problem, as there is no stability guarantee in that respect.
However, when a statement changes, its GUID remains the same. This doesn't seem to be written down anywhere, but it is true in the code, and I believe it is intentional by design.

It's also worth adding a reference to T244207#5954555.

Here you can see that at least some of the people working on Wikibase don't subscribe to the idea that the entityId in the GUID necessarily corresponds to the entity on which the statement sits (e.g. after deletion and restoration).

So one shouldn't rely on parsing the GUID to reliably determine the entityId. We probably wouldn't want to inadvertently give the impression that one should do this.
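One way to follow that advice is to take the entity ID from the entity document that contains the statement and treat the GUID as an opaque string. This is a minimal sketch, assuming the Wikibase JSON entity layout where an item has an `id` field and its statements are grouped by property under `claims`, each carrying its GUID in `id`. The function name, the trimmed-down document and the `P31` key are illustrative only.

```python
from typing import Iterator, Tuple


def statements_with_entity(entity: dict) -> Iterator[Tuple[str, str]]:
    """Yield (entity_id, statement_guid) pairs for one entity document.

    The entity ID is taken from the document's own "id" field, not parsed
    out of the GUID. Assumes statements are grouped by property under "claims".
    """
    entity_id = entity["id"]
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            yield entity_id, statement["id"]


# Hypothetical, trimmed-down entity document (property key is illustrative).
item = {
    "id": "Q1545140",
    "claims": {
        "P31": [{"id": "q1545140$9193059E-B7E5-4892-9196-87F8CFBB4AB8"}],
    },
}
for entity_id, guid in statements_with_entity(item):
    naive = guid.partition("$")[0]  # what parsing the GUID would give
    print(entity_id, naive, entity_id == naive)  # Q1545140 q1545140 False
```

Even in this simple case the naive parse disagrees with the containing entity, and after deletion and restoration the two could differ by more than letter case.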