
Take account of date precision when displaying dates in WDQS GUI
Open, Medium, Public

Description

Currently the WDQS GUI is displaying all dates as a full day-month-year.

But this is misleading when dates have less precision, e.g. when only the year, or only the month and year, is given.

A date of 1974 should not be displayed as "Jan 1, 1974" when that precision is not in the database and the true day-month-year may actually have been "Apr 1, 1974".
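To make the request concrete, here is a minimal sketch of precision-aware display, assuming the GUI somehow has both the timestamp and its Wikibase precision code (9 = year, 10 = month, 11 = day). The function name `format_date` and the label style are illustrative assumptions, not the WDQS GUI's actual code.

```python
# Wikibase time precision codes used here: 9 = year, 10 = month, 11 = day.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_date(iso_value: str, precision: int) -> str:
    """Render a timestamp no more precisely than `precision` allows.

    Hypothetical helper; assumes a positive four-digit year,
    e.g. "1974-01-01T00:00:00Z".
    """
    year, month, day = (int(part) for part in iso_value[:10].split("-"))
    if precision == 9:
        return str(year)
    if precision == 10:
        return f"{MONTHS[month - 1]} {year}"
    if precision == 11:
        return f"{MONTHS[month - 1]} {day}, {year}"
    # Other precisions are out of scope for this sketch: show the raw value.
    return iso_value


print(format_date("1974-01-01T00:00:00Z", 9))   # 1974
print(format_date("1974-04-01T00:00:00Z", 11))  # Apr 1, 1974
```

With year precision, the placeholder "Jan 1" never reaches the reader.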

Event Timeline

Restricted Application added a subscriber: Aklapper.

The problem here is that the date and its precision are now separate data items. That is, the triple that holds the date does not carry the precision, so it's impossible to know which precision the date had in Wikidata. It may be possible with T92009, but that brings other complications (not all tools handle comparisons between different types properly) and only covers some precisions.

Smalyshev triaged this task as Medium priority. Feb 27 2017, 6:06 PM
Smalyshev moved this task from Incoming to GUI on the Wikidata-Query-Service board.

So if instead one wrote a query to return a wikibase:Time,
-eg by using p:P569/psv:P569 instead of wdt:P569
could it be possible for the GUI to pick that up and return an appropriately precisioned time?
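For reference, a sketch of the kind of query being described, written as a Python string builder so it can be checked in isolation. It follows the full value node and reads `wikibase:timeValue` and `wikibase:timePrecision` from it. The function name and the example item ID are illustrative assumptions.

```python
import re


def build_birth_date_query(item_id: str) -> str:
    """Return a SPARQL query for date of birth (P569) plus its precision.

    Hypothetical helper: the name and the ID check are assumptions.
    """
    if not re.fullmatch(r"Q[1-9][0-9]*", item_id):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return f"""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX psv: <http://www.wikidata.org/prop/statement/value/>
PREFIX wikibase: <http://wikiba.se/ontology#>

SELECT ?date ?datePrecision WHERE {{
  # p:P569/psv:P569 reaches the value node instead of the simple wdt: value
  wd:{item_id} p:P569/psv:P569 ?valueNode .
  ?valueNode wikibase:timeValue ?date ;
             wikibase:timePrecision ?datePrecision .
}}
"""


print(build_birth_date_query("Q42"))
```

The result is still tabular: the precision comes back as an ordinary second column, which is exactly the difficulty raised in the replies.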

So if instead one wrote a query to return a wikibase:Time,

wikibase:Time is a type on a node. A node by itself does not have any value, it's just a URI. So the GUI can't use it. You can write a query that returns date values associated with that node, but I'm not sure it'd be easy for the GUI to figure it out, since the return format is tabular and not treelike.

So is there no way to create any kind of compound type that the GUI can interpret appropriately?
-eg date + precision, here
-or url + linktext, on other tickets

RDF is not really good with compound types... Maybe there's some trick, but usually a value is just a (string, URI) tuple, with the URI being the type. For dates, there are standard URIs for some precisions, but we don't have too much flexibility there. I need to think more about dates, maybe there's some way. For URL+text, I'm pretty sure one value can't hold both. Of course there are things like JSON-encoded and otherwise-encoded strings, but I don't want to open that particular can of worms yet.

So something like p:P569/psv:P569 returns a URI, something like wdv:8f6e57348b9035361151ee05475253ef

I'm still not sure why it's not possible (in theory at least) for the GUI to spot the wdv prefix, then look up the 8f6e57348b9035361151ee05475253ef in a lookup-table, to translate it into "Apr 1, 1974" + precision 9.

Similarly for a Wikidata value node (perhaps with a different prefix) containing a URL + linktext.

why it's not possible (in theory at least) for the GUI to spot the wdv prefix, then look up the 8f6e57348b9035361151ee05475253ef in a lookup-table

There's no lookup table. The GUI could make a SPARQL query each time it encounters a wdv: node, but I think that'd be a bit expensive performance-wise. Also, a result containing value nodes would be useless without the GUI, since a value node ID does not contain any useful information by itself. Maybe the GUI could combine information from several fields' values, but I'm not sure how to describe to it which values to use.
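One way the "combine several fields" idea could be described to the GUI is a column naming convention. The sketch below assumes a purely hypothetical rule: a column named `<name>Precision` carries the precision of column `<name>`. The input follows the `bindings` layout of the SPARQL JSON results format; nothing here reflects how the WDQS GUI actually works.

```python
def attach_precisions(bindings, suffix="Precision"):
    """Pair each column with its companion precision column, if any.

    `bindings` is the list found under results.bindings in SPARQL JSON
    results. The `<name>Precision` convention is an assumption made for
    illustration only.
    """
    rows = []
    for row in bindings:
        combined = {}
        for name, cell in row.items():
            # Skip a precision column that belongs to another column.
            if name.endswith(suffix) and name[: -len(suffix)] in row:
                continue
            companion = row.get(name + suffix)
            precision = int(companion["value"]) if companion else None
            combined[name] = (cell["value"], precision)
        rows.append(combined)
    return rows


sample = [{
    "itemLabel": {"type": "literal", "value": "example"},
    "date": {"type": "literal", "value": "1974-01-01T00:00:00Z"},
    "datePrecision": {"type": "literal", "value": "9"},
}]
print(attach_precisions(sample))
```

The raw result stays usable without the GUI, since both columns remain plain literals.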

Similarly for a Wikidata value node (perhaps with a different prefix) containing a URL + linktext.

There are no such value nodes as far as I know. Which nodes do you mean?

OTOH, wdv: node content may be looked up via LDF, which is much cheaper than SPARQL... Still not sure about the performance impact though, since it would still require an HTTP round trip.

Re: my previous comment, nodes that can store both a string and a URL would appear to be necessary to enable T121274 "Provide an RDF mapping for external identifiers" -- even though there are no such value nodes at the moment.

As one possible hack that could be a way forward on this, I see that the Blazegraph GeoSpatial extension (description on BlazeGraph wiki) allows custom compound data-types to be defined -- which do not necessarily need to include a geospatial element.

One could therefore imagine creating a date-with-precision datatype, with the precision perhaps appended to the date-time string as a final /integer, eg 2017-01-30T00:00:00Z/11, as used elsewhere on the project.
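A small sketch of how such a literal could be encoded and decoded, assuming exactly the `timestamp/precision` layout from the example above. The type and function names are hypothetical, and no such datatype is confirmed to exist in the query service.

```python
from typing import NamedTuple


class DateWithPrecision(NamedTuple):
    timestamp: str   # e.g. "2017-01-30T00:00:00Z"
    precision: int   # Wikibase precision code, e.g. 11 for day


def encode(value: DateWithPrecision) -> str:
    """Append the precision to the timestamp as a final /integer."""
    return f"{value.timestamp}/{value.precision}"


def decode(literal: str) -> DateWithPrecision:
    """Split on the last '/' so the timestamp part is left untouched."""
    timestamp, sep, precision = literal.rpartition("/")
    if not sep or not timestamp or not precision.isdigit():
        raise ValueError(f"not a date-with-precision literal: {literal!r}")
    return DateWithPrecision(timestamp, int(precision))


parsed = decode("2017-01-30T00:00:00Z/11")
print(parsed.timestamp, parsed.precision)
```

A consumer that does not know the format can still recover the plain timestamp by dropping everything from the last slash.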

One could also imagine a SERVICE to cast a date node to this datatype -- so that the database would remain pure RDF; but, if users wanted, they could cast a date to this format, which could then be picked up and suitably interpreted by the GUI.

A similar mechanism could also be implemented for links-with-linktext (T150937), and quantities-with-units.

I am wary of (ab)using geospatial types as freeform containers for something that is not geospatial data. Geospatial data is indexed in a special way, and I'm not sure putting other data there is a good thing. Also, it will require custom RDF types, which other tools will have a hard time processing. So I'm not sure this method is the best solution.

Casting through a service is not a route I have explored. A function would probably suit it better; I'll think about what is possible there.

Hi Smalyshev, thanks for the comment; but if I can come back on your two objections:

(i) If the only thing to cast to the bespoke type was a SERVICE (or indeed a function), then there would be no items of the bespoke type in the triplestore, so it would have no implications for indexing.

Besides, the BlazeGraph documentation itself presents an "example, where we define a datatype without geospatial component" -- so it is clearly an application they envisage, and even advocate.

(ii) Again, if there were no items of bespoke type in the triplestore or the RDF dumps, this would limit the implications for external tools.

Secondly, the bespoke type would only appear if a user script had included the SERVICE or function to require it -- without this, the query would just return columns of plain vanilla literals.

Thirdly, what external tools will get depends on how the GUI layer serialises output of queries that do include such a service or function -- one approach might just be to output such compound types as a double column, at least for SELECT queries. (I guess the more challenging issue might be appropriate returns for CONSTRUCT queries).

But I do think it would be a really very valuable addition to the GUI to be able somehow to present links as links, rather than being forced to choose between URLs or unlinked strings, particularly for queries trying to present a lot of data in many columns without the GUI abandoning the column format -- succinct links rather than lengthy unbreakable URLs can be a real advantage there.

And I do think that somehow finding a way to reflect date precision is important to solve, too.

Has the date for century been raised and resolved?

[Screenshot: Screen Shot 2022-01-13 at 9.37.53 AM.png]

The "14. century" in the inception property translates to 1300 to 1399. Output from the query shows 1400 (to me this is the start of the 15th century), e.g., https://w.wiki/4deK. Another question: shouldn't the dates be inclusive, 1300 to 1399 instead of January 1, 1400? Thank you very much.

--Jackie

The "14. century" in inception property translate to 1300 to 1399. Output from query shows 1400 (to me this is the start of 15th century)

Wikibase disagrees and considers the 14th century to begin in 1301 and end in 1400, see Help:Dates § Precision.
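The arithmetic behind that convention, as a short sketch for positive (CE) years only. The function names are illustrative, and this reproduces only the rule as stated in this comment, not Wikibase's own code.

```python
from typing import Tuple


def century_bounds(century: int) -> Tuple[int, int]:
    """First and last year of the Nth century, counting from year 1."""
    if century < 1:
        raise ValueError("only positive centuries are handled in this sketch")
    return (century - 1) * 100 + 1, century * 100


def century_of_year(year: int) -> int:
    """Century containing a positive year under the same convention."""
    if year < 1:
        raise ValueError("only positive years are handled in this sketch")
    return (year + 99) // 100


print(century_bounds(14))      # (1301, 1400)
print(century_of_year(1400))   # 14
print(century_of_year(1300))   # 13
```

Under this rule a stored year of 1400 still belongs to the 14th century, which is why the query output shows 1400.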

Epidosis subscribed.

This task was discussed in the Bug Triage Hour at the Wikidata Data Quality Days 2022.

Over on T207705 "Implement the Extended Date/Time Format Specification" (contribution), I have suggested how I think we might substantially solve this ticket, realistically and achievably, by

  • (i) adding EDTF awareness to the WDQS output GUI, and
  • (ii) adding an EDTF-valued form of triples to the RDF dump of date statements

EDTF would also be a very useful output format for complex dates in its own right.
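As a rough illustration of what an EDTF-valued form could look like for the simplest cases, the sketch below maps year, month and day precision to the reduced-precision date strings that EDTF level 0 shares with ISO 8601. The function name is hypothetical, positive four-digit years are assumed, and coarser precisions are deliberately left unmapped.

```python
from typing import Optional


def to_edtf(iso_value: str, precision: int) -> Optional[str]:
    """Truncate a timestamp to the EDTF string matching its precision.

    Handles only Wikibase precisions 9 (year), 10 (month) and 11 (day);
    returns None for anything else. Illustrative sketch only.
    """
    lengths = {9: 4, 10: 7, 11: 10}   # "YYYY", "YYYY-MM", "YYYY-MM-DD"
    length = lengths.get(precision)
    if length is None:
        return None
    return iso_value[:length]


print(to_edtf("1974-01-01T00:00:00Z", 9))    # 1974
print(to_edtf("1974-04-01T00:00:00Z", 10))   # 1974-04
print(to_edtf("1974-04-01T00:00:00Z", 11))   # 1974-04-01
```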