Page MenuHomePhabricator

WDQS date handling produces errors for Julian dates
Open, MediumPublic

Description

WDQS holds dates as xsd:dateTime using the proleptic Gregorian calendar. - https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Time

Julian calendar dates entered into Wikidata are stored as such in JSON.

They are converted to proleptic Gregorian calendar dates for WDQS

However, WDQS stores the original - Julian - calendar as the value for wikibase:timeCalendarModel

It follows that a Julian date entered and displayed as expected in Wikidata - such as https://www.wikidata.org/wiki/Q16931292#P571 - 22 June 1498 - is in WDQS represented by the following value set:

simplevalue (ps:P571) - 1 July 1498
value (psv:P571/wikibase:timeValue - 1 July 1498
calendar (psv:P571/wikibase:timeCalendarModel) - wd:Q1985786 (proleptic Julian calendar)

By any reasonable definition, this is an error. WDQS is representing the value as 1 July 1498 Julian, when, at best, it should be 1 July 1498 Gregorian, and ideally should be 22 June 1498 Julian.

I think this date handling needs a rethink, perhaps along the line of:

BDD
given: Julian date in wikidata
when: WDQS reports on the date
then:

  • ps:Pnnn value should be the Julian date - 22 June 1498
  • psv:Pnnn/wikibase:timeValue should be the Julian date - 22 June 1498
  • psv:Pnnn/wikibase:timeCalendarModel) should be wd:Q1985786 (proleptic Julian calendar)
  • psn:Pnnn/wikibase:timeValue should be the Gregorian date - 1 July 1498
  • psn:Pnnn/wikibase:timeCalendarModel) should be wd:Q1985727 (proleptic Gregorian calendar)

Related Objects

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

I don’t see the problem. The exact value is "1498-07-01T00:00:00Z"^^xsd:dateTime (query – the datatype is crucial). XSD date and time values follow ISO 8601, which requires the Gregorian or proleptic Gregorian calendar. So the datatype already tells you how to interpret the value, and the wikibase:timeCalendarModel is therefore the calendar model of the value before it was converted to xsd:dateTime. When the RDF exporter encounters a date that it can’t convert to xsd:datetime, it emits a plain string instead – then, and only then, is the wikibase:timeCalendarModel also the calendar model of that string.

Wikibase or the query service aren’t “representing the value as 1 July 1498 Julian” – they’re just representing the value as some triples. It’s your interpretation of those triples that’s flawed, as far as I understand: “1 July 1498 (proleptic Gregorian), originally specified as Julian” would be a closer English rendition of them, I believe.

It might still be useful to also include the original, unconverted value in the RDF export (as a string, not an xsd:dateTime). But I don’t think there’s anything wrong with the current representation.

Hello. I just ran into the same issue.

For comparison, look at JSON instead: http://www.wikidata.org/entity/Q16931292.json
There, the "value" object contains the following fields:

time: "+1498-06-22T00:00:00Z"
calendarmodel: "http://www.wikidata.org/entity/Q1985786"

This means that JSON contains the original value, together with the calendar model to which the value corresponds.

I do understand the logic provided by @Lucas_Werkmeister_WMDE. However, as developer, I missed the fact that in RDF, the wikibase:timeValue field contains the converted-to-Gregorian value instead of the value corresponding to the calendar - contrary to what JSON does.

As developer, in certain situations, I'd like access to the original - unconverted - value, to faithfully extract the value entered in the user interface. This does not seem possible in RDF at the moment.

I propose to add a field wikibase:timeValueCorrespondingToOriginalCalendar (better name to be discussed).

That addition would make sense to me too (maybe we should edit the task title and description). We should probably only define it for time values where that string differs from the string of the wikibase:timeValue, to save space; people who always wanted the original string could then use a snippet like this:

?timeValue wikibase:timeValue ?converted.
OPTIONAL { ?timeValue wikibase:timeValueOriginal ?original_. }
BIND(COALESCE(?original_, STR(?converted)) AS ?original)

That said, currently about 20% of time values are non-proleptic Gregorian – 71095 out of 339705 (query), a larger proportion than I expected – so maybe we should just add the triple to all time values and it wouldn’t take up that much extra space? @Gehel any thoughts on this?

A way to get access in WDQS to "original calendar" values would really be helpful.

At the moment, we're in an odd situation where the dates in WDQS are technically correct by ISO 8601 (I think), but most users interested in looking at historic data won't be familiar with the convention of ISO 8601, and will either use them without realising they're not being displayed in the calendar they expect, or spot that they're all wrong and get confused/irritated/etc. (Worse, people may change dates to be incorrectly marked as Gregorian in order to make them "show up" correctly. I haven't seen much of this, thankfully, but I am sure it does happen).

I don't think defaulting to Gregorian is a problem in and of itself, but we do need a way to bypass it. Ideally, I think, what WDQS would be able to do is:

  • if asked, display a date in its original calendar schema, and tell you what that calendar is (either as an additional value or with something like the WD superscript)
  • if asked, render a date in a specified calendar schema, either Julian or Gregorian, and tell you which one is being displayed (either as an additional value or with something like the WD superscript)

The first of these is the key problem, as it affects every pre-1582 date; for these, WDQS cannot currently display a human-readable date that is what a human would expect.

The second would be really helpful during periods where some countries use it and some don't, as there will be queries where you're legitimately expecting a mix of calendars in the responses and (depending on context) may want to standardise a timeline on Julian rather than Gregorian. But it's not as vital.

Gehel triaged this task as Medium priority.Sep 15 2020, 8:04 AM