[Task] Make display of BCE dates in Wikidata Query Service identical to Wikidata
Closed, Resolved, Public

Description

To simplify reading BC dates on WQS, it would be helpful to display them as the www.wikidata.org GUI does, regardless of whatever conversion is currently applied.

Sample:

-Wikidata: 27 November 8 BCE Julian
-Wikidata Query Service (currently): "-0007-11-25T00:00:00Z"

New display feature:
-Wikidata Query Service: 27 November 8 BC
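A minimal sketch of the requested display, assuming the WQS value is a proleptic Gregorian timestamp with XSD 1.1 year numbering (year 0000 = 1 BCE), as stated later in this thread. The function names and the `julian` flag are hypothetical; this is not the code of the merged patch. The conversion goes through the Julian Day Number using standard integer formulas.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a proleptic Gregorian date (astronomical year)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045

def jdn_to_julian(jdn: int) -> tuple:
    """Julian calendar (year, month, day) for a Julian Day Number."""
    c = jdn + 32082
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = d - 4800 + m // 10
    return year, month, day

def format_rdf_date(value: str, julian: bool = False) -> str:
    """Format an XSD 1.1 timestamp such as '-0007-11-25T00:00:00Z' for display."""
    match = re.match(r"^(-?\d{4,})-(\d{2})-(\d{2})T", value)
    if match is None:
        raise ValueError(f"unrecognised timestamp: {value!r}")
    year, month, day = (int(part) for part in match.groups())
    if julian:
        # Convert the Gregorian date back to the Julian calendar it was entered in.
        year, month, day = jdn_to_julian(gregorian_to_jdn(year, month, day))
    # Assumption: XSD 1.1 numbering, so year 0 is 1 BCE and year -7 is 8 BCE.
    label = str(year) if year > 0 else f"{1 - year} BCE"
    return f"{day} {MONTHS[month - 1]} {label}"
```

Under these assumptions, `format_rdf_date("-0007-11-25T00:00:00Z", julian=True)` reproduces the Wikidata display from the sample above.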

Esc3300 created this task. Aug 5 2016, 12:53 PM
Restricted Application added a project: Discovery. Aug 5 2016, 12:53 PM
Restricted Application added a subscriber: Aklapper.
Jc3s5h added a subscriber: Jc3s5h. Aug 11 2016, 10:02 AM

There is considerable doubt about the correctness of the user interface. I suggest anyone undertaking this must obtain an ironclad definition of the meaning of the exact source WQS is obtaining the data from. Data entered into the database may have been entered with a variety of tools, not just the user interface. There is reason to suspect some of these tools would have treated -0001-01-01 as January 1, 2 BCE, while others would have treated it as January 1, 1 BCE. Thus, looking at examples of well-known BCE dates in the user interface is not a reliable way to judge the definition of the source of data that the user interface pulls from.

Thanks @Jc3s5h for your input.

There is some confusion about BC and negative years.
Therefore we agreed that BC is the better way to display negative years.
Nevertheless in wdqs you will always see the raw date format when hovering over the formatted date.

I tend to favor displaying years before AD 1 with a combination of letters and numbers in the user interface, because different conventions have been used about what negative year numbers mean; no such ambiguity exists with "2 BC" or "2 BCE". I don't care whether AD/BC are used vs. CE/BCE.

But of course the converter can't be written until the notation used in the source the converter is receiving input from is known with certainty.

You use the term "raw value" and explain I can see it if I hover over the date. Can you explain where this raw date comes from? Is it identical to the value stored in the data base, or is some conversion done to it from the time it leaves the data base to the time I see it on the screen? If you can assure me no conversion is done, that would be enough to make me start using WQS on a regular basis. Thanks for your reply

The raw value is the value given by Blazegraph which is the value defined in RDF.
This value is exported from Wikidata JSON serialization using this spec:
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Time

Oh dear. Blazegraph recently went through an upheaval, switching from XSD 1.0 (where year 0000 does not exist and -0002 = 2 BCE) to XSD 1.1 (where year 0000 does exist and -0001 = 2 BCE). Also, there is controversy over whether the author(s) of the RDF formatter have correctly interpreted what is stored in the data base and are converting the year correctly.
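To make the two numbering conventions concrete, here is a small sketch that labels a signed year number under each of them. The function name and the convention strings are made up for illustration.

```python
def bce_label(year: int, convention: str) -> str:
    """Render a signed year number as a CE/BCE label.

    "xsd1.0": there is no year 0, so -1 is 1 BCE and -2 is 2 BCE.
    "xsd1.1": year 0 is 1 BCE, so -1 is 2 BCE.
    """
    if convention == "xsd1.0":
        if year == 0:
            raise ValueError("year 0000 does not exist in XSD 1.0")
        return f"{year} CE" if year > 0 else f"{-year} BCE"
    if convention == "xsd1.1":
        return f"{year} CE" if year > 0 else f"{1 - year} BCE"
    raise ValueError(f"unknown convention: {convention}")
```

The same lexical year therefore names two different years: `bce_label(-1, "xsd1.0")` gives 1 BCE while `bce_label(-1, "xsd1.1")` gives 2 BCE, which is why a converter needs to know which convention its input uses.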

I suspect there may be an inactivity timeout in this system, so I will submit this now and add more later.

Jc3s5h added a comment. Edited Aug 11 2016, 4:41 PM

I made a chart of what happens around AD 1. All these dates were entered in the sandbox today and all have had the calendar manually set to Gregorian during the entry process. (I accidentally entered 0000-01-22 while taking the default for this, Julian calendar, so I'm ignoring that date.) The diff column is the timestamp value one sees while viewing diffs between different versions of the sandbox.

entered          diff         UI display       RDF          JSON
1 January 2 BCE  -0002-01-01  1 January 2 BCE  -0001-01-01  -0002-01-01
8 January 1 BCE  -0001-01-08  8 January 1 BCE  0000-01-08   -0001-01-08
15 January 1     +0001-01-15  15 January 1     0001-01-15   +0001-01-15
0000-01-30       +0000-01-30  30 January 0     ignored      +0000-01-30
Esc3300 renamed this task from [feature request] add option to display BC dates as on Wikidata to [feature request] add option to display BC dates in Wikidata Query Service as on Wikidata. Aug 11 2016, 5:00 PM
Esc3300 updated the task description.
Esc3300 updated the task description.

Also, there is controversy over whether the author(s) of the RDF formatter have correctly interpreted what is stored in the data base and are converting the year correctly.

Could you please explain what you mean? I'm the author of the RDF format (at least one of :) and I am not aware of any "controversy" - RDF follows XSD 1.1, and it's the industry standard. If you have observed some bugs there please submit a ticket.

But of course the converter can't be written until the notation used in the source the converter is receiving input from is known with certainty.

We cannot know with certainty what whoever put the dates into Wikibase meant. What we can do, however, is define how we interpret the user input and stored data. And we have no choice but to do that - we have to put *something* in the output, and by putting that into the output we are defining our interpretation.

Right now the dates in the DB and JSON follow (roughly) the XSD 1.0 standard for serializing dates, but the RDF follows XSD 1.1 since it is the accepted standard in the semantic data world.

The last date - 0000-01-30 - doesn't seem to be valid, i.e. it does not describe any actual point on the date scale. Wikibase may sometimes allow entering nonsensical values (including February 30th, etc.), but RDF cannot really represent them.
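The mapping described here can be sketched as follows, assuming day-precision JSON timestamps of the form shown in the chart above. The function name is hypothetical and this is only an illustration of the stated interpretation, not the exporter's actual code: BCE years are shifted by one to get XSD 1.1 numbering, and a JSON year of 0000 is treated as unrepresentable.

```python
import re
from typing import Optional

_JSON_TIME = re.compile(r"^([+-])(\d{4,})-(\d{2})-(\d{2})T(\d{2}:\d{2}:\d{2})Z$")

def json_time_to_xsd11(value: str) -> Optional[str]:
    """Convert a JSON-style timestamp (no year 0) to XSD 1.1 numbering.

    Returns None when the input uses year 0000, which has no meaning
    under the no-year-zero interpretation assumed for the JSON side.
    """
    match = _JSON_TIME.match(value)
    if match is None:
        raise ValueError(f"unrecognised timestamp: {value!r}")
    sign, digits, month, day, clock = match.groups()
    year = int(digits)
    if year == 0:
        return None  # e.g. +0000-01-30: not a point on the date scale
    if sign == "-":
        year = -(year - 1)  # 1 BCE becomes 0000, 2 BCE becomes -0001
    prefix = "-" if year < 0 else ""
    return f"{prefix}{abs(year):04d}-{month}-{day}T{clock}Z"
```

Applied to the four JSON values in the chart, this yields -0001-01-01, 0000-01-08, 0001-01-15 and nothing at all, matching the RDF column.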

[..] We cannot know with certainty what whoever put the dates into Wikibase meant. [..]

A problem I found with BC dates is that users input years for facts of which only the decade or century are considered certain.

After the revision of the task description around 1700 UT on Aug 11, I see that if the date was a Julian calendar date, WQS would convert the Gregorian date received from the RDF to the equivalent Julian date, and not mention in the display which calendar this is. I seriously question the decision of the UI authors to make fussy decisions about when to display the calendar and when not to. I think it would be better, and more resilient to any changes in this area in the UI, to always display the calendar.

On a related note, if you attempt to adjust the display according to the precision, you could get into trouble if you're not careful. If the date was entered as 4 January 500 BCE Julian calendar, century precision, it would be displayed as 5. century BCE. When RDF converts that to 30 December 501 BCE, it's now in the 6th century BCE, so if you try to match the display of dates with precision worse than day, you must do the back-conversion to Julian before you try to decide how to display it.

Finally, better make sure your Gregorian to Julian conversion routines work for the same range of dates as the RDF Julian to Gregorian converter does. Neither has a prayer of working for the start date of item Q1 (what does a year even mean when the earth doesn't exist yet)?
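The century pitfall can be checked with a short sketch. It converts a Julian calendar date to the proleptic Gregorian calendar through the Julian Day Number, using standard integer formulas, and computes the BCE century by the usual convention that the 5th century BCE spans 500 to 401 BCE. Years are astronomical (0 = 1 BCE, -499 = 500 BCE); all function names are hypothetical, and the formulas are only assumed valid for years after about -4800.

```python
def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a Julian calendar date (astronomical year)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

def jdn_to_gregorian(jdn: int) -> tuple:
    """Proleptic Gregorian (year, month, day) for a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day

def century_bce(astro_year: int) -> int:
    """Ordinal BCE century of an astronomical year (0 = 1 BCE)."""
    bce = 1 - astro_year
    if bce < 1:
        raise ValueError("not a BCE year")
    return (bce + 99) // 100

# 4 January 500 BCE in the Julian calendar is astronomical year -499.
entered = (-499, 1, 4)
converted = jdn_to_gregorian(julian_to_jdn(*entered))
print(converted)                  # Gregorian date, astronomical year numbering
print(century_bce(entered[0]))    # century of the date as entered
print(century_bce(converted[0]))  # century after conversion to Gregorian
```

The converted date falls on 30 December of astronomical year -500 (501 BCE), so the century computed from the Gregorian value differs from the one computed from the date as entered.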

Also, there is controversy over whether the author(s) of the RDF formatter have correctly interpreted what is stored in the data base and are converting the year correctly.

Could you please explain what you mean? I'm the author of the RDF format (at least one of :) and I am not aware of any "controversy" - RDF follows XSD 1.1, and it's the industry standard. If you have observed some bugs there please submit a ticket.

But of course the converter can't be written until the notation used in the source the converter is receiving input from is known with certainty.

We cannot know with certainty what whoever put the dates into Wikibase meant. What we can do, however, is define how we interpret the user input and stored data. And we have no choice but to do that - we have to put *something* in the output, and by putting that into the output we are defining our interpretation.

Right now the dates in the DB and JSON follow (roughly) the XSD 1.0 standard for serializing dates, but the RDF follows XSD 1.1 since it is the accepted standard in the semantic data world.

The last date - 0000-01-30 - doesn't seem to be valid, i.e. it does not describe any actual point on the date scale. Wikibase may sometimes allow entering nonsensical values (including February 30th, etc.), but RDF cannot really represent them.

JSON, RDF, the user interface, and the time stamp of the date shown in diffs are presenting seemingly conflicting information to the public. Therefore, they're all wrong. Just like part of a marching band is playing Yankee Doodle, another part is playing Lucy in the Sky with Diamonds, and a third part is playing Deutschlandlied. That band isn't getting invited back next year. First decide what to follow and implement it in one of these. Then get a commitment from the other parts to quickly implement corrections and document it all over Wikidata, the data models in Wikimedia, etc.

I see that if the date was a Julian calendar date, WQS would convert the Gregorian date received from the RDF to the equivalent Julian date, and not mention in the display which calendar this is.

This is probably not right: if the day is given and it's outside the range where the Gregorian calendar is assumed, it should display the calendar. However, all RDF dates are in fact in the proleptic Gregorian calendar, and I'm not sure it's worth converting them back to Julian. Yes, that means they won't match the Wikidata display, but is matching the Wikidata display actually a priority?

in this area in the UI, to always display the calendar.

Since most dates will be Gregorian, I doubt always displaying the calendar is beneficial.

Neither has a prayer of working for the start date of item Q1 (what does a year even mean when the earth doesn't exist yet)?

I assume it's proleptic Gregorian calendar. With dates less precise than a day, the question of calendar is not meaningful, and such dates are not converted. So Q1 is not converted, and the question of calendar conversion is irrelevant for it.

We cannot know with certainty what whoever put the dates into Wikibase meant. What we can do, however, is define how we interpret the user input and stored data. And we have no choice but to do that - we have to put *something* in the output, and by putting that into the output we are defining our interpretation.

True. What we can do is decide what the dates should mean and then correct all the methods of putting data in to reject invalid dates. It would be nice if all public-facing interfaces always required dates like 0000-01-01 or the equivalent 1 January 1 BC (with suitable synonyms for the words & numbers format). Also adjust all displays to either refuse to display invalid dates, or display them together with an indication they are invalid.

In a test I did, I noticed that RDF ignored a date I stored in the second sandbox by entering 0000-01-30, Gregorian calendar, because it didn't conform to your interpretation of what the internally stored dates meant. Bravo! Keep it up.

JSON, RDF, the user interface, and the time stamp of the date shown in diffs are presenting seemingly conflicting information to the public. Therefore, they're all wrong.

No, they are not wrong. They are using different formats. The format for RDF data is XSD 1.1.

Since most dates will be Gregorian, I doubt always displaying the calendar is beneficial.

Either make sure every date displayed is Gregorian, or make sure the calendar is displayed. Otherwise the person who is used to working with Gregorian but one day has to work with Julian will be confused as hell. Imagine if a poor German bought a car that accidentally had an odometer that worked in miles.

I see the documentation of the JSON data model has been revised today; see this version from mediawiki. So now we have different formats, but our documentation now explains to users that these formats are intended to be different. That's a lot better. Now a few tweaks are still needed to make various parts reject or vilify dates they consider invalid, but this is a big step forward.

thiemowmde triaged this task as Low priority. Aug 12 2016, 10:05 AM

Change 304835 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
Rewrite _formatDate to support the full date range

https://gerrit.wikimedia.org/r/304835

thiemowmde renamed this task from [feature request] add option to display BC dates in Wikidata Query Service as on Wikidata to [Task] Make display of BCE dates in Wikidata Query Service identical to Wikidata.
thiemowmde claimed this task.
Jonas closed this task as Resolved.

Change 304835 merged by jenkins-bot:
Rewrite _formatDate to support the full date range

https://gerrit.wikimedia.org/r/304835