(MS 6) Query for date values
Closed, Resolved | Public | 13 Estimated Story Points

Description

As an editor, I want to query for data containing dates in order to find events, for example by querying with the property point in time.

Problem:
Right now we do not allow querying for date values

Example:
Querying for point in time with the value 20-01-2021 would return the inauguration of Joe Biden.

mockups:

image.png (726×879 px, 56 KB)

BDD
GIVEN a visual query
WHEN selecting a property having data type time
THEN the editor can enter a date value

Acceptance criteria:

  • result is displayed when querying for a Property with data type time

Notes:

Query Examples:
Something happened on a day https://w.wiki/34zC:

# day -> precision === 11
SELECT ?item ?itemLabel
WHERE
{
  ?item p:P585/psv:P585/wikibase:timeValue "+1789-07-14T00:00:00Z"^^xsd:dateTime.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}

Something happened in a month https://w.wiki/34zD:

# month -> precision === 10
SELECT ?item ?itemLabel
WHERE
{
  ?item p:P585/psv:P585 ?timeValue.
  
  # has at least the given precision i.e. month
  ?timeValue wikibase:timePrecision ?precision. hint:Prior hint:rangeSafe true. # important for performance
  FILTER(?precision >= 10) # month precision
  
  # points to the correct month
  ?timeValue wikibase:timeValue ?dateTime. hint:Prior hint:rangeSafe true. # important for performance
  FILTER("1789-07-00"^^xsd:dateTime <= ?dateTime &&
         ?dateTime < "1789-08-00"^^xsd:dateTime) # must be manually calculated
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}

Something happened in a year https://w.wiki/34zE:

# year -> precision === 9
SELECT ?item ?itemLabel
WHERE
{
  ?item p:P585/psv:P585 ?timeValue.
  
  # has at least the given precision i.e. year
  ?timeValue wikibase:timePrecision ?precision. hint:Prior hint:rangeSafe true. # important for performance
  FILTER(?precision >= 9) # year precision
  
  # points to the correct year
  ?timeValue wikibase:timeValue ?dateTime. hint:Prior hint:rangeSafe true. # important for performance
  FILTER("1789-00-00"^^xsd:dateTime <= ?dateTime &&
         ?dateTime < "1790-00-00"^^xsd:dateTime) # must be manually calculated
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
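
The month and year queries above note that the range bounds "must be manually calculated". A minimal sketch of that calculation in Python follows. The helper name `date_range` is hypothetical and not part of the Query Builder, and the bounds are written with `-01` instead of `-00` on the assumption that the query service normalises missing months and days to `01`.

```python
from datetime import date, timedelta


def date_range(year, month=None, day=None):
    """Return (lower, upper) timestamps for the half-open range [lower, upper).

    Hypothetical helper: omit `day` for month precision, and omit both
    `month` and `day` for year precision.
    """
    if month is None:
        # Year precision: the whole calendar year.
        start, end = date(year, 1, 1), date(year + 1, 1, 1)
    elif day is None:
        # Month precision: roll over to January of the next year after December.
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    else:
        # Day precision: a single day.
        start = date(year, month, day)
        end = start + timedelta(days=1)

    def fmt(d):
        return f"{d.isoformat()}T00:00:00Z"

    return fmt(start), fmt(end)


lower, upper = date_range(1789, 7)
print(f'FILTER("{lower}"^^xsd:dateTime <= ?dateTime && ?dateTime < "{upper}"^^xsd:dateTime)')
```

The half-open form mirrors the `<=` / `<` comparison used in the example queries, so adjacent months or years never overlap.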

TODO:

  • what does the query look like for things that did not happen on a given date/month/year?
    • this doesn't make sense to query, but it can be built with the query builder so we need to be able to create it as a SPARQL query

Event Timeline

Lydia_Pintscher renamed this task from (MS 6) Input date values to (MS 6) Query for date values. Jan 26 2021, 6:55 PM
Lydia_Pintscher updated the task description.

Assigning to Charlie for adding mockups

Hey over here. I'm writing to warn you that, unfortunately, this can be extremely problematic. Since Wikibase stores all dates with day precision, regardless of their actual precision (which is stored separately), and the irrelevant part of the value can be arbitrary, it's very easy to generate misinformation with these features, as has been quietly happening for years and continues to happen to many agents and to at least three Property constraint types (see T168379, one of the most important causes of false positives in the very system that is intended to ensure data quality; and see https://reasonator.toolforge.org/?q=Q100410179 mentioning the year 1476, totally arbitrary value that Wikibase stores internally and shamelessly communicates to all software agents but does not display in its web interface). Including these features in the Query Builder aimed at the general public without first resolving this problem could have very undesirable consequences. I should have opened a Phabricator task describing this general problem and all its consequences, but I have never found the time or known where to start because of its magnitude and the questions it opens up.

Hey, @abian. Those are very valid concerns, and we really appreciate you sharing them! We are aware that, unfortunately, some current issues will also be reproduced by the new application (e.g. querying for a specific date like 01-01-2021 will result in false positives, since 2021 with precision year will also be retrieved). In the context of the Query Builder, users will only be able to enter specific dates, and precision will be interpreted based on their input. We are yet to define how some values (e.g. years only) would be translated into SPARQL; this might be the key to alleviating some of the inherent issues with dates. This data type is definitely a tricky one, and we wouldn't like to proceed while unaware of big problems. Since you seem to have a deep understanding of the issue and its consequences, it'd be great if you'd be up for a chat?
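
The false positive described here can be reproduced with a toy model. The sketch below is only an illustration: the `(timestamp, precision)` tuples and the helper names are assumptions made for the example, not Wikibase's storage format or API. The precision codes 9 (year) and 11 (day) are the ones used in the task description.

```python
# Toy stand-in for stored time values: (timestamp, precision).
values = [
    ("2021-01-01T00:00:00Z", 11),  # exactly 1 January 2021
    ("2021-01-01T00:00:00Z", 9),   # "2021" with year precision; month and day are filler
    ("2021-01-20T00:00:00Z", 11),  # 20 January 2021
]


def match_naive(values, timestamp):
    """Compare timestamps only, ignoring the separately stored precision."""
    return [v for v in values if v[0] == timestamp]


def match_with_precision(values, timestamp, min_precision):
    """Also require that the value is at least as precise as the query."""
    return [v for v in values if v[0] == timestamp and v[1] >= min_precision]


print(match_naive(values, "2021-01-01T00:00:00Z"))
print(match_with_precision(values, "2021-01-01T00:00:00Z", 11))
```

The naive match returns both the day-precision value and the year-precision "2021", while the precision-aware match keeps only the former.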

Of course we can chat. :-) I'll send you an email.

Michael renamed this task from (MS 6) Query for date values to (MS 6) 🛑 Query for date values. Mar 9 2021, 1:17 PM
Michael updated the task description.
Michael updated the task description.

We only support 3 degrees of precision: day, month, year

Can we clarify why only those 3?

IIRC, because you said so :)

Michael renamed this task from (MS 6) 🛑 Query for date values to (MS 6) Query for date values. Mar 19 2021, 9:57 AM

I don't remember that 😬 And from a user-perspective I don't think there is a reason to restrict it. Are there technical reasons?

Not per se, though it is some extra work associated with each extra level of precision. The reason is that each precision needs some custom handling when generating the SPARQL. E.g. "15th century" is stored roughly as 1500-00-00 and needs to be transformed into > 1400-00-00 and <= 1500-00-00. This effort isn't huge, but is also not just a config change.
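
As a rough illustration of the custom handling mentioned above, the century case could be sketched as follows. The function names are hypothetical, the sketch assumes positive (CE) years, and it follows the comparison directions given in the comment (exclusive lower bound, inclusive upper bound).

```python
def century_bounds(stored_year):
    """Return (exclusive_lower, inclusive_upper) years for a century-precision value.

    The stored year is rounded up to the closing year of its century, since
    the part below the stored precision may not be meaningful.
    """
    upper = -(-stored_year // 100) * 100  # ceiling to a multiple of 100
    return upper - 100, upper


def describe(stored_year):
    """Render the bounds in the notation used in the comment above."""
    lower, upper = century_bounds(stored_year)
    return f"> {lower:04d}-00-00 and <= {upper:04d}-00-00"


print(describe(1500))
```

Each further precision (decade, millennium, and so on) would need its own variant of this rounding, which is the per-precision effort the comment refers to.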

Change 678282 had a related patch set uploaded (by Michael Große; author: Michael Große):

[wikidata/query-builder@master] Ensure day precision for day values

https://gerrit.wikimedia.org/r/678282

When verifying, @Sarai-WMDE noticed that we are getting year-precision results for day values: https://w-beta.wmflabs.org/HW

The patch above should fix this.
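
One way such a restriction could look is sketched below. This is not the content of change 678282: the helper `day_query` is hypothetical, and the generated SPARQL simply combines the exact-match pattern from the day example in the task description with the precision filter from the month and year examples.

```python
def day_query(property_id, timestamp):
    """Build a query that matches a timestamp only at day precision or finer."""
    return "\n".join([
        "SELECT ?item WHERE {",
        f"  ?item p:{property_id}/psv:{property_id} ?timeValue.",
        "  ?timeValue wikibase:timePrecision ?precision.",
        "  FILTER(?precision >= 11)  # day precision",
        f'  ?timeValue wikibase:timeValue "{timestamp}"^^xsd:dateTime.',
        "}",
    ])


print(day_query("P585", "+1789-07-14T00:00:00Z"))
```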

I also noticed that we're not getting any results when queries only contain "month-year" or "year" date values (I'm assuming we want to allow this). For example, this query should find at least the First Balkan War. This applies to "matching" relations too, not only ranges.

Maybe the exclusion has to do with attaching year precision to incomplete values formatted as xsd:dateTime? Maybe we should use YEAR(?time) with precision >= 9 when only years are entered, or translate the input to ranges of xsd:dateTime values instead.
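
The two alternatives suggested here can be put side by side. The sketch below only builds the filter strings, the function names are hypothetical, and neither variant is claimed to be what the later patch implemented. `YEAR()` is the standard SPARQL function.

```python
def year_filter_with_function(year, time_var="?time", precision_var="?precision"):
    """Variant 1: compare the year component directly."""
    return f"FILTER({precision_var} >= 9 && YEAR({time_var}) = {year})"


def year_filter_with_range(year, time_var="?time", precision_var="?precision"):
    """Variant 2: translate the year into a half-open xsd:dateTime range."""
    lower = f'"{year:04d}-01-01T00:00:00Z"^^xsd:dateTime'
    upper = f'"{year + 1:04d}-01-01T00:00:00Z"^^xsd:dateTime'
    return (
        f"FILTER({precision_var} >= 9 && "
        f"{lower} <= {time_var} && {time_var} < {upper})"
    )


print(year_filter_with_function(1912))
print(year_filter_with_range(1912))
```

The range variant matches the shape of the month and year examples in the task description, which rely on range hints for performance. Wrapping the variable in a function may prevent that kind of optimisation.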

Change 678567 had a related patch set uploaded (by Michael Große; author: Michael Große):

[wikidata/query-builder@master] Fix not selecting anything with month or year precision

https://gerrit.wikimedia.org/r/678567

This is working after that last patch!

Change 678282 merged by jenkins-bot:

[wikidata/query-builder@master] Ensure day precision for day values

https://gerrit.wikimedia.org/r/678282

Change 678567 merged by jenkins-bot:

[wikidata/query-builder@master] Fix not selecting anything with month or year precision

https://gerrit.wikimedia.org/r/678567

amy_rc claimed this task.
amy_rc moved this task from Test (Verification) to Done on the Wikidata Query Builder board.

According to the UI, you can retrieve people born before (or after), for example, 31 January (birthday), but what you get has nothing to do with that. I believe there are quite a few other problematic use cases (which is understandable, this is a challenging task!).

31-1.png (314×1 px, 30 KB)

https://w-beta.wmflabs.org/Ho

https://w.wiki/3BVR

Hey, @abian! Thank you so much for reporting this problem. We broke it down into two issues:

  1. Querying for dates with year and month precision using the condition "before" generates invalid results. The following ticket was created to address this: T280035
  2. The system's interpretation of dates that consist of 4 separate digits (month precision) is not clearly indicated to users: basically, the fact that 31-01 is interpreted as January of the year 31 is not apparent. Ticket: T280158
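
The second issue can be made concrete with a small sketch. It assumes a year-first reading of the dash-separated parts, which matches the behaviour described for 31-01 but is otherwise an assumption about the parser. The function `interpret` and its return shape are hypothetical.

```python
# Precision codes as used in the task description: 9 = year, 10 = month, 11 = day.
PRECISION_BY_PART_COUNT = {1: 9, 2: 10, 3: 11}


def interpret(text):
    """Read 'Y', 'Y-M' or 'Y-M-D' and infer the precision from the number of parts."""
    parts = [int(p) for p in text.split("-")]
    if len(parts) not in PRECISION_BY_PART_COUNT:
        raise ValueError(f"unsupported date input: {text!r}")
    # Pad missing month/day with None so the tuple always has three slots.
    year, month, day = (parts + [None, None])[:3]
    return {
        "year": year,
        "month": month,
        "day": day,
        "precision": PRECISION_BY_PART_COUNT[len(parts)],
    }


print(interpret("31-01"))
```

Under this reading, a user who types 31-01 meaning "31 January" gets January of the year 31 at month precision, which is why the interpretation needs to be shown in the UI.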

We are aware of some of the current limitations of the date queries in the Query builder, and we're currently collecting and documenting those. We would be very grateful if you'd be willing to share any other issues or problematic use cases that you might be aware of. Thanks very much again!