
Add date filtering to RDF export
Closed, Resolved, Public

Description

Our data contains dates like 1996-00-00T00:00:00Z or 1909-02-31T00:00:00Z, which are not valid dates: the first has a zero month and day, and the second names a nonexistent date, February 31. Most generic tools (including Blazegraph) will not import such dates properly, so we need to pre-process date fields to ensure the dates are valid.
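A minimal sketch of the validity check, assuming Wikibase-style timestamps such as +1996-00-00T00:00:00Z and the proleptic Gregorian calendar (Julian-calendar values are not considered). The helper names are hypothetical and not part of the existing export code.

```python
import re

# Hypothetical pattern: captures sign, year, month and day of a
# timestamp such as "+1996-00-00T00:00:00Z".
TIMESTAMP_RE = re.compile(r"^([+-]?)(\d+)-(\d{2})-(\d{2})T")


def is_leap_year(year: int) -> bool:
    """Proleptic Gregorian leap-year rule (assumption: no Julian dates)."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year: int, month: int) -> int:
    if month == 2:
        return 29 if is_leap_year(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31


def has_valid_date(timestamp: str) -> bool:
    """Return True if the date part names a real calendar day."""
    match = TIMESTAMP_RE.match(timestamp)
    if match is None:
        return False
    sign, year_s, month_s, day_s = match.groups()
    year = -int(year_s) if sign == "-" else int(year_s)
    month, day = int(month_s), int(day_s)
    if not 1 <= month <= 12:
        return False  # catches the zero month in 1996-00-00
    return 1 <= day <= days_in_month(year, month)  # catches February 31


for ts in ["+1996-00-00T00:00:00Z", "+1909-02-31T00:00:00Z", "+2000-02-29T00:00:00Z"]:
    print(ts, has_valid_date(ts))
```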

Event Timeline

Smalyshev claimed this task.
Smalyshev set the priority of this task to Medium.
Smalyshev updated the task description.
Smalyshev subscribed.

For the first example, which gives only a year (check the precision?), see T92009.
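If the precision is available, the zeros in a year-only value can be read as placeholders rather than errors. A sketch, assuming Wikibase's precision codes (9 = year, 10 = month, 11 = day) and a hypothetical helper fill_to_precision that replaces date components finer than the precision with 01:

```python
import re

# Wikibase time precision codes (assumed: 9 = year, 10 = month, 11 = day).
PRECISION_YEAR, PRECISION_MONTH, PRECISION_DAY = 9, 10, 11

# Signed year, month, day, and the remainder starting at "T".
DATE_RE = re.compile(r"^([+-]?\d+)-(\d{2})-(\d{2})(T.*)$")


def fill_to_precision(timestamp: str, precision: int) -> str:
    """Replace date components finer than `precision` with "01"."""
    match = DATE_RE.match(timestamp)
    if match is None:
        return timestamp  # not a recognizable timestamp; leave it untouched
    year, month, day, rest = match.groups()
    if precision <= PRECISION_YEAR:
        # Year (or coarser) precision: month and day carry no information.
        month, day = "01", "01"
    elif precision == PRECISION_MONTH:
        day = "01"
    return f"{year}-{month}-{day}{rest}"


print(fill_to_precision("+1996-00-00T00:00:00Z", PRECISION_YEAR))
print(fill_to_precision("+1996-05-00T00:00:00Z", PRECISION_MONTH))
```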

Another case that Wikibase allows but XSD 1.1 does not is a seconds value of 60 or more (I think Wikibase allows up to 62). XSD 1.1 seems to disallow leap seconds in general (see http://www.w3.org/TR/xmlschema11-2/#vp-dt-second).
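A sketch of the time-of-day check under the XSD 1.1 ranges: hours 00 to 23, minutes and seconds 00 to 59, plus the special end-of-day value 24:00:00. It assumes timestamps without fractional seconds; time_is_xsd_valid is a hypothetical name.

```python
import re

# Time-of-day at the end of the timestamp, optionally followed by Z or an
# offset. Fractional seconds are deliberately not matched (assumption).
TIME_RE = re.compile(r"T(\d{2}):(\d{2}):(\d{2})(?:Z|[+-]\d{2}:\d{2})?$")


def time_is_xsd_valid(timestamp: str) -> bool:
    """Check hours, minutes and seconds against the XSD 1.1 ranges."""
    match = TIME_RE.search(timestamp)
    if match is None:
        return False
    hours, minutes, seconds = (int(g) for g in match.groups())
    if (hours, minutes, seconds) == (24, 0, 0):
        return True  # XSD 1.1 accepts 24:00:00 as the end of the day
    # Seconds of 60 or more (leap seconds) are rejected.
    return hours <= 23 and minutes <= 59 and seconds <= 59


print(time_is_xsd_valid("+2012-06-30T23:59:60Z"))
```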

Maybe we can use the XSD support in libxml to validate the dates, but that might not be fast enough.
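As a lighter-weight alternative to libxml (not benchmarked), the lexical pattern that the XSD 1.1 Part 2 specification gives for dateTime can be checked with a plain regular expression. This is a swapped-in technique, not libxml's validator, and it checks only the lexical form: a leading + or extra year padding is rejected, but February 31 still passes, so a separate calendar check would still be needed.

```python
import re

# Lexical pattern for xsd:dateTime as given in XSD 1.1 Part 2.
XSD_DATETIME_RE = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})"  # year: 4 digits, more only without a leading 0
    r"-(0[1-9]|1[0-2])"  # month 01-12
    r"-(0[1-9]|[12][0-9]|3[01])"  # day 01-31 (not checked against the month)
    r"T(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|(24:00:00(\.0+)?))"
    r"(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"  # optional time zone
)


def matches_xsd_datetime_lexical(value: str) -> bool:
    return XSD_DATETIME_RE.fullmatch(value) is not None


for v in ["1996-01-01T00:00:00Z", "+1996-01-01T00:00:00Z", "1909-02-31T00:00:00Z"]:
    print(v, matches_xsd_datetime_lexical(v))
```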

See also T92996. I think the solution for RDF would be to salvage the dates we can (like 1996-00-00T00:00:00Z) and keep the ones we can't make sense of as strings, hoping the importing side will figure them out.
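A sketch of the salvage-or-fallback approach, assuming a zero month or day can simply become 01 and that unfixable values are emitted as xsd:string literals. to_rdf_literal is a hypothetical name; it leaves the sign and year padding untouched, since normalizing those is a separate concern.

```python
import re
from typing import Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# Sign, year, month, day, and the remainder starting at "T".
DATE_RE = re.compile(r"^([+-]?)(\d+)-(\d{2})-(\d{2})(T.*)$")


def days_in_month(year: int, month: int) -> int:
    """Days in a month of the proleptic Gregorian calendar (assumption)."""
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


def to_rdf_literal(timestamp: str) -> Tuple[str, str]:
    """Return (lexical form, datatype IRI) for the exported literal."""
    match = DATE_RE.match(timestamp)
    if match:
        sign, year_s, month_s, day_s, rest = match.groups()
        # Salvage: a zero month or day becomes 01.
        month_s = "01" if month_s == "00" else month_s
        day_s = "01" if day_s == "00" else day_s
        year = -int(year_s) if sign == "-" else int(year_s)
        month, day = int(month_s), int(day_s)
        if 1 <= month <= 12 and 1 <= day <= days_in_month(year, month):
            return f"{sign}{year_s}-{month_s}-{day_s}{rest}", XSD + "dateTime"
    # Could not make sense of it: keep the original text as a plain string.
    return timestamp, XSD + "string"


print(to_rdf_literal("+1996-00-00T00:00:00Z"))
print(to_rdf_literal("+1909-02-31T00:00:00Z"))
```

Keeping the unfixable value verbatim means no information is lost; a consumer can still find these literals by their xsd:string datatype.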

Yes. Keeping the ones we can't make sense of as strings might also let people query Blazegraph for them.

A few additional things we need to fix for the dates to be valid XSD 1.1:
- Pad the year with leading zeros only up to 4 digits, not more. (Currently we pad to more.)
- Remove the leading + if there is one.
- Remove the Z at the end, or replace it with a valid time zone representation.
- Only seconds up to 59 are allowed.
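A sketch of the year, sign and seconds fixes from the list, assuming input padded like +00000001996-01-01T00:00:00Z (an illustrative value). It keeps a trailing Z, which XSD accepts as a time zone, and does not touch month or day values. normalize_for_xsd is a hypothetical name.

```python
import re
from typing import Optional

# Groups: sign, year, rest of the timestamp (containing hh, mm, ss), zone.
TS_RE = re.compile(r"^([+-]?)(\d+)(-\d{2}-\d{2}T(\d{2}):(\d{2}):(\d{2}))(Z?)$")


def normalize_for_xsd(timestamp: str) -> Optional[str]:
    """Fix sign and year padding; return None if the seconds can't be fixed."""
    match = TS_RE.match(timestamp)
    if match is None:
        return None
    sign, year, rest, _hh, _mm, ss, zone = match.groups()
    if int(ss) > 59:
        return None  # XSD 1.1 has no leap seconds
    # Exactly 4 digits, or more only when the year itself needs them.
    year = year.lstrip("0").rjust(4, "0")
    sign = "-" if sign == "-" else ""  # drop a leading "+"
    return f"{sign}{year}{rest}{zone}"


print(normalize_for_xsd("+00000001996-01-01T00:00:00Z"))
```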

The code now pads years to 4+ digits and removes the +. I'm not sure about Z: this page http://books.xmlschemata.org/relaxng/ch19-77049.html suggests Z is valid. I didn't find any issues with times in the current data, but we may check that too.

I parsed the BNF incorrectly; the Z is valid.