
Some date inputs without a day are parsed as having a day
Closed, Resolved, Public

Description

I've seen a couple of cases recently where a date is parsed as having a day when it clearly doesn't:

"Sept 1966" is parsed as "1 September 1966" when I would expect "September 1966"

", 1966" is parsed as "18 November 1966" when I would expect either "1966" or an error.

Event Timeline

thiemowmde triaged this task as Lowest priority. Jun 8 2017, 3:54 PM
thiemowmde moved this task from incoming to needs discussion or investigation on the Wikidata board.

These are very welcome test cases. Thanks a lot for submitting this ticket!

The cause of both cases (and many more, e.g. the issue in T131625) is the PhpDateTimeParser, which is based on PHP's built-in date-time parsing and which we use as a fallback.

The second example ", 1966" is completed with the current month, day, and time, and results in "+1966-06-08T15:39:06Z". This is blocked by our validator because we don't accept date values with a time of day.

The first example happens because "Sept" is not an abbreviation our parsers know about, but PHP's built-in parser obviously understands it. Unfortunately, PHP doesn't tell us the precision.

There are already many heuristics in place in the PhpDateTimeParser, e.g. see https://github.com/DataValues/Time/blob/master/src/ValueParsers/PhpDateTimeParser.php#L98. Unfortunately, I cannot think of a simple fix for the "Sept" example. The solution I would like to see is for us to get rid of this parser and replace it with our own. This needs time.
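
To illustrate the kind of heuristic discussed here, below is a minimal sketch in Python (not PHP, and not taken from PhpDateTimeParser) that guesses the precision from the tokens the user typed instead of from the parsed result. The function name `guess_precision` and the `MONTH_PREFIXES` list are hypothetical, and the prefix match is deliberately naive, so treat this as an assumption-laden sketch rather than a proposed fix.

```python
import re

# Hypothetical English month prefixes; a real parser would use localized name lists.
MONTH_PREFIXES = ("jan", "feb", "mar", "apr", "may", "jun",
                  "jul", "aug", "sep", "oct", "nov", "dec")


def guess_precision(text):
    """Guess 'day', 'month', or 'year' from the raw input tokens.

    Returns None when the input contains no number at all.
    """
    numbers = re.findall(r"\d+", text)
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Naive check: any word starting with a month prefix counts as a month name.
    has_month_name = any(w.startswith(MONTH_PREFIXES) for w in words)

    if not numbers:
        return None
    if has_month_name:
        # A month name plus a single number ("Sept 1966") carries no day.
        return "day" if len(numbers) >= 2 else "month"
    # Purely numeric input: one number is a year, two a month, three or more a day.
    return {1: "year", 2: "month"}.get(len(numbers), "day")
```

With this sketch, "Sept 1966" yields month precision and ", 1966" yields year precision, which matches what the reporter expected. The fallback parser's result could then be truncated to the guessed precision, although whether that is safe for all inputs is untested here.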

thiemowmde claimed this task.

The main issue from this ticket is solved. The time component just needs a release. I will try to do this as soon as possible.