
Some date inputs without a day are parsed as having a day
Closed, Resolved · Public

Description

I've seen a couple of cases recently where a date is parsed as having a day when it clearly doesn't:

"Sept 1966" is parsed as "1 September 1966" when I would expect "September 1966"

", 1966" is parsed as "18 November 1966" when I would expect either "1966" or an error.

Event Timeline

Nikki created this task. Nov 18 2016, 9:03 PM
Restricted Application added a subscriber: Aklapper. Nov 18 2016, 9:03 PM
thiemowmde triaged this task as Lowest priority. Jun 8 2017, 3:54 PM
thiemowmde moved this task from incoming to needs discussion or investigation on the Wikidata board.

These are very welcome test cases. Thanks a lot for submitting this ticket!

The cause of both cases (and many more, e.g. the issue in T131625) is the PhpDateTimeParser, which is based on PHP's built-in date-time parsing and which we use as a fallback.

The second example ", 1966" is completed with the current month, day, and time, and results in "+1966-06-08T15:39:06Z". This is blocked by our validator because we don't accept date values with a time of day.
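To make the ", 1966" case concrete, here is a minimal Python sketch of the behaviour described above. It is not the PhpDateTimeParser code: `complete_with_now`, `to_timestamp`, and `has_time_of_day` are hypothetical helpers, and the fixed `now` is simply chosen to match the timestamp quoted in this comment.

```python
from datetime import datetime


def complete_with_now(year, now):
    """Fill every field the input did not supply from `now` (hypothetical helper)."""
    return now.replace(year=year)


def to_timestamp(dt):
    """Format a datetime in the signed style quoted in this ticket."""
    return "+" + dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def has_time_of_day(timestamp):
    """Return True if the time part is anything other than 00:00:00."""
    return not timestamp.endswith("T00:00:00Z")


# Assumed "current" moment, picked to reproduce the value from the comment.
now = datetime(2017, 6, 8, 15, 39, 6)
value = to_timestamp(complete_with_now(1966, now))

# A validator along these lines would reject the completed value.
rejected = has_time_of_day(value)
```

With only the year taken from the input, the month, day, and time all leak in from the moment of parsing, which is why a check on the time of day is enough to block the value.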

The first example happens because "Sept" is not an abbreviation our parsers know about, but PHP's built-in parser obviously understands it. Unfortunately, PHP doesn't tell us the precision of what it parsed.

There are already many heuristics in place in the PhpDateTimeParser, e.g. see https://github.com/DataValues/Time/blob/master/src/ValueParsers/PhpDateTimeParser.php#L98. Unfortunately, I cannot think of a simple fix for the "Sept" example. The solution I would like to see is for us to get rid of this parser and replace it with our own. This needs time.
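As a rough illustration of what "our own" parser could look like for this class of input, the Python sketch below recognises "[day] month year" strings and records the precision explicitly instead of guessing a day. It is only a sketch under assumptions: the names `MONTHS`, `PATTERN`, and `parse_month_year`, the tuple layout, and the use of `None` for an unknown day are all hypothetical and not taken from the DataValues code. It also does not validate day ranges.

```python
import re

FULL_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]

# Map full names and three-letter abbreviations to month numbers.
MONTHS = {}
for number, name in enumerate(FULL_NAMES, start=1):
    MONTHS[name] = number
    MONTHS[name[:3]] = number
MONTHS["sept"] = 9  # the abbreviation from this ticket

# Optional day, month name (optionally followed by a dot), four-digit year.
PATTERN = re.compile(r"^(?:(\d{1,2})\s+)?([A-Za-z]+)\.?\s+(\d{4})$")


def parse_month_year(text):
    """Return (year, month, day, precision); day is None if it was not given."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError("unrecognised date: %r" % text)
    day, month_name, year = match.groups()
    month = MONTHS.get(month_name.lower())
    if month is None:
        raise ValueError("unknown month name: %r" % month_name)
    if day is None:
        return (int(year), month, None, "month")
    return (int(year), month, int(day), "day")
```

With this approach "Sept 1966" keeps month precision, while ", 1966" is rejected with an error rather than being completed from the current date.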

Restricted Application added a subscriber: PokestarFan. Jul 24 2017, 7:10 AM
thiemowmde closed this task as Resolved. Sep 5 2017, 4:15 PM
thiemowmde claimed this task.

The main issue from this ticket is solved. The time component just needs a release. I will try to do this as soon as possible.