Maniphest T198179

Month name and year preceded or followed by a dot or comma is parsed as having a day
Open, Low, Public

Assigned To: None
Authored By: Nikki, Jun 26 2018, 9:14 AM

Description

A similar issue to T151088, the following are all parsed as "1 April 1987":

~~"April 1987."~~
"April 1987,"
", April 1987"
". April 1987"

As are combinations of both, e.g.

~~", April 1987."~~

Or multiple characters, e.g.

~~"April 1987..."~~

The spacing around the characters does not make a difference.

For all of them, I would expect either "April 1987" or an error.

Related Objects

		Status	Subtype	Assigned	Task
		Open		None	T87764 Bugs related to time datatype (tracking)
		Open		None	T198179 Month name and year preceded or followed by a dot or comma is parsed as having a day

Event Timeline

Nikki created this task. (Jun 26 2018, 9:14 AM)

Restricted Application added a subscriber: Aklapper. (Jun 26 2018, 9:14 AM)

• Greta_Doci_WMDE added projects: MediaWiki-extensions-WikibaseRepository, DataValues. (Jun 26 2018, 3:46 PM)

• Greta_Doci_WMDE moved this task from incoming to needs discussion or investigation on the Wikidata board.

• Vvjjkkii renamed this task from "Month name and year preceded or followed by a dot or comma is parsed as having a day" to "caaaaaaaaa". (Jul 1 2018, 1:01 AM)

• Vvjjkkii triaged this task as High priority.

• Vvjjkkii added projects: CheckUser, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), Tamil-Sites, Gamepress, Hashtags, Jade, KartoEditor, Language-2018-Apr-June, New-Editor-Experiences, Mail, TCB-Team (now WMDE-TechWish).

• Vvjjkkii updated the task description.

• Vvjjkkii removed a subscriber: Aklapper.

JJMC89 renamed this task from "caaaaaaaaa" to "Month name and year preceded or followed by a dot or comma is parsed as having a day". (Jul 1 2018, 2:44 AM)

JJMC89 raised the priority of this task from High to Needs Triage.

JJMC89 removed projects: TCB-Team (now WMDE-TechWish), Mail, New-Editor-Experiences, Language-2018-Apr-June, KartoEditor, Jade, Hashtags, Gamepress, Tamil-Sites, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), CheckUser.

JJMC89 updated the task description.

JJMC89 added a subscriber: Aklapper.

thiemowmde added a parent task: T87764: Bugs related to time datatype (tracking). (Sep 4 2018, 10:34 AM)

This is another situation where none of the (currently half a dozen) custom Wikibase parsers is able to understand an input string, and parsing falls back to PHP's problematic built-in parser (see http://php.net/manual/en/datetime.formats.php).
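One way to see what PHP itself makes of these strings is to probe its parser directly. The sketch below uses `date_parse()` as a stand-in; that the Wikibase fallback behaves the same way is an assumption, and the results may differ between PHP versions, so no expected output is listed.

```php
<?php
// Inputs taken from the task description.
$inputs = ['April 1987', 'April 1987,', ', April 1987', '. April 1987'];

$results = [];
foreach ($inputs as $input) {
    // date_parse() is PHP's built-in parser; treating it as equivalent to
    // the Wikibase fallback is an assumption made for this sketch.
    $parsed = date_parse($input);
    $results[$input] = [
        'year' => $parsed['year'],
        'month' => $parsed['month'],
        'day' => $parsed['day'], // false when PHP did not fill in a day
        'errors' => $parsed['error_count'],
    ];
    printf("%-16s %s\n", json_encode($input), json_encode($results[$input]));
}
```

Comparing the `day` field across the inputs shows whether the extra dot or comma changes what the built-in parser returns.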

In my opinion the best option is to improve the existing YearMonthTimeParser. This parser is meant to understand dates with precision "month".

// Before:
'/^(-?[\d\p{L}]+)\s*?[\/\-\s.,]\s*(-?[\d\p{L}]+)$/'

// After:
'/^[\p{P}\p{Z}]*?(-?[\p{L}\p{N}]+)\p{Z}*?[\p{P}\p{Z}]\p{Z}*(-?[\p{L}\p{N}]+)[\p{P}\p{Z}]*$/'

// The same, just documented (the comments avoid "/" because it is the pattern delimiter):
'/^
    [\p{P}\p{Z}]*?     # irrelevant punctuation or whitespace (ungreedy)
    (-?[\p{L}\p{N}]+)  # capture group 1 contains either month or year
    \p{Z}*?            # irrelevant whitespace (ungreedy)
    [\p{P}\p{Z}]       # at least 1 separator
    \p{Z}*             # irrelevant whitespace
    (-?[\p{L}\p{N}]+)  # capture group 2 contains either month or year
    [\p{P}\p{Z}]*      # irrelevant punctuation or whitespace
    $/x'

https://www.regular-expressions.info/unicode.html is a nice cheat sheet for these \p{…} Unicode character classes.

Properly testing this in YearMonthTimeParserTest is a must. Additionally, at least one relevant edge case should be added to TimeParserFactoryTest.
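As a starting point for such tests, here is a minimal sketch that runs the "before" and "after" patterns against the inputs from the task description. It is not the actual YearMonthTimeParser code: `captureTokens()` is a hypothetical helper, and the `/u` modifier is an assumption added so that the `\p{…}` classes operate on UTF-8 input.

```php
<?php
// Patterns copied from the proposal; only the /u modifier is added (assumption).
const BEFORE = '/^(-?[\d\p{L}]+)\s*?[\/\-\s.,]\s*(-?[\d\p{L}]+)$/u';
const AFTER = '/^[\p{P}\p{Z}]*?(-?[\p{L}\p{N}]+)\p{Z}*?[\p{P}\p{Z}]\p{Z}*(-?[\p{L}\p{N}]+)[\p{P}\p{Z}]*$/u';

/**
 * Hypothetical helper: returns the two captured tokens, or null if the
 * pattern does not match the input.
 */
function captureTokens(string $pattern, string $input): ?array {
    return preg_match($pattern, $input, $m) === 1 ? [$m[1], $m[2]] : null;
}

// Inputs from the task description, plus the plain form as a baseline.
$inputs = [
    'April 1987',
    'April 1987.',
    'April 1987,',
    ', April 1987',
    '. April 1987',
    ', April 1987.',
    'April 1987...',
];

foreach ($inputs as $input) {
    printf(
        "%-18s before=%-20s after=%s\n",
        json_encode($input),
        json_encode(captureTokens(BEFORE, $input)),
        json_encode(captureTokens(AFTER, $input))
    );
}
```

In this sketch the old pattern matches only the plain "April 1987", while the proposed pattern captures "April" and "1987" for every input listed. Whether the surrounding parser then produces month precision is not covered here.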

matej_suchanek updated the task description. (Feb 7 2019, 5:45 PM)

It also happens when preceded by a hyphen: "- April 2000" and "-April 2000" turn into "1 April 2000 BCE".
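For reference, a small sketch of what the pattern proposed earlier in this task captures for the two hyphen inputs. `matchYearMonth()` is a hypothetical helper, the `/u` modifier is an assumption, and this exercises the regular expression only, not the Wikibase parser built around it.

```php
<?php
// The "after" pattern from the proposal, with /u added (assumption).
const PROPOSED_PATTERN = '/^[\p{P}\p{Z}]*?(-?[\p{L}\p{N}]+)\p{Z}*?[\p{P}\p{Z}]\p{Z}*(-?[\p{L}\p{N}]+)[\p{P}\p{Z}]*$/u';

/**
 * Hypothetical helper: returns both capture groups, or null on no match.
 */
function matchYearMonth(string $input): ?array {
    return preg_match(PROPOSED_PATTERN, $input, $m) === 1 ? [$m[1], $m[2]] : null;
}

foreach (['- April 2000', '-April 2000'] as $input) {
    printf("%-16s %s\n", json_encode($input), json_encode(matchYearMonth($input)));
}
// "- April 2000" -> ["April","2000"]   (hyphen and space are skipped as leading punctuation)
// "-April 2000"  -> ["-April","2000"]  (the optional "-?" in group 1 keeps the hyphen)
```

So with the hyphen directly attached, the sign still ends up in the first token. How the parser should treat a token like "-April" would need a separate decision and test case.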

Addshore unsubscribed. (Jun 27 2023, 12:44 PM)

They don't contain a dot or comma, but while testing something, I also found that "91-04 bc" and "0091-04 bc" turn into "1 April 91 BCE" (and for some reason "0091-04-00 bc" turns into "31 March 91 BCE").
