
Month and year are sometimes parsed as the first day of the month
Open, Needs Triage · Public · BUG REPORT

Description

Steps to Reproduce

  1. Go to an item (e.g. sandbox) with Czech (cs) as the interface language.
  2. Add a new property with time datatype (e.g. P585).
  3. Write a month in Czech (see below) and a year (e.g. 2019).

Actual Results

  • For some months, the preview will show the first day of the month:
    • Namely "březen" (3), "květen" (5), "červen" (6), "červenec" (7), "září" (9), "říjen" (10).
  • The rest are parsed correctly: "leden" (1), "únor" (2), "duben" (4), "srpen" (8), "listopad" (11), "prosinec" (12).
    • Possible explanation: All problematic months contain characters with a caron. It's possible that the parser (YearMonthTimeParser?) doesn't handle special characters correctly.
  • The same problem happens with all English months (still with the interface in Czech).



Expected Results

  • The value is always parsed to the "month" precision.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 17 2019, 12:30 PM
data-values\time\src\ValueParsers\YearMonthTimeParser.php
$logger = \MediaWiki\Logger\LoggerFactory::getInstance( 'MyCoolLoggingChannel' );
$logger->debug( 'stringParse: {value}', [ 'value' => $value ] );
// Matches year and month separated by a separator.
// \p{L} matches letters outside the ASCII range.
if ( !preg_match( '/^(-?[\d\p{L}]+)\s*?[\/\-\s.,]\s*(-?[\d\p{L}]+)$/', trim( $value ), $matches ) ) {
  throw new ParseException( 'Failed to parse year and month', $value, self::FORMAT_NAME );
}
$logger->debug( 'stringParse: ok' );

This works for some months but not for all. The "\p{L} matches letters outside the ASCII range" trick apparently doesn't work.
@Lucas_Werkmeister_WMDE @Addshore @Ladsgroup Could you help?

Found https://stackoverflow.com/questions/26611495/regex-pl-problems
Solution:

if ( !preg_match( '/^(-?[\d\p{L}]+)\s*?[\/\-\s.,]\s*(-?[\d\p{L}]+)$/u', trim( $value ), $matches ) ) {
  throw new ParseException( 'Failed to parse year and month', $value, self::FORMAT_NAME );
}

(Add u as a modifier to the regex.)
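The difference can be reproduced outside the parser with a small standalone script. This is only a sketch: the pattern is copied from the snippet above, while the variable names and the sample inputs are chosen here for illustration, and the file is assumed to be saved as UTF-8. Without the u modifier, PCRE reads the subject byte by byte, so a multi-byte character such as "ř" is not seen as a single letter.

```php
<?php
// Pattern copied from the thread; the two variants differ only in the trailing u modifier.
// Variable names and sample inputs are illustrative, not taken from YearMonthTimeParser.
$withoutU = '/^(-?[\d\p{L}]+)\s*?[\/\-\s.,]\s*(-?[\d\p{L}]+)$/';
$withU = $withoutU . 'u';

// This file must be saved as UTF-8 so the month names are multi-byte strings.
$inputs = [ 'leden 2019', 'březen 2019', 'září 2019' ];

foreach ( $inputs as $input ) {
  // preg_match() returns 1 on a match and 0 on no match.
  $plain = preg_match( $withoutU, $input );
  $unicode = preg_match( $withU, $input );
  printf( "%s: without u = %d, with u = %d\n", $input, $plain, $unicode );
}
```

Run from the command line, this should report a match for "leden 2019" in both columns, while "březen 2019" and "září 2019" should only match with the u modifier.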

Bingo! I could really have tried Google. Weird that https://regex101.com/ (which I used) still matches without /u.

Patch-For-Review: https://github.com/wmde/Time/pull/146

I tried Regex101 too, and after that I tried the same with a small PHP script that I ran on the command line, and that didn't work. That's when I started searching for what was wrong.

The patch doesn't fix this for dates BCE. When approved, I'll submit one more PR for this.

Change 620954 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] WIP DNM:All unicode in month names (using new data-values/time)

https://gerrit.wikimedia.org/r/620954

> The patch doesn't fix this for dates BCE. When approved, I'll submit one more PR for this.

https://github.com/wmde/Time/pull/148

Change 621524 merged by jenkins-bot:
[mediawiki/vendor@master] Bump data-values/time to 1.0.2

https://gerrit.wikimedia.org/r/621524

Change 620954 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Test all unicode in month names (using new data-values/time)

https://gerrit.wikimedia.org/r/620954