
The most commonly used date format in the Czech Republic produces wrong date when used as a value in Wikidata
Closed, Resolved | Public | 5 Estimated Story Points | BUG REPORT

Authored By: Vojtech.dostal, Apr 16 2019, 2:24 PM

Description

Steps to replicate the issue (include links if applicable):

  • Go to an item on test Wikidata
  • Switch the language to Czech (čeština)
  • Add a date to the statements
  • Enter the date 01.02.1997

What happens?:
An error pops up even though the date format is valid:

image.png (225×751 px, 37 KB)

What should have happened instead?:

  • There should be no error
  • The date should be displayed in the country's format, i.e. as 1 February 1997

Other information:

Here is a visualisation of the issue from @matej_suchanek:

I have made a visualization of how each day is handled. It is interesting to see the patterns and also how they change depending on the input format and year:

year | D.M.YYYY | 0D.0M.YYYY
1997 | 1.1.1997.png (480×640 px, 22 KB) | 01.01.1997.png (480×640 px, 23 KB)
2022 | 1.1.2022.png (480×640 px, 22 KB) | 01.01.2022.png (480×640 px, 23 KB)

Original ticket

The most commonly used date format in the Czech Republic (and in many other countries, I suspect) is as follows:

DD. MM. YYYY

Or, optionally, without the spaces:
DD.MM.YYYY

However, inserting this date format swaps month and day, producing an easily overlooked error.

Bez názvu.png (199×839 px, 11 KB)

Event Timeline

Well, both MM. DD. YYYY and DD. MM. YYYY are used somewhere. Without geolocation data, it is impossible to tell which one the user meant, except in cases where DD > 12.

Possible solutions:

  • decline this task
  • (no geodata) a preference saying "my dates are always in DD. MM. YYYY"
  • geodata and priority of formats based on source country.
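
The ambiguity described above can be made concrete with a minimal sketch. The helper name `candidate_dates` is hypothetical and this is not how Wikibase parses dates; it only lists every calendar date that a dotted numeric input could mean under either a day-first or a month-first reading.

```python
from datetime import date


def candidate_dates(text):
    """Return every date that 'A. B. YYYY' could mean under DMY or MDY."""
    first, second, year = (int(part) for part in text.replace(" ", "").split("."))
    candidates = set()
    # Try the day-first reading, then the month-first reading.
    for day, month in ((first, second), (second, first)):
        try:
            candidates.add(date(year, month, day))
        except ValueError:
            # This reading is not a real calendar date (e.g. month 13).
            pass
    return candidates


print(candidate_dates("7. 1. 1967"))   # two possible readings
print(candidate_dates("13. 1. 1967"))  # only the day-first reading is valid
```

An input is unambiguous only when the set has exactly one element, which happens when one of the two numbers cannot be a month or when both numbers are equal.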

MDY is mentioned in the article several times (the magenta row is the one you mentioned, but there are also blue (171 million) and red (329 million)). That's about 500 million users in total. Yes, DMY is more frequent, but MDY is not that rare.

For example, the USA (red) does not usually put dots between the parts. So it's MM/DD/YYYY rather than MM.DD.YYYY, which is really quite rarely used.

Fair enough, missed that "little detail" :-).

@Urbanecm: Why geolocalization when we have language settings (or at least an educated guess) per user? It's the same one that shows labels and descriptions in some languages and hides others.

There is also great inconsistency. Why does "7. 12. 1967" produce "1967-12-07" while "7. 1. 1967" produces "1967-07-01"?

And there is a problem with certain dates: I wanted to add 1905-06-07, so I typed 6.7.1905, but the date was displayed as July instead of June. So I switched the numbers and typed 7.6.1905. The result was July again.

Possible solutions:

  • decline this task

Wat.

  • (no geodata) a preference saying "my dates are always in DD. MM. YYYY"

Right. Which we basically have; even though I would say you should not choose “my date format”, you should choose your language. Which you do, obviously. And it is completely obvious what date was meant by any Czech-speaking user who wrote “7. 1. 1967”.

  • geodata and priority of formats based on source country.

We don’t do i18n based on geolocation, I believe. And I see no reason why we should.

(See also sortable tables with dates in them.)
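
The language-based approach argued for above could look roughly like the following sketch. The set `DAY_FIRST_LANGUAGES` and the function `parse_numeric_date` are hypothetical names used for illustration, and the month-first fallback for other languages is an assumption made only for contrast; none of this is the Wikibase implementation.

```python
from datetime import date

# Hypothetical, incomplete list of language codes that write day before month.
DAY_FIRST_LANGUAGES = {"cs", "sk", "de"}


def parse_numeric_date(text, language):
    """Parse 'A. B. YYYY', choosing the day/month order from the user's language."""
    first, second, year = (int(part) for part in text.replace(" ", "").split("."))
    if language in DAY_FIRST_LANGUAGES:
        day, month = first, second
    else:
        # Assumption for this sketch: any other language reads month first.
        month, day = first, second
    return date(year, month, day)


print(parse_numeric_date("7. 1. 1967", "cs"))
print(parse_numeric_date("7. 12. 1967", "cs"))
```

With the order fixed by the language, "7. 1. 1967" and "7. 12. 1967" are handled the same way, so the inconsistency between the two inputs cannot arise.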

Change 620381 had a related patch set uploaded (by Matěj Suchánek; owner: Matěj Suchánek):
[mediawiki/extensions/Wikibase@master] Construct parsers with a copy of ParserOptions

https://gerrit.wikimedia.org/r/620381

Fascinatingly, 07.05.1997 now produces a completely different date...

Bez názvu.png (215×955 px, 11 KB)

Deserves a separate (sub)task. The result comes from ValueParsers\PhpDateTimeParser. It's interesting what happens when you remove the leading zero...

I have made a visualization of how each day is handled. It is interesting to see the patterns and also how they change depending on the input format and year:

year | D.M.YYYY | 0D.0M.YYYY
1997 | 1.1.1997.png (480×640 px, 22 KB) | 01.01.1997.png (480×640 px, 23 KB)
2022 | 1.1.2022.png (480×640 px, 22 KB) | 01.01.2022.png (480×640 px, 23 KB)

Task Triage Notes:

  • We should work on this in story time, in order to add more information to expand this into a bug report
Arian_Bozorg renamed this task from The most commonly used date format in the Czech Republic produces wrong date when used as a value in Wikidata to BUG REPORT: The most commonly used date format in the Czech Republic produces wrong date when used as a value in Wikidata. Jan 31 2023, 5:04 PM
Arian_Bozorg updated the task description. (Show Details)
Arian_Bozorg renamed this task from BUG REPORT: The most commonly used date format in the Czech Republic produces wrong date when used as a value in Wikidata to The most commonly used date format in the Czech Republic produces wrong date when used as a value in Wikidata. Feb 1 2023, 2:34 PM
Arian_Bozorg changed the subtype of this task from "Task" to "Bug Report".

Change 885851 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/Wikibase@master] DNM: tests for T221097

https://gerrit.wikimedia.org/r/885851

CCing @Mohammed_Sadat_WMDE @Arian_Bozorg to coordinate the announcement and deployment of this change; I started a draft message, feel free to take it over. (I’m assuming that we don’t need a feature-flag for this change – I think we can merge this on, say, a Tuesday, send out the announcement the same day, and then people have a bit over a week where the change is testable on Beta before it gets deployed with the train.)

Thank you so much for putting that together Lucas, that looks good from my end :)

I'll let Mohammed have the final word on this before we send it out.

Hey folks. I'm able to announce it this week on Wed or Thurs. @Arian_Bozorg Can you give me the heads-up when this is merged?

Change 620381 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Construct parsers with a copy of ParserOptions

https://gerrit.wikimedia.org/r/620381

I have made a visualization of how each day is handled.

I've rerun my visualization script. For every input, the result now looks like this:

1.1.2022.png (480×640 px, 22 KB)

Excellent news :)

Thanks so much Matej, Michael and Lucas for this

karapayneWMDE set the point value for this task to 5. Feb 21 2023, 9:32 AM

@matej_suchanek: In the bug triage hour today we were wondering if your script and visualization would also be doable for other languages. Is this something you can share or does it not make sense for other languages?

The script is here: P45844. But it can only test "all numeric" dates, such as "14. 3. 2023". So it makes sense for all languages that need to recognize inputs like these.
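
For readers without access to the paste, the general idea of such a sweep over "all numeric" dates can be sketched as follows. This is not the script from P45844; the function names and the deliberately wrong stand-in parser `month_first_parser` are hypothetical, and a real run would call the actual date parser instead.

```python
from datetime import date, timedelta


def all_numeric_inputs(year, zero_padded=False):
    """Yield (expected_date, input_text) for every day of the given year."""
    current = date(year, 1, 1)
    while current.year == year:
        if zero_padded:
            text = f"{current.day:02d}.{current.month:02d}.{current.year}"
        else:
            text = f"{current.day}.{current.month}.{current.year}"
        yield current, text
        current += timedelta(days=1)


def find_mismatches(parse, year, zero_padded=False):
    """Return (input_text, expected_date) pairs where `parse` gives another result."""
    return [
        (text, expected)
        for expected, text in all_numeric_inputs(year, zero_padded)
        if parse(text) != expected
    ]


def month_first_parser(text):
    """Stand-in parser that wrongly reads M.D.YYYY; returns None for invalid dates."""
    first, second, year = (int(part) for part in text.split("."))
    try:
        return date(year, first, second)
    except ValueError:
        return None


mismatches = find_mismatches(month_first_parser, 1997)
print(len(mismatches), "of 365 inputs are misread by the stand-in parser")
```

Plotting the mismatches per day and month would give a picture similar in spirit to the visualizations above; swapping in a different parser and other year or padding settings lets the same sweep be repeated for another language.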