
Use existing $dateFormats to format dates on Wikidata
Open, NormalPublic

Tokens
"Like" token, awarded by Shizhao. "Like" token, awarded by Capankajsmilyo. "Like" token, awarded by Liuxinyu970226. "The World Burns" token, awarded by revi. "Like" token, awarded by deryckchan.
Assigned To
None
Authored By
He7d3r, Feb 26 2014

Description

Since Wikidata's early days it has been possible to use Wikidata's interface in languages other than English. However, dates have so far been only half-localized: the month name is replaced with the month name in the target language, but the date format string itself is not localized.

This is a major inconvenience for users of languages where the date format string is not "d Mmm yyyy" or "Mmm d yyyy". In many cases the partially localized dates make no sense to a native reader.

This task requests that language-specific format strings be applied whenever Wikidata displays a date. Until that is implemented, incomplete localizations should be reverted to an international date format (e.g. 2012-10-29) for languages that do not use a "d Mmm yyyy" format string.


For example, per this thread, the Portuguese interface displays "junho 12 1990" for this item, which is incorrect Portuguese; the correct date format is "12 de junho de 1990".

In other languages where month names are simply numbers, such as Chinese (all variants), Japanese, and Korean, this kind of localisation produces a mangled string of numbers that makes little sense to the reader. For example, "22 五月 2017" does not make sense to a Chinese reader; the correct format is "2017年5月22日".
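
The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only, not Wikibase or MediaWiki code: the pattern table, the month-name table, and both function names are hypothetical, and the only real data are the strings quoted in this description.

```python
# Hypothetical illustration; not Wikibase or MediaWiki code.

# Month names as quoted in the task description.
MONTH_NAMES = {
    "pt": {6: "junho"},
    "zh": {5: "五月"},
}

# Assumed per-language patterns, written as Python format strings.
PATTERNS = {
    "pt": "{day} de {month_name} de {year}",
    "zh": "{year}年{month}月{day}日",
}


def half_localized(year, month, day, lang, order="dmy"):
    """Swap in the translated month name but keep an English-style order."""
    name = MONTH_NAMES[lang][month]
    if order == "mdy":
        return f"{name} {day} {year}"
    return f"{day} {name} {year}"


def fully_localized(year, month, day, lang):
    """Apply the whole language-specific pattern, not just the month name."""
    month_name = MONTH_NAMES[lang].get(month, "")
    return PATTERNS[lang].format(
        year=year, month=month, day=day, month_name=month_name
    )


print(half_localized(1990, 6, 12, "pt", order="mdy"))
print(fully_localized(1990, 6, 12, "pt"))
print(half_localized(2017, 5, 22, "zh"))
print(fully_localized(2017, 5, 22, "zh"))
```

The half-localized calls reproduce the strings reported as wrong above, and the fully localized calls reproduce the strings given as correct.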

The date formats for many more languages should also be changed.
See https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2015/01#Date_format


Whiteboard: papercut u=dev c=backend p=3

Details

Reference
bz61958

Event Timeline

bzimport raised the priority of this task to Normal. Nov 22 2014, 2:57 AM
bzimport set Reference to bz61958.
bzimport added a subscriber: Unknown Object (MLST).
He7d3r created this task. Feb 26 2014, 8:36 PM

(In reply to Helder from comment #0)

Per https://www.wikidata.org/w/index.php?diff=112407453&oldid=112404791#Date_formatting
I would like to format the date "junho 12 1990" on https://www.wikidata.org/wiki/Q159?uselang=pt-br as "12 de junho de 1990".

Would adding "de" work for all days of all months?

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
Lydia_Pintscher removed a subscriber: Unknown Object (MLST). Dec 1 2014, 2:53 PM
He7d3r updated the task description. Dec 1 2014, 2:57 PM
He7d3r added a project: I18n.
He7d3r set Security to None.
Stryn renamed this task from Change formatting of dates in Portuguese on Wikidata to Change formatting of dates on Wikidata. Jan 28 2015, 7:02 PM
Stryn updated the task description.
He7d3r renamed this task from Change formatting of dates on Wikidata to Use existing $dateFormats to format dates on Wikidata. Jan 29 2015, 5:47 PM
Nikki updated the task description. Jun 25 2015, 7:21 AM
Restricted Application added a subscriber: Aklapper. Aug 24 2015, 1:41 PM
RP88 added a subscriber: RP88. Aug 26 2015, 7:17 PM

As with many of these time-related tickets, the MwDateFormatParser will solve a lot of these cases; see https://gerrit.wikimedia.org/r/153211 and https://github.com/DataValues/Time/pull/83, both of which are still work in progress.

Change 153211 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
[WIP] Add MwDateFormatParser

https://gerrit.wikimedia.org/r/153211

Rical added a subscriber: Rical. Oct 18 2015, 9:21 AM
Nikki added a subscriber: Nikki. Apr 12 2016, 5:17 PM
hoo assigned this task to thiemowmde. Sep 18 2016, 2:29 PM
Samat added a subscriber: Samat. May 20 2017, 4:04 PM
Samat added a comment. May 20 2017, 4:06 PM

The Hungarian date format (and the Hungarian month names) should also be implemented. I think the proposed patch covers this case, too. Is that correct?

@Samat, I'm sorry, can you please describe in more detail what you mean? What is the current situation, what is not correct, and how should it be instead?

When I go to a page like https://www.wikidata.org/wiki/Q159?uselang=hu I see "12 június 1990". This is, as far as I can tell, the "name of the month in Hungarian".

When I look at https://phabricator.wikimedia.org/source/mediawiki/browse/master/languages/messages/MessagesHu.php;9e8355c87d35345bab5de10cab6c42832f33917d$145 I see that the Hungarian language is set to use a YMD-ordered date format by default. Wikibase currently does not use this, but the "dmy" format. Unfortunately the Hungarian language definitions do not specify a dmy format. That's why the dot is missing in "12 június 1990".

The patch I mentioned above is not about formatting but about parsing. Being able to parse all date formats is a prerequisite for changing the formatting.

Samat added a comment. Edited May 21 2017, 9:47 AM

@thiemowmde, thank you for your answer. I thought that Wikibase currently used the month names in English, but I was wrong: the names themselves are fine (for example "12 június 1990").

But the date formatting is incorrect. As you mentioned, the order should be YMD, and there should be a dot after both the year and the day number (for the example above: "1990. június 12.").

If this ticket won't solve the formatting issue, I will open a separate ticket for that. (Or if you know of a ticket that is already open for the same problem, please point me to it.)

Oh, please do not create more tickets. This one here is about the exact formatting issue you asked for. It's just that the patch I linked above does not fully solve the issue.

ok, thanks :)

deryckchan added a comment. Edited May 22 2017, 9:12 AM

To clarify, this ticket is about displaying and outputting dates in the language-appropriate format.

At the moment Wikidata's web interface and {{#Property:}} return "dd mmm yyyy" or "mmm dd yyyy" with mmm replaced by the name of the month in the desired language, which, as earlier discussion has shown, is not useful at all for languages whose date formatting isn't a direct application of one of these two formats.

@thiemowmde - Would you please explain how a patch about parsing dates is a prerequisite of a solution about displaying dates?

Central modules will need several date formats inside one module:
for content, page and user languages, see T135845
and to display categories in user language and link them in wiki language, see T68051.

To make these needs easy to support, I suggest structuring the code change with that in mind.

@deryckchan, simply because the software must understand itself. The formatted date is what appears in the edit field. We do not want to show the unformatted YYYY-MM-DD there as this would be even more confusing, so we show it formatted. You want to edit this, and expect the software to accept the format it was outputting before.

With no parser that is able to understand all formats (and not confuse them!) we can't output all formats.
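
The requirement described here is a round trip: whatever the formatter outputs for a language, the parser for that language must read back as the same date. A minimal Python sketch of that property follows. The format/parser table and all function names are hypothetical and have nothing to do with the actual MwDateFormatParser; only the "zh" pattern is taken from this task.

```python
import re
from datetime import date

# Hypothetical formatter/parser pairs; not the Wikibase implementation.
FORMATTERS = {
    "iso": lambda d: f"{d.year:04d}-{d.month:02d}-{d.day:02d}",
    "zh": lambda d: f"{d.year}年{d.month}月{d.day}日",
}
PARSERS = {
    "iso": re.compile(r"^(\d{4})-(\d{2})-(\d{2})$"),
    "zh": re.compile(r"^(\d{4})年(\d{1,2})月(\d{1,2})日$"),
}


def format_date(d, lang):
    """Render a date using the pattern registered for lang."""
    return FORMATTERS[lang](d)


def parse_date(text, lang):
    """Parse text produced by format_date; reject anything else."""
    match = PARSERS[lang].match(text)
    if match is None:
        raise ValueError(f"cannot parse {text!r} as a {lang} date")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def round_trips(d, lang):
    """True if the parser accepts exactly what the formatter emitted."""
    return parse_date(format_date(d, lang), lang) == d


print(round_trips(date(2017, 5, 22), "zh"))
print(round_trips(date(2017, 5, 22), "iso"))
```

In this sketch a new output format can only be enabled once a matching parser entry exists, which is the dependency between parsing and formatting argued for above.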

@Rical, you are right, this is closely connected. But for now this ticket is about PHP backend rendering only, not about possible future Lua modules.

deryckchan added a comment. Edited May 22 2017, 4:39 PM

@deryckchan, The formatted date is what appears in the edit field. We do not want to show the unformatted YYYY-MM-DD there as this would be even more confusing, so we show it formatted.

I must argue that it is actually more confusing to show partially localised "dd Mmm yyyy". Most users of the internet understand yyyy-mm-dd regardless of mother language and Wikidata users in particular are used to language fallback chains.

@thiemowmde : Imagine your software displays "2017Jahr5Monat22Tag" (which is the Chinese format string with German words substituted in). This is how users of non-"dd Mmm yyyy" languages currently feel when we use Wikidata. It's worse than defaulting to "2017-05-22" or even "May 22 2017".

May I suggest that we actually display "yyyy-mm-dd" until language-specific date formats are implemented?

Samat added a comment. May 22 2017, 4:53 PM

I must argue that it is actually more confusing to show partially localised "dd Mmm yyyy". Most users of the internet understand yyyy-mm-dd regardless of mother language and Wikidata users in particular are used to language fallback chains.
@thiemowmde : Imagine your software displays "2017Jahr5Monat22Tag" (which is the Chinese format string with German words substituted in). This is how users of non-"dd Mmm yyyy" languages currently feel when we use Wikidata. It's worse than defaulting to "2017-05-22" or even "May 22 2017".
May I suggest that we actually display "yyyy-mm-dd" until language-specific date formats are implemented?

I agree, and I would suggest the same if the implementation is going to take longer.

There is no reason to do the actual opposite of what this ticket asks for. No matter what the user's language is, everybody can distinguish day, month and year in "10 November 2017". But we cannot assume everybody understands what the month in "2017-11-10" is. This is actually the 11th of October in certain regions of the world.

There is no reason to do the actual opposite of what this ticket asks for. No matter what the user's language is, everybody can distinguish day, month and year in "10 November 2017". But we cannot assume everybody understands what the month in "2017-11-10" is. This is actually the 11th of October in certain regions of the world.

The reason is that, I'm afraid, it is not correct to assume that "everybody can distinguish day, month and year in [dd Mmm yyyy]". As Samat and I have strongly argued in this thread, translated month names + wrong date formatting string is not comprehensible in many languages. It is better to default to a correct foreign language than to use an incomprehensibly wrong attempt to localise.

KTC added a subscriber: KTC. May 23 2017, 5:41 PM

@thiemowmde You can accept the idea that "yyyy-mm-dd" may not make sense for some people in the world, but not when a native-language reader is telling you that "dd Mmm yyyy" makes even less sense? When doing localisation, if native readers are telling you that what you have makes no sense in that language, stop and listen.

Either do full localisation of a string, or don't do it at all.

ISO 8601 format is a well understood international standard designed "to provide an unambiguous and well-defined method of representing dates and times, so as to avoid misinterpretation of numeric representations of dates and times, particularly when data are transferred between countries with different conventions for writing numeric dates and times" (from English Wikipedia). Why on earth would you invent a partially localised system that makes no sense at all in many languages?

Nikki added a comment. May 23 2017, 6:44 PM

@thiemowmde : Imagine your software displays "2017Jahr5Monat22Tag" (which is the Chinese format string with German words substituted in). This is how users of non-"dd Mmm yyyy" languages currently feel when we use Wikidata. It's worse than defaulting to "2017-05-22" or even "May 22 2017".

I'm not a native or even fluent speaker of Chinese (or Japanese or Korean), so maybe you would disagree, but I think a better analogy is: Imagine being presented with "30 10 minutes 3" as a length of time in English.

English speakers might eventually figure out that it's supposed to mean "3 hours 10 minutes 30 seconds" but the parts are in the wrong order and two of the expected words are missing, which results in something that looks like complete nonsense. Writing "2017Jahr5Monat22Tag" in German is definitely weird, but I don't think it has the same effect on the comprehensibility.

Nikki's analogy is spot on!

In general, it is a bad idea to assume that users will be able to understand something non-obvious, and this is an even worse idea when it comes to multiple languages. As a native English speaker, with a moderate level of Mandarin, and as a software developer and computational linguist, when I first looked at "22 五月 2017", my first thought was that there was some mistake, because it just looks garbled. For instance, was "22五" supposed to be one number? Of course it's *possible* to work it out, but that's beside the point. This is a failed attempt at localisation, needs to be fixed, and cannot be written off as something that's incorrect but understandable.

This ticket asks for full localization as supported by MediaWiki core. I, personally, love to work on date parsing and formatting and have already spent weeks (!) working on the code required to fully solve this ticket some day. I will not throw away everything we did in the past four (!) years just because some people start yelling at me with no scientific arguments given.

I see the possibility for a few smaller improvements we could make:

  • 12 of the 423 languages and language-variants MediaWiki currently supports name their default date format "ymd". These languages are namely crh (including variants), hu, kaa, and kk (including variants). We could disable the localization for these languages and display raw ISO dates instead. Users will not get better localization by doing so. But the order will be the same as the users expect. We might assume users being used to any kind of "ymd" ordering are less confused by "2017-11-10", even if it will be entirely unlocalized until we support full localization.
  • About 20 more languages specify default date formats that start with the year, but are not named "ymd". Most notably gan, ko, and zh, including all their variants. We might add these to a blacklist and display raw ISO dates as well.
  • We might add other languages to the same blacklist if requested and ISO is proven to be less confusing for native speakers.
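
The interim proposal in the list above amounts to a per-language switch. A rough Python sketch, for discussion only: the set contains just the language codes named in the bullets (it is not the complete list of roughly 32 codes), matching variants by their prefix is an assumption, and the function names are made up.

```python
# Hypothetical sketch of the proposed interim fallback; not Wikibase code.

# Only the codes named in the bullets above; the real list would be longer.
ISO_FALLBACK_LANGUAGES = {"crh", "hu", "kaa", "kk", "gan", "ko", "zh"}


def base_language(code):
    """Reduce a variant such as "zh-hans" to its base code (assumed rule)."""
    return code.split("-", 1)[0].lower()


def display_date(year, month, day, lang, month_name):
    """Use a raw ISO date for listed languages, the current dmy output otherwise."""
    if base_language(lang) in ISO_FALLBACK_LANGUAGES:
        return f"{year:04d}-{month:02d}-{day:02d}"
    return f"{day} {month_name} {year}"


print(display_date(2017, 11, 10, "hu", "november"))
print(display_date(2017, 5, 22, "zh-hans", "五月"))
print(display_date(2017, 11, 10, "de", "November"))
```

Languages outside the set keep the existing day-month-year output, so nothing changes for the languages whose default format is named "dmy".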

I will not discuss globally disabling the mostly working localizations for the 368 languages (87%) that name their default date format "dmy".

  • 12 of the 423 languages and language-variants MediaWiki currently supports name their default date format "ymd". These languages are namely crh (including variants), hu, kaa, and kk (including variants). We could disable the localization for these languages and display raw ISO dates instead. Users will not get better localization by doing so. But the order will be the same as the users expect. We might assume users being used to any kind of "ymd" ordering are less confused by "2017-11-10", even if it will be entirely unlocalized until we support full localization.
  • About 20 more languages specify default date formats that start with the year, but are not named "ymd". Most notably gan, ko, and zh, including all their variants. We might add these to a blacklist and display raw ISO dates as well.
  • We might add other languages to the same blacklist if requested and ISO is proven to be less confusing for native speakers.

This is a good plan. Thank you for your hard work on date formatting for Wikidata!

This sounds reasonable. Using the raw ISO format "yyyy-mm-dd" would be a better localisation for Mandarin than using "dd mm月 yyyy". I cannot be as certain for the other ymd languages, without a closer look at the linguistic data. I'm not sure what kind of scientific argument is being asked for, but on the subject of not assuming that users can work things out, I would recommend the following paper:

http://www.oecd-ilibrary.org/education/skills-matter_9789264258051-en

It's long but eye-opening. A summary is also available here:

https://www.nngroup.com/articles/computer-skill-levels/

Relevant Patch-For-Review that adds a simple TimeFormatter that can output ISO-like YMD-ordered dates in all relevant precisions: https://github.com/DataValues/Time/pull/49. We might use this basic YMD-formatter instead of the current (DMY-) MwTimeIsoFormatter for the non-DMY languages listed above.
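
To make "all relevant precisions" concrete, here is a small Python sketch of a YMD-ordered formatter that drops the parts a value does not know. It is not the code from the pull request. The numeric precision codes (9 for year, 10 for month, 11 for day) are assumed to match the Wikibase data model, the function name is hypothetical, and coarser precisions and negative years are left out.

```python
# Hypothetical sketch; not the TimeFormatter from the linked pull request.

# Precision codes assumed to match the Wikibase data model.
PRECISION_YEAR = 9
PRECISION_MONTH = 10
PRECISION_DAY = 11


def format_ymd(year, month, day, precision):
    """Return an ISO-like, YMD-ordered string truncated to the precision."""
    if precision == PRECISION_DAY:
        return f"{year:04d}-{month:02d}-{day:02d}"
    if precision == PRECISION_MONTH:
        return f"{year:04d}-{month:02d}"
    if precision == PRECISION_YEAR:
        return f"{year:04d}"
    raise ValueError(f"unsupported precision: {precision}")


print(format_ymd(2017, 5, 22, PRECISION_DAY))
print(format_ymd(2017, 5, 0, PRECISION_MONTH))
print(format_ymd(2017, 0, 0, PRECISION_YEAR))
```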

deryckchan updated the task description. Jun 8 2017, 5:12 PM
Restricted Application added a subscriber: revi. Jun 8 2017, 5:12 PM
Restricted Application added a subscriber: PokestarFan. Jul 26 2017, 11:49 PM

Scribunto modules also need to extract any part of the date (and/or time).
This is very difficult if the only available date is language-formatted.
So I suggest also giving modules the ISO 8601 format.
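
To illustrate why the unformatted value is easier for modules to work with: a fixed ISO-style timestamp can be split into its parts with one pattern, regardless of the user's language. The sketch below is in Python for illustration (a Scribunto module would be written in Lua). The timestamp shape "+2017-05-22T00:00:00Z" is assumed to be what Wikibase stores, and the function name is hypothetical.

```python
import re

# Assumed timestamp shape: sign, year, then -MM-DDTHH:MM:SSZ.
_TIMESTAMP = re.compile(
    r"^([+-]?\d+)-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z$"
)


def split_timestamp(timestamp):
    """Split an ISO-style timestamp into named integer parts."""
    match = _TIMESTAMP.match(timestamp)
    if match is None:
        raise ValueError(f"not an ISO-style timestamp: {timestamp!r}")
    keys = ("year", "month", "day", "hour", "minute", "second")
    return dict(zip(keys, (int(part) for part in match.groups())))


parts = split_timestamp("+2017-05-22T00:00:00Z")
print(parts["year"], parts["month"], parts["day"])
```

Doing the same with a language-formatted string would need one pattern per language and per format.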

revi awarded a token. Aug 24 2017, 9:37 PM

I noticed that the date format has recently changed from "25 九月 1997" to "25 9 1997". Is work being done on the date formatting? And are we close to getting ISO dates or language-formatted dates?

Restricted Application added a subscriber: jeblad. Aug 25 2017, 1:10 PM

I am wondering what the state of this task is. Is there any progress?
I checked the date format for Hungarian and saw a small change since May: there is now a dot after the day, for example "12. november 1918".

This is really quite close to the correct format:

  • we would need one more dot after the year number,
  • should be ordered as YYYY.MM.DD.

I am not a programmer, but I don't see why this change would be so complicated.
Can we expect that this change will happen (soon)? :)

Rical removed a subscriber: Rical. Mar 20 2018, 10:35 AM

Is this task complete? Can someone please update this 3 year old task's status?

Is this task complete? Can someone please update this 3 year old task's status?

No change since August 2017. Date format displays in some languages (e.g. Hungarian and Cantonese above) have changed but are still wrong. It appears that the underlying software remains unable to handle date formats that don't follow d-m-y word order.

Relevant Patch-For-Review that adds a simple TimeFormatter that can output ISO-like YMD-ordered dates in all relevant precisions: https://github.com/DataValues/Time/pull/49. We might use this basic YMD-formatter instead of the current (DMY-) MwTimeIsoFormatter for the non-DMY languages listed above.

Why has this patch for review still not been accepted? Where is the pending discussion? How can we move forward on this one?

If there's no way to fix the internationalized format now, then please change the format to the ISO date format as a temporary fix. There's currently no way for me to tell which day a date value actually represents without trying to edit it and seeing the calendar pop up.

If there's no way to fix the internationalized format now, then please change the format to the ISO date format as a temporary fix. There's currently no way for me to tell which day a date value actually represents without trying to edit it and seeing the calendar pop up.

Agreed - we've been sitting here for a year and date fields remain unusable in non-dmy languages. If we switch back to ISO dates until language-specific date formatting strings can be rolled out, at least people can use it without confusion.

Jarekt added a subscriber: Jarekt. Nov 17 2018, 5:30 AM

We solved that issue on Commons a decade ago by writing templates, which now take the form of Module:Date and Module:ISOdate. Both modules exist on both Commons and Wikidata. Maybe we can just pipe the date through that module, or capture the logic of the module in MediaWiki code.

Change 153211 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add DateFormatParser and MwDateFormatParserFactory

https://gerrit.wikimedia.org/r/153211

thiemowmde removed thiemowmde as the assignee of this task. Tue, Oct 1, 7:56 AM