
Field date does not handle ancient dates correctly
Open, Needs Triage · Public

Description

Ancient dates are not displayed correctly in the date form field.

Example: 62/04/25 - this is indeed the 25th of April in the year 62. SMW handles it correctly, but if I click on "edit with form", it shows the 1st of January 1970 (in PF version 4.2), and nothing is shown in the date field (as if no date were there) in PF 4.8.

However, if the date is just "5" (birth date in the year 5, nothing else known), then it shows 5 correctly in the year field.

You can see it here:
https://testwiki.kdz.eu/index.php?title=Markus_(Evangelist)&action=formedit

Event Timeline

Krabina created this task. · Mar 11 2020, 1:12 PM
Restricted Application added a subscriber: Aklapper. · Mar 11 2020, 1:12 PM
Krabina renamed this task from Field date does not handly ancient dates correctly to Field date does not handle ancient dates correctly. · Mar 11 2020, 1:13 PM

@Yaron_Koren While working on the issue, I found that the problem is in the statement
list( $year, $month, $day ) = self::parseDate( $date );
PHP parser functions take 1st January 1970 as the initial date. So I suggest that instead of using parseDate we explode the date into its individual year, month and day, something like this:
list( $year, $month, $day ) = explode('/', $date);

parseDate() is a Page Forms function, not a PHP function.

Sahajsk added a comment. · Edited · Mar 19 2020, 1:17 PM

@Yaron_Koren In the parseDate function we use the PHP function strtotime, like this:
$seconds = strtotime( $date );
This function takes 1st January 1970 as the initial date, so it returns null for dates earlier than this.
So instead of returning false when this function gives null, I suggest that we explode the date and return the corresponding values.

The problem is that the date will not always be of the form "YYYY/MM/DD" - it could take the form "Month DD, YYYY", or other formats. But I think you're right that strtotime() is causing the problem here - it works on dates older than 1970, but it does have a limit, at least when the operating system being used is 32-bit instead of 64-bit (if I understand things correctly).

Yes, it is true that it works fine on a 64-bit system. Somehow, if I input the date as 0062/04/21 instead of 62/04/21, it works fine. So I think it is just a matter of which dates the strtotime() function actually treats as valid.

Is there another option besides strtotime()?

I have searched for alternatives, but there doesn't seem to be any. Instead, since we already have testing for several date formats like Month YYYY or YYYY, we can create another test for ancient dates.

Check out the DateTime class.

I came across another issue while working on the date input. When the input is set to something like {F31697409} (notice the year field with a space), it is displayed correctly on the page {F31697411}, but in "edit with form" it shows

.
So I suggest that we allow only specific inputs in the day and year fields, i.e. 1-31 in the day field and 1-32767 in the year field.
This will resolve the issue with strtotime as well.

It's a good idea to require the year to be an integer. But how would that resolve the strtotime() issue?

If the input is restricted to a fixed number of digits (here 5), we can add zeros in front of smaller numbers, e.g. 00062.

Would that make any difference?

Yes: strtotime('62/04/03') gives no result, but strtotime('00062/04/03') returns 1044297360.
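The zero-padding idea discussed above can be sketched as plain string handling, before the value ever reaches strtotime(). This is only an illustration: the helper name padYear and its $width parameter are hypothetical and not part of Page Forms, and the sketch assumes the input is a numeric year/month/day string separated by slashes.

```php
<?php
// Hypothetical helper (not part of Page Forms): zero-pad the leading year
// of a "year/month/day" string so that short years such as 62 become 0062.
function padYear( string $date, int $width = 4 ): string {
	$parts = explode( '/', $date );
	// Only touch strings made of exactly three slash-separated parts
	// whose first part is all digits; anything else is returned as-is.
	if ( count( $parts ) !== 3 || !preg_match( '/^\d+$/', $parts[0] ) ) {
		return $date;
	}
	$parts[0] = str_pad( $parts[0], $width, '0', STR_PAD_LEFT );
	return implode( '/', $parts );
}

echo padYear( '62/04/03' ), "\n";
echo padYear( '62/04/03', 5 ), "\n";
```

Strings in other formats, such as "January 2002", pass through unchanged, so padding alone cannot cover every input.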

Aha! That's interesting. It appears that there are two different issues here: poor handling of years with fewer than four digits, and problems with strtotime() handling on 32-bit operating systems. (There's also a third issue, that values in dates should be validated better, but that's not part of the problem here.) I'm not sure which of the first two issues is causing the original problem - it could be the first, the second, or both. But if you just want to fix the first one, that would be great: I wouldn't be surprised if that would fix this problem.

And feel free to ignore the problem of validation, and just assume that the year, month and date values will all be integers. Validation is a separate problem.

If strtotime($date) still returns null in the end, we can check another condition, like this:
$date = DateTime::createFromFormat('Y/m/j', $date);
$d = $date->format('Y/m/d');
Now if $d is null, then we return null (as everything does now), and if it has a value, then this handles the case of years with fewer than four digits.
I have tried it, and it seems to work fine with all possible types of date input.
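One detail worth noting about the snippet above: DateTime::createFromFormat() returns false on failure, so calling format() on the result without a check would raise an error for input that does not match. A minimal sketch with that check follows. The function name parseYmd is hypothetical, and the sketch relies on the documented behavior that "Y" accepts up to four digits and that a leading "!" resets all fields not named in the format.

```php
<?php
// Hypothetical helper: parse "year/month/day" with DateTime, returning
// [ year, month, day ] as integers, or null if the string does not match.
function parseYmd( string $date ): ?array {
	// '!' resets unparsed fields (such as the time) instead of using "now".
	$dt = DateTime::createFromFormat( '!Y/m/d', $date );
	if ( $dt === false ) {
		return null;
	}
	return [
		(int)$dt->format( 'Y' ),
		(int)$dt->format( 'n' ),
		(int)$dt->format( 'j' ),
	];
}

var_dump( parseYmd( '62/04/25' ) );
var_dump( parseYmd( 'April 62' ) );
```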

Well, it's a partial solution, but it's better than nothing. Why "Y/m/j" in one and "Y/m/d" in the other?

Oh, it was a typo, works the same.
So should I make the changes?

Also, I have another solution. Since only three date formats seem to be working, we can apply the above check to each of them. These three formats are:
Y: 2000
m Y: January 2002 (we can handle other languages as well)
Y/m/d: 62/04/25
I might have missed some cases, but I have tried all combinations in the date input.
We can apply the same condition to any such cases as well.

Yes, feel free to make a patch already.

But, for the second case - how would you add leading zeroes there?

(I should note that probably no solution like this will be a complete solution, since Page Forms can edit a page that was not created with a form, so the date can be in any format, but if the code works 99.9% of the time, that's good.)

Sahajsk added a comment. · Edited · Mar 23 2020, 1:40 AM

$date = DateTime::createFromFormat('F Y', $date);
$d = $date->format('F Y');
This will handle that case (F stands for the full month name).
I will keep looking for a universal solution.
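A possible refinement of the "F Y" approach, offered only as a sketch: without a leading "!", fields missing from the format (here the day and the time) are filled in from the current system time, which can shift the month when today's day number does not exist in the parsed month. The helper name parseMonthYear is hypothetical, and "F" matches English month names only.

```php
<?php
// Hypothetical helper: parse "Month Year" (e.g. "January 2002"), returning
// [ year, month ] as integers, or null if the string does not match.
function parseMonthYear( string $date ): ?array {
	// '!' resets the day to 1 and the time to midnight, so the result
	// does not depend on the date on which the code happens to run.
	$dt = DateTime::createFromFormat( '!F Y', $date );
	if ( $dt === false ) {
		return null;
	}
	return [ (int)$dt->format( 'Y' ), (int)$dt->format( 'n' ) ];
}

var_dump( parseMonthYear( 'January 2002' ) );
var_dump( parseMonthYear( 'April 62' ) );
```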

Okay, great. It sounds like this is enough to create a patch!

Change 582634 had a related patch set uploaded (by Sahajsk; owner: Sahajsk):
[mediawiki/extensions/PageForms@master] Make date field in PageForms handle ancient dates correctly

https://gerrit.wikimedia.org/r/582634

Change 582634 merged by jenkins-bot:
[mediawiki/extensions/PageForms@master] Make date field in PageForms handle ancient dates correctly

https://gerrit.wikimedia.org/r/582634

@Krabina - there was just a fix for this added (by @Sahajsk). Could you check to see if this fix works for you?

@Sahajsk - sorry, I meant to write you here before. I discovered the issue that dates in the form "February 2, 2010" (with the full month name spelled out) are not parsed correctly by the current code. And I wouldn't be surprised if there are other formats that are not being correctly handled. (Page Forms may need to read date formats that it itself did not create - something I didn't think about enough when reviewing your earlier code.) I think this patch needs to be modified to make use of strtotime() when possible - and maybe only use the DateTime class when it's clear that strtotime() has failed.
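The ordering suggested here (strtotime() first, the DateTime class only once strtotime() has failed) could look roughly like the sketch below. It is not the code from the merged patch: the function name parseDateWithFallback and the list of fallback formats are assumptions made for illustration.

```php
<?php
// Hypothetical sketch: try strtotime() first, then fall back to a few
// explicit DateTime formats. Returns [ year, month, day ] or null.
function parseDateWithFallback( string $date ): ?array {
	$seconds = strtotime( $date );
	if ( $seconds !== false ) {
		return [
			(int)date( 'Y', $seconds ),
			(int)date( 'n', $seconds ),
			(int)date( 'j', $seconds ),
		];
	}
	// strtotime() rejected the string; try formats that allow short years.
	// '!' resets fields missing from the format (day defaults to 1).
	foreach ( [ '!Y/m/d', '!F Y', '!Y' ] as $format ) {
		$dt = DateTime::createFromFormat( $format, $date );
		if ( $dt !== false ) {
			return [
				(int)$dt->format( 'Y' ),
				(int)$dt->format( 'n' ),
				(int)$dt->format( 'j' ),
			];
		}
	}
	return null;
}

var_dump( parseDateWithFallback( 'February 2, 2010' ) );
```

With this ordering, any string that strtotime() accepts keeps its strtotime() interpretation, so short-year inputs reach the DateTime formats only when strtotime() rejects them, as reported earlier in this thread for 62/04/03.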

Sahajsk added a comment. · Edited · Apr 8 2020, 2:07 PM

I have looked into the following ways to handle all types of date formats:

  1. If the year is entered in 4- or 5-digit format, then strtotime can take care of it.
  2. If we take years 0-69 as 2000-2069 and 70-99 as 1970-1999, then the DateTime class can take care of all types of formats. So to specify a year such as 62, we would need to type 0062.

In my view, the second option fits well, since years written in two-digit format are usually interpreted as specified above.
This might not solve this specific issue as desired, but it will be able to handle any type of date format.

Well, great - however you can get it working for all cases is good!

I was going through the solutions, but the best way is to ensure that the year is entered in 4-digit format. That way, the original code works fine.
Moreover, this will reduce confusion. For example,

"February 62" could mean February of either the year 0062 or 1962. So the parsing cannot be done until the year is clearly specified.

I am stuck at this point; can you please help me out?

Sorry for the long delay. If "62" appears as the year, the code should interpret that as the year 62 whenever possible, not 1962 or 2062.

Sorry for so many questions, but I have not researched dates in this depth before.
Just two more issues and I'll be ready for another patch.
How should these dates be interpreted?

  1. March 14 (as the 14th of March, or as March of the year 14?)
  2. 11/09/08 (in what order are year, month and day?)

These are fair questions. I think "March 14" should be read as March of the year 14. As for "11/09/08" - Page Forms, by default, prints out dates as year/month/day, so I guess I'd have to go with September 8 of the year 11.

Honestly, though, I would rather that the Page Forms code interpret these as any date, rather than no date at all (or January 1, 1970). If the second date shows up as September 11, 2008, then (a) there's a chance that this is what the user intended, and (b) if not, they can probably pretty easily figure out what went wrong. On the other hand, if there's no date at all, then the user won't know what happened - and they might not even notice the error for a long time.

Change 587983 had a related patch set uploaded (by Sahajsk; owner: Sahajsk):
[mediawiki/extensions/PageForms@master] Make date field in PageForms handle ancient dates correctly

https://gerrit.wikimedia.org/r/587983

Change 587983 merged by jenkins-bot:
[mediawiki/extensions/PageForms@master] Make date field in PageForms handle ancient dates correctly

https://gerrit.wikimedia.org/r/587983