Investigation: Determine approach for DST support
Closed, Resolved, Public

Description

NOTE: This is a joint engineering ticket, which will be collectively owned and assigned to all 3 team engineers: @MHorsey-WMF @cmelo @Daimona

Acceptance Criteria:

  • Determine approach for handling DST for cases when the organizer enables registration, and therefore:
    • Picks a time for the event within a certain timezone
    • Time must be displayed correctly, taking into account DST, for the date of the event

Event Timeline

ifried renamed this task from Determine approach for DST support to Investigation: Determine approach for DST support. Aug 11 2022, 4:41 PM
vyuen updated the task description.
vyuen added subscribers: MHorsey-WMF, cmelo.

Context

Right now, when an organizer enters the date and time of an event, we immediately convert it to UTC and store the UTC value in the database. This is the standard way of storing dates in MW and many other places, but it does not work very well for dates in the future (and MW, AFAIK, doesn't deal with future dates in many places). The main problem with storing UTC without the time zone is that if the time zone rules change for a given region, the UTC timestamp will no longer correspond to the actual time of the event.

As an example, let's say that you want to organize an event in Berlin starting on December 15 2022, 14:30. On that day Berlin will be on UTC+1 according to the current (August 25) rules, so the UTC timestamp saved in the database will be 13:30. Now let's say that a month from now, the German government decides that Germany won't switch back to UTC+1 this year and will remain on UTC+2 instead. If you then convert the stored UTC timestamp back to local time, it will add two hours, resulting in 15:30, which is not the intended start time.
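The arithmetic in this example can be reproduced with Python's standard `zoneinfo` module, which reads the IANA database. This is only an illustration: the rule change is hypothetical, so it is simulated with a fixed UTC+2 offset rather than a real zone.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Organizer's intent: 15 December 2022, 14:30 wall-clock time in Berlin.
local = datetime(2022, 12, 15, 14, 30, tzinfo=ZoneInfo("Europe/Berlin"))

# Under the rules in force in August 2022, Berlin is on UTC+1 in December.
utc_stored = local.astimezone(timezone.utc)
print(utc_stored.strftime("%H:%M"))  # 13:30

# Hypothetical rule change: Germany stays on UTC+2 through the winter.
# Simulated with a fixed offset, since the real tz database has no such rule.
hypothetical_berlin = timezone(timedelta(hours=2))
print(utc_stored.astimezone(hypothetical_berlin).strftime("%H:%M"))  # 15:30
```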

You can find more detailed explanations in the "resources" section at the bottom, but the gist of it is that storing UTC is not sufficient. At the same time, we do need to store a UTC timestamp for sorting events by start date or putting them on a calendar.

One important thing is that the official source for the time zone rules is the IANA time zone database, also known as the Olson database (see its English Wikipedia article). It is used by countless programming languages and applications that deal with dates, and it is what we would be using as well.

Scope

We want to support the following:

  • When an organizer specifies a (future) date for an event, we will make sure that any conversion of that date to UTC will respect the timezone rules valid on that future date
  • The local time of an event should be immutable and it should be the ultimate source of truth.
  • It should be possible to show events happening in multiple timezones on the same page, so we need a uniform value for sorting (UTC)
  • If a government changes time zone rules for a country or region with sufficient notice, our application should account for that and adjust the calculations accordingly.

There are also a couple of situations that we chose not to support explicitly (i.e., support for these is best effort and not guaranteed):

  • Leap seconds, because we don't need high accuracy at the moment
  • Time zone rules changing with very short notice (e.g., a government deciding that a country will no longer observe DST the day before DST should have started)
  • Events starting at a time that either does not exist or occurs twice, i.e., events that start inside the interval that is skipped when DST begins or repeated when it ends
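Even though these cases are best effort, they are easy to recognise, for example to warn the organizer. A minimal sketch using `zoneinfo` and the `fold` attribute from PEP 495; `classify_local_time` is a hypothetical helper, not part of the extension:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def classify_local_time(naive: datetime, tz_name: str) -> str:
    """Return "nonexistent", "ambiguous" or "ok" for a wall-clock time in tz_name."""
    tz = ZoneInfo(tz_name)
    first = naive.replace(tzinfo=tz, fold=0)
    second = naive.replace(tzinfo=tz, fold=1)
    # A time inside a DST gap does not survive a round trip through UTC:
    # it comes back shifted by the size of the gap.
    round_trip = first.astimezone(timezone.utc).astimezone(tz)
    if round_trip.replace(tzinfo=None) != naive:
        return "nonexistent"
    # A time inside a DST overlap maps to two different UTC offsets.
    if first.utcoffset() != second.utcoffset():
        return "ambiguous"
    return "ok"
```

For Berlin in 2022, 02:30 on March 27 falls in the skipped hour and 02:30 on October 30 happens twice.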

Implementation

Note: we still need to hash out some details, but we agreed that the implementation will be roughly as explained below. Also, for simplicity I'm only talking about start dates, but everything below should also be done for end dates.

First, we will update our schema so that it stores the following information:

  • Local time of the event (e.g., 20221215143000)
  • Time zone for that local time (e.g., Europe/Berlin)
  • UTC time of the event (e.g., 20221215133000)
  • Version of the time zone db that was used to compute the UTC timestamp (e.g., 2022c)

When the organizer creates the event, they will enter the local time and the time zone. The extension will then compute the corresponding UTC timestamp and save everything into the database fields listed above.
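A sketch of what creating such a row could look like, assuming MW-style 14-digit timestamps as in the examples above. The field and function names are hypothetical, and the tzdb version is passed in by the caller because Python's `zoneinfo` does not report which database version it loaded:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

TS_FORMAT = "%Y%m%d%H%M%S"  # 14-digit timestamp, e.g. 20221215143000


@dataclass
class EventStartRow:
    local_time: str    # source of truth, e.g. "20221215143000"
    tz_name: str       # IANA zone, e.g. "Europe/Berlin"
    utc_time: str      # derived, used for sorting, e.g. "20221215133000"
    tzdb_version: str  # rules used to derive utc_time, e.g. "2022c"


def make_start_row(local_time: str, tz_name: str, tzdb_version: str) -> EventStartRow:
    """Interpret local_time in tz_name and derive the UTC timestamp from it."""
    local = datetime.strptime(local_time, TS_FORMAT).replace(tzinfo=ZoneInfo(tz_name))
    utc = local.astimezone(timezone.utc)
    return EventStartRow(local_time, tz_name, utc.strftime(TS_FORMAT), tzdb_version)


row = make_start_row("20221215143000", "Europe/Berlin", "2022c")
```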

When we need to display the event time to someone, we will just grab the local time from the DB, format it according to the user preferences and output it.

When we need to sort events by date, or put them in a calendar, we use the UTC timestamp to determine the ordering.
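The split between the two columns can be illustrated with a few hypothetical rows: ordering looks only at the UTC column, while display uses the stored local time untouched. The fixed output format below stands in for formatting according to user preferences.

```python
from datetime import datetime

TS_FORMAT = "%Y%m%d%H%M%S"

# Hypothetical rows: (title, local_time, tz_name, utc_time)
events = [
    ("Berlin meetup", "20221215143000", "Europe/Berlin", "20221215133000"),
    ("Tokyo editathon", "20221215190000", "Asia/Tokyo", "20221215100000"),
    ("New York workshop", "20221215090000", "America/New_York", "20221215140000"),
]

# Fixed-width UTC timestamps sort correctly as plain strings.
ordered = sorted(events, key=lambda e: e[3])

for title, local_time, tz_name, _ in ordered:
    # No conversion at read time: the local time is shown as it was entered.
    local = datetime.strptime(local_time, TS_FORMAT)
    print(f"{title}: {local.strftime('%d %B %Y, %H:%M')} ({tz_name})")
```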

The above is not enough to handle changes to time zone rules. For that, we will create a maintenance script that does the following:

  • First, it makes sure that we're using the latest version of the time zone database, and takes note of the version number. The main detail to figure out is how to achieve this. The two main options are: 1) make sure that PHP is internally using the latest version (preferred option; this can be done with the timezonedb PHP extension or, apparently, the tzdata Debian package); 2) download the database from the IANA website and parse it.
  • It scans the database to see if any row was computed using an older version of the time zone database. This could be restricted to future events only; we still have to make a decision on that.
  • For every row that it finds, it recomputes the UTC timestamp according to the new rules, and puts it into the database, updating the tzdb version for that row as well.
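The three steps, including the early bail-out for the common case, could be sketched as below. An in-memory list of dicts stands in for the database table, and the current tzdb version is an input (how to obtain it is exactly the open question in the first step); all names are hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

TS_FORMAT = "%Y%m%d%H%M%S"


def recompute_stale_rows(rows: list[dict], current_version: str) -> int:
    """Recompute utc_time for rows derived with an older tzdb version.

    Each row has keys local_time, tz_name, utc_time and tzdb_version.
    Returns the number of rows whose UTC timestamp actually changed.
    """
    # With a real table this would be a query filtering on tzdb_version.
    stale = [r for r in rows if r["tzdb_version"] != current_version]
    if not stale:
        return 0  # common case: nothing to do, bail out early
    changed = 0
    for row in stale:
        local = datetime.strptime(row["local_time"], TS_FORMAT).replace(
            tzinfo=ZoneInfo(row["tz_name"]))
        new_utc = local.astimezone(timezone.utc).strftime(TS_FORMAT)
        if new_utc != row["utc_time"]:
            row["utc_time"] = new_utc
            changed += 1
        # Mark the row as checked against the current rules either way.
        row["tzdb_version"] = current_version
    return changed
```

Because `ZoneInfo` caches zones per process, a script started fresh (e.g. from cron) sees newly installed rules; a long-lived process would need `ZoneInfo.clear_cache()` first.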

Such a maintenance script would be run in production on a regular basis. Most of the time it won't have anything to do (updates to the time zone db only happen once in a while), but as long as we optimize the common case (i.e., bailing out quickly when there's nothing to do), we could even run the script every hour or so (actual schedule TBD).

With the script running often enough, we can trust the UTC timestamps in the database to be accurate, and that should address the problem of time zone rules changing.

Resources

ldelench_wmf moved this task from Backlog to Darkship on the Campaign-Registration board.

Thanks @Daimona , I totally agree!

So I think the next step after this investigation is to create the two tasks below:

1 - Add the new columns to save the local time, timezone of the local time, and its UTC conversion, change the code as needed to display the right information for the user.

2 - Create a job that runs every X minutes, checks whether there is a change in our timezone db (tzdata Debian package) and, if there is, updates the events as needed.

The details of each of the 2 tasks above will be described in the tasks themselves. @ifried and @vyuen, I think we can close this task, thank you!!!

I am not sure you actually need to store the version of the tzinfo db in your database, reading the above, but please correct me if I'm wrong.

Let's say you just store the following data:

  • Timestamp in UTC
  • Timestamp in the local TZ
  • the local TZ

Once there is an update to tzdata and it's rolled out[1], we run a maintenance script that, for each record:

  • checks whether the UTC timestamp computed by subtracting the UTC offset from the local timestamp equals the one stored in the database;
  • if not, updates the UTC timestamp.

[1] We can also run this script weekly, and just run it as a one-off if we know there's a big TZ change incoming.

The only difference is that with this version of the schema, the script is slightly less efficient, but in practice I am going to assume that when a new version of tzdata is published, all records will need to be checked for an update anyways.
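The version-less variant amounts to recomputing every row and writing only where the result differs. A sketch with hypothetical names, again using a list of dicts in place of the table; it touches every row on each run, which is the efficiency cost mentioned above:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

TS_FORMAT = "%Y%m%d%H%M%S"


def to_utc(local_time: str, tz_name: str) -> str:
    """Convert a stored local timestamp to UTC using the currently installed rules."""
    local = datetime.strptime(local_time, TS_FORMAT).replace(tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc).strftime(TS_FORMAT)


def refresh_utc_timestamps(rows: list[dict]) -> int:
    """Recompute utc_time for every row; update only the rows that differ."""
    changed = 0
    for row in rows:
        expected = to_utc(row["local_time"], row["tz_name"])
        if expected != row["utc_time"]:
            row["utc_time"] = expected
            changed += 1
    return changed
```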

@Daimona am I missing something?

I am not sure you actually need to store the version of the tzinfo db in your database, reading the above, but please correct me if I'm wrong. [...] Once there is an update to tzdata and it's rolled out[1] we run a maintenance script [...]

Yes, this is correct. In fact, the main reason for having the timezone db version in the schema is efficiency, as you noted:

The only difference is that with this version of the schema, the script is slightly less efficient

Yes, we could do that if we change the logic of the script and the cron. I just wasn't sure if it'd be a good idea to always go through each record, parse the timestamp and possibly fix it.

but in practice I am going to assume that when a new version of tzdata is published, all records will need to be checked for an update anyways.

This may or may not be the case depending on what we want to do with events in the past (whose UTC timestamp should be considered final). We still haven't made a decision about it.

We decided that we will do this without storing the timezone db version, as described above. Additionally, we will add a parameter to the script that allows filtering the records by timezone, so that when the rules change and a human runs the script, it would be possible to only process records with the specified timezones, for efficiency.
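The row selection for such a run could look like the sketch below. `timezones` mirrors the filter parameter agreed here; `future_only` reflects the still-open question about past events and is purely hypothetical, as are the other names.

```python
from datetime import datetime, timezone
from typing import Optional

TS_FORMAT = "%Y%m%d%H%M%S"


def select_rows(rows: list[dict], timezones: Optional[set] = None,
                future_only: bool = False,
                now: Optional[datetime] = None) -> list[dict]:
    """Pick the rows a maintenance run should recheck."""
    if now is None:
        now = datetime.now(timezone.utc)
    now_ts = now.astimezone(timezone.utc).strftime(TS_FORMAT)
    selected = []
    for row in rows:
        # Only the zones whose rules changed, when the operator names them.
        if timezones is not None and row["tz_name"] not in timezones:
            continue
        # Optionally treat UTC timestamps of past events as final.
        if future_only and row["utc_time"] <= now_ts:
            continue
        selected.append(row)
    return selected
```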

vyuen changed the task status from Open to In Progress. Aug 30 2022, 4:27 PM

I was thinking a lot about this ever since we talked about this problem, and so I would like to offer a slight point here about being careful of over-optimizing for this problem.

Users' expectations around event handling have been changing radically in the past few years, especially after COVID, and even more so given the nature of both in-person AND hybrid events. I can see cases where event organizers may not *want* an event time to change on them even if the DST rules in their area have changed.

I have examples from international events (are you sure they want those submission deadlines to move around?) and examples from local events where people may come from a couple hours away but across timezone/DST lines (London to Paris, for example). Are we sure we know what people expect to happen if DST rules change in these cases, and that what they expect, always, is that the time adjusts locally? Are we sure the organizers ALWAYS want their time to change based on local?

Implementing code that fairly constantly checks multiple formats of timezones (local vs UTC) is not trivial.

You're getting yourselves into a state where you will need to maintain this state AND make sure it's constantly updated and considered in your code.

On top of that, this is not a concept that MediaWiki is aware of for future OR past, which means that you may also run into mismatches when you look at diff-history for events. If you pull histories of what participants have edited in an event a month ago, but the UTC was adjusted to recognize a new timezone, then you might have mismatching results FROM mediawiki since mediawiki is NOT adjusting for this. You may represent the wrong diffs in that case, or the wrong submissions, since mw stores the UTC and never adjusts it. Watch out for that.

So, since implementing this comes with a bit of an overhead here, I think there are two things you should take into account:

  • Is this a big enough actual problem for you to add the overhead that this will give you? If so, that's fair.
    • But perhaps your MVP can be a smaller test that explores (a) how big of a problem this is, how often it happens, what the outliers are to adjust for, and (b) what your users (organizers and participants) expect to happen.
  • If you do add the functionality, I'd offer that you might want to make sure users want this for specific events.
    • No matter what you do (implement this or not implement this) there will probably need to be quite a bit of user-education / explanation in the product to explain what is happening. Either you'll need to explain that the organizer should change times if DST change happened, OR you will need to explain that DST changes happen automatically in case the specific event is international. Timezones are annoying, my heart is with you, seriously.
    • That said -- if you ARE going to implement this adjustment, then I'd make sure organizers understand that if they want their time to NOT change based on DST changes (for example "deadline for submissions to Wiki Loves Monuments" or something) that they must choose UTC in the event timezone. Make sure people understand the implication of choosing the timezone of the event (not just "the timezone they see when they look" which is what the internet USUALLY does)

I hope my post was not too long here. I think you are all dealing with a really messy problem and taking it in with incredible consideration and careful approaches. I don't know if there's a real good single answer here, but I really would not want y'all to add a fairly large amount of overhead and still run into potentially worse problems.

That said -- you all know this product AND the problem of DST and timezones much better than I do, given you've explored this so carefully.

I was thinking a lot about this ever since we talked about this problem, and so I would like to offer a slight point here about being careful of over-optimizing for this problem.

Thank you, I appreciate that :)

Are we sure the organizers ALWAYS want their time to change based on local?

This is a good question. Product-wise, I don't have an answer and will defer to @ifried. However, I don't think we have to build it in such a way that the time always changes if there's a change to the tz rules. The proposed implementation is that organizers could specify either a geographical timezone (e.g., "Europe/Rome") OR a fixed offset from UTC (e.g., "+02:00"), just like in Special:Preferences. The assumption is that the latter would be immutable and immune to any timezone rule changes. Perhaps this behaviour could be made explicit by adding a tooltip to the timezone field, saying something like
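The difference between the two choices can be shown with a small parser that accepts either form. This is a sketch; `parse_event_timezone` is a hypothetical helper, and the "+HH:MM" format is assumed from the example above. A geographic zone follows the tz rules (Rome is +2 in July and +1 in December), while a fixed offset never moves.

```python
import re
from datetime import datetime, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo


def parse_event_timezone(value: str) -> tzinfo:
    """Accept either an IANA zone name or a fixed offset like "+02:00"."""
    m = re.fullmatch(r"([+-])(\d{2}):(\d{2})", value)
    if m:
        sign = 1 if m.group(1) == "+" else -1
        offset = timedelta(hours=int(m.group(2)), minutes=int(m.group(3)))
        # timezone() itself rejects offsets of 24 hours or more.
        return timezone(sign * offset)
    # Anything else is treated as a geographic zone from the IANA database.
    return ZoneInfo(value)


rome = parse_event_timezone("Europe/Rome")
fixed = parse_event_timezone("+02:00")
summer = datetime(2022, 7, 1, 12)
winter = datetime(2022, 12, 1, 12)
```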

If you select a geographic zone, the local time of the event will never change. ...

Again, I'm not sure if this is a good idea and will defer that decision to Ilana. Also, this should be carefully worded so that it's easy to understand for everyone -- time is already a complex matter, we should try not to make it more difficult.

Implementing code that fairly constantly checks multiple formats of timezones (local vs UTC) is not trivial.

I'm not sure if I got this, would you mind elaborating?

On top of that, this is not a concept that MediaWiki is aware of for future OR past, which means that you may also run into mismatches when you look at diff-history for events. If you pull histories of what participants have edited in an event a month ago, but the UTC was adjusted to recognize a new timezone, then you might have mismatching results FROM mediawiki since mediawiki is NOT adjusting for this. You may represent the wrong diffs in that case, or the wrong submissions, since mw stores the UTC and never adjusts it. Watch out for that.

I'm not sure if this would be a problem: once a date has passed, its UTC representation can no longer change (this is also why using UTC is fine for past dates), so we wouldn't be changing it any more. Since MediaWiki also uses the latest version of the timezone db (as SREs told us), I would expect the UTC representations of dates in the past to always be the same in MW and in the events table.

So, since implementing this comes with a bit of an overhead here, I think there are two things you should take into account:

  • Is this a big enough actual problem for you to add the overhead that this will give you? If so, that's fair.

I don't think we have specific data for events, but I'm assuming that if we don't fix this, someone will run into this issue sooner or later. I think there are two additional factors to consider, aside from the frequency. First, if a mismatch happens for an in-person event, it may have serious consequences for the people involved: someone may show up too early or too late for an event, which could have physical repercussions; it's not quite the same as (say) a watchlist entry expiring an hour too early or too late. Second, time zone rule changes are completely independent of users, i.e. they are not a direct consequence of user actions. The only factor that determines whether a user is ever going to experience this problem is where they live. Thus, I'm a bit worried that not addressing this issue could be interpreted as not wanting to support a worldwide community, which would be quite against our values.

  • If you do add the functionality, I'd offer that you might want to make sure users want this for specific events.
    • No matter what you do (implement this or not implement this) there will probably need to be quite a bit of user-education / explanation in the product to explain what is happening. Either you'll need to explain that the organizer should change times if DST change happened, OR you will need to explain that DST changes happen automatically in case the specific event is international. Timezones are annoying, my heart is with you, seriously.
    • That said -- if you ARE going to implement this adjustment, then I'd make sure organizers understand that if they want their time to NOT change based on DST changes (for example "deadline for submissions to Wiki Loves Monuments" or something) that they must choose UTC in the event timezone. Make sure people understand the implication of choosing the timezone of the event (not just "the timezone they see when they look" which is what the internet USUALLY does)

Right, I entirely agree with this, and this seems to be in line with my proposal above to add a tooltip next to the timezone field. I keep thinking it would be a good idea. Technically it would be an easy change, the actual challenge is wording it in a way that everyone can understand, regardless of their knowledge of timezones. @ifried Maybe we could consider this for V1?

Closing this as the work is complete. However, please feel free to continue the discussion here if needed.