
[Task] Validate timezones and block all timezones other than 0
Open, Medium, Public

Description

Currently our set of validators in ValidatorBuilders does not validate the timezone. This is done in TimeValue, which limits it to an integer from -12 * 3600 to 14 * 3600.

I suggest blocking all timezones other than 0, because:

  • Almost all timezones we currently have in the database are errors/mistakes. They are typically set by bots when creating an "imported from" reference, and then reflect the timezone the bot is running in, or something else entirely. We can't know.
  • Having a timezone barely* makes sense when we do not support hour, minute and second. *You could argue that the timezone can be relevant even with day precision, but that's really vague and underspecified. Example: An event is known to happen on April 1st at 23:00, but in a non-UTC timezone, so it is already April 2nd in UTC. People enter April 1st, which only makes sense together with the timezone and the time. And that's the point: it does not make much sense without the time.
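
To make the proposal concrete, here is a minimal sketch in Python (not the actual Wikibase PHP code) of the existing range check next to the stricter zero-only rule. The function names and error codes are hypothetical; only the bounds are taken from the description above.

```python
# Bounds as quoted in the task description for the existing TimeValue check.
MIN_TIMEZONE = -12 * 3600
MAX_TIMEZONE = 14 * 3600


def is_in_timevalue_range(timezone):
    """Mimic the existing check: an integer within the quoted bounds."""
    return (
        isinstance(timezone, int)
        and not isinstance(timezone, bool)  # bool is a subclass of int in Python
        and MIN_TIMEZONE <= timezone <= MAX_TIMEZONE
    )


def validate_timezone_is_zero(timezone):
    """Proposed stricter rule. Returns a list of error codes, empty if valid.

    The error codes are made up for this sketch.
    """
    if not is_in_timevalue_range(timezone):
        return ["timezone-out-of-range"]
    if timezone != 0:
        return ["timezone-not-supported"]
    return []
```

With this rule, only 0 passes. Anything else is rejected, including values the current range check would accept.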

Query for http://query.wikidata.org:

# Dates with timezone are most probably a mistake
SELECT ?statementId ?c ?tz
WHERE {
    # Keys: wikibase:timeValue (xsd:dateTime), wikibase:timePrecision (xsd:integer), wikibase:timeTimezone (xsd:integer), wikibase:timeCalendarModel
    ?dataValueId wikibase:timeTimezone ?tz .
    FILTER (?tz != 0) .
    ?refId ?c ?dataValueId .
    ?statementId ?b ?refId
    #SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 10
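
To get a feel for which offsets dominate, the results can be tallied once the query has been run. The sketch below assumes the response was fetched in the standard SPARQL JSON results format and already parsed into a dict. The function name is hypothetical, and `tz` matches the `?tz` variable in the query above.

```python
from collections import Counter


def tally_timezones(sparql_json):
    """Count how often each non-zero ?tz value occurs in a result set."""
    counts = Counter()
    for binding in sparql_json["results"]["bindings"]:
        # Values arrive as strings in the SPARQL JSON results format.
        tz = int(binding["tz"]["value"])
        if tz != 0:
            counts[tz] += 1
    return counts
```

Calling `tally_timezones(parsed).most_common()` then lists the offsets from most to least frequent. Raise or drop the `LIMIT` first if a larger sample is wanted.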

Event Timeline

FYI: there are a few timezones which are at n + 0.5.

That said, I'm pretty sure there's a task lying around to add timezone support in the UI...

Sure, both time (as in hour, minute and second) and timezone support are still to do, see T57755: Allow time values more precise than day on Wikidata.

There are all kinds of crazy non-hour timezones. Our internal format uses minutes.
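
A short sketch of why minutes are a convenient unit: half-hour and quarter-hour offsets become plain integers. The helper names are hypothetical, and the offsets in the usage lines (India at +5:30, Nepal at +5:45, the Marquesas Islands at -9:30) are illustrative examples only.

```python
from fractions import Fraction


def hours_to_minutes(offset_hours):
    """Convert an offset given in hours (as a decimal string) to whole minutes."""
    minutes = Fraction(offset_hours) * 60
    if minutes.denominator != 1:
        raise ValueError(f"offset {offset_hours} is not a whole number of minutes")
    return int(minutes)


def format_offset(offset_minutes):
    """Render an offset in minutes as a signed HH:MM string."""
    sign = "-" if offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"{sign}{hours:02d}:{minutes:02d}"


print(hours_to_minutes("5.5"), format_offset(hours_to_minutes("5.5")))
print(hours_to_minutes("5.75"), format_offset(hours_to_minutes("5.75")))
print(hours_to_minutes("-9.5"), format_offset(hours_to_minutes("-9.5")))
```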

I argue that entering the timezone is valid even when the source provides no time for the event. For example, the White House web site tells us Barack Obama was born August 4, 1961 in Hawaii. We know that Hawaii standard time is 10 hours behind UT, and Hawaii does not observe daylight time. We can safely conclude Obama was born between 10:00 August 4 and 10:00 August 5, UT. Providing the ability to input a correct time zone allows the information from most statements in sources about births and deaths to be faithfully reproduced.
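
The reasoning in this comment, a day-precision date plus a fixed offset pinning down a 24-hour interval in UT, can be sketched as follows. This is only an illustration: the function name is hypothetical, and the offset is taken in minutes on the assumption that positive values are east of UT.

```python
from datetime import date, datetime, time, timedelta, timezone


def utc_window_for_local_day(local_day, offset_minutes):
    """Return (start, end) in UTC of a calendar day in a fixed-offset timezone."""
    tz = timezone(timedelta(minutes=offset_minutes))
    start_local = datetime.combine(local_day, time.min, tzinfo=tz)
    end_local = start_local + timedelta(days=1)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


# Hawaii standard time: 10 hours behind UT, no daylight time.
start, end = utc_window_for_local_day(date(1961, 8, 4), -10 * 60)
print(start.isoformat(), end.isoformat())
```

Without the offset, the same date can only be placed somewhere in the roughly 50-hour span covered by all timezones combined.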

That would be fine if the UI allowed entering timezones, but it doesn't. The majority of the timezones currently in the database, if not all, are errors.