It would be amazing if all Data Lake tables used a consistent format for their time columns. The first step is picking a standard.
Datetime or timestamp?
EventBus, EventLogging, and call it a datetime (dt), but mediawiki_history calls it a timestamp (xxxxxx_timestamp). webrequest, for some reason, has both a ts and a dt field with different formats.
Which format?
There are actually five different formats in use!
format | tables using | comment |
---|---|---|
YYYY-mm-ddTHH:MM:SSZ | EventLogging tables ( e.g. event.editattemptstep and event.editorjourney), mediawiki_wikitext_history | Standard ISO 8601 with one form of timezone designation |
YYYY-mm-ddTHH:MM:SS+00:00 | EventBus tables (e.g. event.mediawiki_revision_create and event.mediawiki_page_create) | Standard ISO 8601 with another form of timezone designation |
YYYY-mm-dd HH:MM:SS.0 | mediawiki_history and siblings | Hive/JDBC standard, which makes it easy to use with Hive date and time functions— alternatives can be configured starting with Hive 1.2.0. Nicely human readable. |
unix epoch seconds | webrequest's ts field | This is a Hive timestamp type, which Hive will display as YYYY-mm-dd HH:MM:SS.0 when you select it. |
YYYY-mm-ddTHH:MM:SS | webrequest's dt field | Standard ISO 8601 without time zone info |
Unix timestamp | CirrusSearchRequestSet | |