In T240460#6614767 we decided the following:
- dt is always the client-side timestamp, i.e. the event time.
- meta.dt is always the server-side receive timestamp.
To accomplish this:
- All schemas should be updated to have both a meta.dt and a dt field. dt should be required.
- EventBus should be modified to set dt to event time, but not set meta.dt (allowing EventGate to fill it in).
- All EventGate instances should use meta.dt as the Kafka timestamp.
- All Gobblin ingestion jobs should use meta.dt as the partitioning timestamp.
Ideally, any clients that produce directly to Kafka (not via EventGate) should use a maintained Event Platform producer library where these conventions are automatically handled (like wikimedia-event-utilities).
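As a rough illustration of the conventions above, here is a minimal Python sketch of an intake step. It is not EventGate's or EventBus's actual code: the function names (`receive_event`, `kafka_timestamp_ms`, `to_iso_utc`) are hypothetical, and it assumes timestamps are ISO-8601 UTC strings with second precision and a `Z` suffix. It also assumes that a meta.dt already present on the event is left untouched.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed timestamp format for this sketch


def to_iso_utc(ts: datetime) -> str:
    """Format an aware datetime as an ISO-8601 UTC string with a 'Z' suffix."""
    return ts.astimezone(timezone.utc).strftime(ISO_FORMAT)


def receive_event(event: dict, received_at: datetime) -> dict:
    """Hypothetical intake step: require dt, fill meta.dt only if it is absent."""
    if "dt" not in event:
        raise ValueError("dt (event time) is required")
    out = dict(event)                    # shallow copy; the input is not mutated
    meta = dict(event.get("meta", {}))
    meta.setdefault("dt", to_iso_utc(received_at))  # server-side receive time
    out["meta"] = meta
    return out


def kafka_timestamp_ms(event: dict) -> int:
    """Kafka record timestamp in epoch milliseconds, derived from meta.dt."""
    parsed = datetime.strptime(event["meta"]["dt"], ISO_FORMAT)
    return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
```

In this sketch the client only ever sets dt, and the receive time is stamped once at intake, so the Kafka timestamp follows meta.dt even when dt is much older.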
- meta.dt should generally be used for 'ingestion' partitioning and Kafka timestamps on 'raw'-ish 'log' tables. This means meta.dt is the default for both Kafka timestamps and Hive table partitioning.
- Exception: if the stream or table is meant for event-time-based querying, and late data is expected to be accounted for, then we should use dt instead. E.g. Kafka compacted topics, or downstream / computed event tables.
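The choice between the two timestamps can be sketched as follows. This is an illustrative Python example only, not Gobblin or Hive configuration: the `event_time_oriented` flag, the helper names, and the `year=/month=/day=/hour=` partition layout are assumptions made for the sketch, as is the ISO-8601 `Z`-suffixed timestamp format.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed timestamp format for this sketch


def partition_timestamp_field(event_time_oriented: bool) -> str:
    """Pick dt for event-time tables, meta.dt for raw ingestion tables."""
    return "dt" if event_time_oriented else "meta.dt"


def get_field(event: dict, dotted_path: str) -> str:
    """Look up a dotted path such as 'meta.dt' in a nested dict."""
    value = event
    for key in dotted_path.split("."):
        value = value[key]
    return value


def hourly_partition(event: dict, event_time_oriented: bool = False) -> str:
    """Build a hypothetical hourly partition path from the chosen timestamp."""
    field = partition_timestamp_field(event_time_oriented)
    ts = datetime.strptime(get_field(event, field), ISO_FORMAT)
    ts = ts.replace(tzinfo=timezone.utc)
    return f"year={ts.year}/month={ts.month}/day={ts.day}/hour={ts.hour}"


# An event created at 09:59:58 but received two hours late, at 12:00:05.
late_event = {
    "dt": "2020-11-10T09:59:58Z",
    "meta": {"dt": "2020-11-10T12:00:05Z"},
}
print(hourly_partition(late_event))                            # partitioned by receive time
print(hourly_partition(late_event, event_time_oriented=True))  # partitioned by event time
```

With meta.dt, a late event lands in the partition for the hour it was received, so partitions that have already been written stay closed. With dt, it lands in the hour it occurred, which is why consumers of such tables need to account for late data.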