In T240460#6614767 we decided the following:
- `dt` is always the client-side (AKA event) timestamp.
- `meta.dt` is always the server-side receive timestamp.
- Which timestamp field is used for the Kafka message timestamp and for Hive partitioning is configurable.
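For illustration, here's a hypothetical event as a consumer might see it after EventGate has processed it (schema URI, stream name, and values invented): `dt` is the client/event time, and `meta.dt` is the slightly later server receive time:

```
{
  "$schema": "/example/event/1.0.0",
  "dt": "2020-11-16T09:00:00Z",
  "meta": {
    "stream": "example.event",
    "dt": "2020-11-16T09:00:02Z"
  }
}
```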
To accomplish this:
[] All schemas should be updated to have both a `meta.dt` and a `dt` field, with neither field required (see the schema sketch after this list)
[] eventgate-*-external instances should use `meta.dt` as the Kafka timestamp
[] eventgate-*-external analytics Hadoop ingestion (Gobblin) jobs should be configured to use `meta.dt` for hourly HDFS partitioning
[] eventgate-* internal instances should use `dt` as the Kafka timestamp, falling back to `meta.dt` (see the timestamp selection sketch below)
[] eventgate-* internal analytics Hadoop ingestion (Gobblin) jobs should be configured to use `dt` for hourly HDFS partitioning, falling back to `meta.dt` (see the Gobblin sketch below)
[] EventBus should be modified to set `dt`, but not `meta.dt` (allowing EventGate to fill it in)
[] Other event producers (change-prop?) that don't use EventGate should set `dt` but not `meta.dt` (see the example payload below)
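The schema change in the first item might look like this JSONSchema (YAML) fragment. This is a sketch, not the exact fragment schema text; descriptions are paraphrased:

```
properties:
  dt:
    type: string
    format: date-time
    maxLength: 128
    description: UTC event datetime, in ISO-8601 format (client/event time).
  meta:
    type: object
    properties:
      dt:
        type: string
        format: date-time
        maxLength: 128
        description: UTC datetime the event was received by the server, in ISO-8601 format.
# Note: neither dt nor meta.dt appears in any `required` list.
```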
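A minimal sketch of the intended Kafka timestamp selection. The helper and its wiring are hypothetical, not actual EventGate code; node-rdkafka's `produce()` does accept an explicit timestamp in milliseconds:

```
// Hypothetical helper, not actual EventGate code.
interface WMEvent {
    dt?: string;            // client/event timestamp, ISO-8601
    meta: { dt?: string };  // server-side receive timestamp, ISO-8601
}

// External instances always use meta.dt (receive time); internal instances
// use dt (event time), falling back to meta.dt if dt is absent.
function kafkaTimestampMs(event: WMEvent, isExternal: boolean): number {
    const ts = isExternal ? event.meta.dt : (event.dt ?? event.meta.dt);
    return ts !== undefined ? Date.parse(ts) : Date.now();
}

// node-rdkafka's produce() takes the timestamp (ms since epoch) as its fifth argument:
// producer.produce(topic, null, buffer, undefined, kafkaTimestampMs(event, isExternal));
```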
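The Gobblin partitioning items could be expressed with upstream Gobblin's time-based writer partitioner. The property names below are Apache Gobblin's TimeBasedWriterPartitioner settings; the class name and the actual WMF job configs (which ingest JSON, not Avro) may differ, so treat this as an illustration only:

```
# Illustrative Gobblin job properties; actual WMF jobs may differ.
writer.partitioner.class=org.apache.gobblin.writer.partitioner.TimeBasedAvroWriterPartitioner
writer.partition.granularity=HOUR
writer.partition.pattern=yyyy/MM/dd/HH

# eventgate-*-external jobs: partition on receive time only
writer.partition.columns=meta.dt

# eventgate-* internal jobs (a separate job config): the first column with a
# value wins, giving the dt -> meta.dt fallback
#writer.partition.columns=dt,meta.dt
```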
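Finally, for the EventBus and direct-producer items: producers should submit events with `dt` set and `meta.dt` absent, e.g. (hypothetical payload):

```
{
  "$schema": "/example/event/1.0.0",
  "dt": "2020-11-16T09:00:00Z",
  "meta": {
    "stream": "example.event"
  }
}
```

EventGate then fills in `meta.dt` at receive time.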
(In the future, any clients that produce directly to Kafka (not via EventGate) should use a maintained Event Platform producer library (like [[ https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/ | wikimedia-event-utilities ]]).

Now that we use Gobblin for Hadoop ingestion, I'd expect that once the eventgates all set the correct Kafka timestamp, we can use the Kafka message timestamp as the HDFS hourly partitioning field, rather than varying the timestamp field configuration between the different ingestion jobs.)