
Adopt conventions for server receive and client/event timestamps in non-analytics event schemas
Open, Needs Triage, Public

Description

In T240460#6614767 we decided the following:

  • dt is always the client-side timestamp, also known as the event timestamp.
  • meta.dt is always the server-side receive timestamp.
  • Which of the two fields is used for the Kafka timestamp and for Hive partitioning is configurable.

To accomplish this:

  • All schemas should be updated to have both a meta.dt and a dt field, with neither field required.
  • eventgate-*-external instances should use meta.dt as the Kafka timestamp
  • eventgate-*-external Camus jobs should be configured to use meta.dt for hourly HDFS partitioning
  • eventgate-* internal instances should use dt as the Kafka timestamp, falling back to meta.dt
  • eventgate-* internal Camus jobs should be configured to use dt for hourly HDFS partitioning, falling back to meta.dt
  • EventBus should be modified to set dt, but not meta.dt (allowing EventGate to fill it in)
  • Other event producers (change-prop?) that don't use EventGate should set dt but not meta.dt
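
The selection rules above can be sketched roughly as follows. This is only an illustration in Python, not EventGate's actual code: the function names (parse_dt, fill_meta_dt, kafka_timestamp) and the external flag are hypothetical, and it assumes timestamps are ISO-8601 UTC strings and that the receiving service fills in meta.dt only when the producer left it unset.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_dt(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-8601 timestamp such as '2020-11-16T16:40:00Z'."""
    if not value:
        return None
    # datetime.fromisoformat() before Python 3.11 rejects a trailing 'Z'.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def fill_meta_dt(event: dict, received_at: datetime) -> dict:
    """Return a copy of the event with meta.dt set to the receive time
    if the producer did not set it (assumed behaviour, see lead-in)."""
    meta = dict(event.get("meta", {}))
    if not meta.get("dt"):
        utc = received_at.astimezone(timezone.utc)
        meta["dt"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {**event, "meta": meta}


def kafka_timestamp(event: dict, external: bool) -> Optional[datetime]:
    """Pick the timestamp to use for the Kafka message.

    external=True  -> meta.dt only (client clocks are not trusted)
    external=False -> dt, falling back to meta.dt
    """
    meta_dt = parse_dt(event.get("meta", {}).get("dt"))
    if external:
        return meta_dt
    event_dt = parse_dt(event.get("dt"))
    return event_dt if event_dt is not None else meta_dt
```

The same choice would apply to the field used for hourly HDFS partitioning. A producer such as EventBus would send only dt and rely on the receiving side to call something like fill_meta_dt.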

(In the future when we get rid of Camus, I'd expect to be able to use the Kafka message timestamp for hourly partitioning, and not worry about having specific timestamp field configuration for HDFS ingestion hourly partitioning.)
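
As a rough sketch of what partitioning on the Kafka message timestamp could look like: Kafka record timestamps are milliseconds since the Unix epoch, so the hour bucket can be derived without looking inside the event at all. The function name and the year=/month=/day=/hour= layout below are assumptions for illustration, not a description of any existing ingestion job.

```python
from datetime import datetime, timezone


def hourly_partition(kafka_timestamp_ms: int) -> str:
    """Map a Kafka record timestamp (ms since epoch) to an hourly
    partition path. The path layout is a hypothetical example."""
    ts = datetime.fromtimestamp(kafka_timestamp_ms / 1000, tz=timezone.utc)
    return (
        f"year={ts.year}/month={ts.month:02d}/"
        f"day={ts.day:02d}/hour={ts.hour:02d}"
    )
```

With this approach the per-stream timestamp field configuration lives only in the producer side (which field becomes the Kafka timestamp), and ingestion needs no knowledge of dt or meta.dt.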

Event Timeline

fdans triaged this task as Medium priority. Nov 16 2020, 4:40 PM
fdans moved this task from Incoming to Event Platform on the Analytics board.
JArguello-WMF raised the priority of this task from Medium to Needs Triage. Jan 11 2023, 3:18 PM