
Develop strategy for mitigating degenerate client timestamps in event data
Open, Needs Triage, Public

Description

The client_dt field contains a timestamp generated by the software client. Most clients have no authoritative source of time, so it is not uncommon for them to report times far in the past or future. There are standard strategies for mitigating this problem; we need to adopt one and decide where to apply the intervention.

Event Timeline

To mitigate significant variance in client-generated timestamps, could we establish a beacon that produces server-generated timestamps in coordination with the device-initiated burst requests? It might be much safer to use the server as the source of truth.

Events coming through EventGate that do not set meta.dt (the EventLogging extension does not) will have meta.dt set to the time at which EventGate receives the event. client_dt is set by clients. So the data has both the server receive time and the client's event time.

Hourly partitioning in Hive uses meta.dt.
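
Since each event carries both timestamps, one possible way to spot degenerate client times is to compare them. The following is only a minimal sketch: the helper names and the 24-hour threshold are hypothetical, and it assumes both meta.dt and client_dt are ISO 8601 strings.

```javascript
// Hypothetical helpers; not part of EventGate or the EventLogging extension.
// Assumes event.meta.dt and event.client_dt are ISO 8601 strings.

// Difference between the client's claimed time and the server receive time.
// Positive: the client claims a time after the server received the event.
function clientSkewMs(event) {
  const serverMs = Date.parse(event.meta.dt);
  const clientMs = Date.parse(event.client_dt);
  return clientMs - serverMs;
}

// Illustrative threshold only; a real cutoff would need to be chosen
// with offline queuing and retry behavior in mind.
const MAX_SKEW_MS = 24 * 60 * 60 * 1000;

// An event is treated as degenerate if either timestamp is unparseable
// or the two differ by more than the allowed skew.
function isDegenerate(event, maxSkewMs = MAX_SKEW_MS) {
  const skew = clientSkewMs(event);
  return Number.isNaN(skew) || Math.abs(skew) > maxSkewMs;
}

const sample = {
  meta: { dt: "2020-06-01T12:00:00.000Z" },
  client_dt: "2001-01-01T00:00:00.000Z",
};
console.log(isDegenerate(sample));
```

A check like this could run at ingestion or at query time; where to place it is the open question of this task.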

@Ottomata: for clarification, what @dcipoletti is talking about is changing where client_dt gets that client-side timestamp from. It would still be the client-side time of when the event was generated, but instead of trusting the client to have an approximately accurate and up-to-date time we could potentially:

  1. query something like https://wikimedia.org/api/rest_v1/time, possibly POSTing the client-side time with the request so the response can be used for latency adjustment, and store the {client-side, server-side} pair for converting client-side time (which may be in the past or future) to server-side time (which we trust)
  2. instead of setting client_dt to new Date().toISOString(), set it to mapClientSideToServerSide(new Date()).toISOString()

So client_dt is still the client-side timestamp but more trustworthy, meta.dt is still set by EventGate on receipt.
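
The two steps above could be sketched roughly as follows. This is an illustration under assumptions, not a proposed implementation: estimateOffsetMs and makeMapClientSideToServerSide are hypothetical names, the time endpoint's response shape is not specified in this task, and the sketch assumes the server reads its clock about halfway through the round trip. The network call is replaced with injected values so the arithmetic is easy to follow.

```javascript
// Hypothetical sketch; the time endpoint and its response format are assumptions.

// Estimate (server clock - client clock) from one request/response exchange.
//   t0: client clock in ms when the request was sent
//   t1: client clock in ms when the response arrived
//   serverMs: server clock in ms as reported in the response
function estimateOffsetMs(t0, t1, serverMs) {
  // Assume the server read its clock halfway through the round trip.
  const clientMidpoint = t0 + (t1 - t0) / 2;
  return serverMs - clientMidpoint;
}

// Build the mapping function from step 2 using the stored offset.
function makeMapClientSideToServerSide(offsetMs) {
  return function mapClientSideToServerSide(clientDate) {
    return new Date(clientDate.getTime() + offsetMs);
  };
}

// Example with injected values: a client whose clock is stuck in 2001.
const t0 = Date.parse("2001-01-01T00:00:00.000Z");
const t1 = t0 + 200; // 200 ms round trip
const serverMs = Date.parse("2020-06-01T12:00:00.100Z");

const offset = estimateOffsetMs(t0, t1, serverMs);
const mapClientSideToServerSide = makeMapClientSideToServerSide(offset);

// An event generated 5 seconds after the response arrived.
const clientDt = mapClientSideToServerSide(new Date(t1 + 5000)).toISOString();
console.log(clientDt); // 2020-06-01T12:00:05.200Z
```

One caveat with this approach: the stored offset becomes stale if the device clock is changed after the exchange, so a client might prefer to measure elapsed time with a monotonic source such as performance.now() and add it to the server time from the response.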