We currently aggregate our Navigation Timing data from real-time EventLogging beacons in two ways:
- Via webperf/navtiming: an EventLogging subscriber on our server that writes to Statsd/Graphite, where the data is aggregated by Statsd and then again by Whisper.
- Via Coal: an EventLogging subscriber running on the Graphite server itself, which writes directly to disk through a custom backend. Its data is aggregated by neither Statsd nor Whisper.
Whereas webperf/navtiming also generates per-minute percentiles (via Statsd), the Coal logger only produces medians.
Statsd/Graphite has lots of features and is pretty scalable, but this comes at the cost of lossy aggregation.
Coal, on the other hand, applies essentially no aggregation other than its own 5-minute moving median.
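To illustrate the difference, here is a minimal Python sketch. It is not Coal's or navtiming's actual code: the `MovingMedian` class, the sample values, and the assumption that beacons arrive in timestamp order are all hypothetical. The first part shows a 5-minute moving median over raw samples; the second shows why re-aggregating per-minute medians is lossy.

```python
from collections import deque
from statistics import median


class MovingMedian:
    """Median of all samples seen within the last `window_seconds` seconds.

    Hypothetical helper for illustration; assumes samples arrive in
    timestamp order.
    """

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.samples.append((timestamp, value))
        # Drop samples that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def value(self):
        if not self.samples:
            return None
        return median(v for _, v in self.samples)


# Lossy aggregation: two one-minute buckets of made-up load times (ms).
minute_1 = [100, 110, 120]
minute_2 = [400, 900, 950, 1000, 1100]

# Aggregating the per-minute medians gives a different answer than
# computing the median over the raw samples.
median_of_medians = median([median(minute_1), median(minute_2)])
true_median = median(minute_1 + minute_2)
```

With these made-up numbers the median of the per-minute medians is 530 ms, while the median of the raw samples is 650 ms. Once only the per-minute values are stored, the second figure can no longer be recovered.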
Coal was created by @ori in 2015 specifically for Navigation Timing. Since then, SRE has deployed Prometheus (https://prometheus.io/), which (unlike Graphite) supports both storing time series data without aggregation and computing reliable percentiles.
Benefits:
- Simplify our software stack (by not having coal and coal-web hosted on the Graphite machine, per T158837).
- Open up exciting features in Grafana that are only available to non-aggregated backends, such as Histogram, Heatmap and more.