Currently, we extract latency data from the Apache logs using mtail. We want to keep doing so on k8s, but this might be challenging given how we will have to manage those logs there (see the parent task).
What we have now is a latency histogram broken down by:
- cluster
- status code
- request handler (might be superfluous)
- request method
- endpoint
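
To make the shape of the data concrete, here is a minimal Python sketch of a histogram keyed by those five dimensions. It is only an illustration and not how mtail implements it: the bucket boundaries, the class name and the label values in the usage note are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds in seconds; the buckets used in
# production may well be different.
BUCKETS = (0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0)


class LatencyHistogram:
    """Latency histogram with one set of bucket counters per label combination."""

    def __init__(self, buckets=BUCKETS):
        self.buckets = tuple(sorted(buckets))
        # One counter list per label tuple; the extra last slot is the +Inf bucket.
        self.counts = defaultdict(lambda: [0] * (len(self.buckets) + 1))
        self.sums = defaultdict(float)

    def observe(self, seconds, *, cluster, status, handler, method, endpoint):
        key = (cluster, status, handler, method, endpoint)
        # bisect_left returns the first bucket whose upper bound is >= seconds,
        # or len(self.buckets) when the value exceeds every bound.
        self.counts[key][bisect_left(self.buckets, seconds)] += 1
        self.sums[key] += seconds

    def series(self):
        """Number of distinct label combinations seen so far."""
        return len(self.counts)
```

Calling `observe(0.07, cluster="api", status=200, handler="proxy", method="GET", endpoint="/w/api.php")` (all values made up) increments the second bucket for that label combination. Every additional dimension multiplies the number of series, which is why dropping the request handler may be worth considering.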
I don't have a solution to this problem yet, but I do have some ideas:
From logfiles on centrallog
- Save the logs on centrallog with a short retention
- Run mtail on those logs
Modify mtail to be able to consume logs from Kafka
- With this approach, mtail would consume a Kafka topic directly
- It probably requires more work than we actually want to commit to
Use Envoy to extract the same data
- The Envoy telemetry currently lacks some dimensions, such as a breakdown by HTTP verb.
- We'd have to force an Envoy configuration just to separate the telemetry, but the effort might be worth it.