Epic parent task for tracking a v0 minimal deployment of distributed tracing.
Summary of current thinking & discussions:
- We'll be deploying Jaeger on our k8s clusters, as it is essentially the de facto mature FLOSS option
- We'll use OpenSearch as Jaeger's backing store
- However, we'll use the OpenTelemetry (OTel) Collector, rather than jaeger-agent, as the local daemon that collects trace data and exports it to Jaeger, since it is vendor-neutral and gives us more flexibility in the future
- To begin with, the only 'application' actually exporting trace data will likely be Envoy. For MediaWiki, Envoy serves both as the TLS terminator for incoming requests and as the proxy for outgoing requests to services, so it is a single point that captures a lot of data with just a few configuration stanzas.
- We'll prioritize modifying MediaWiki and service-template-node to propagate the various tracing metadata headers from incoming requests onto outgoing requests, so that Envoy can see them and link the spans it records on each hop into a single trace
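The header-propagation step can be illustrated with a small, language-agnostic sketch (MediaWiki is PHP and service-template-node is Node.js, so this Python is only illustrative). It assumes the headers to forward are W3C Trace Context (`traceparent`, `tracestate`), Envoy's `x-request-id`, and the B3 family. Which of these we actually need depends on the propagation format chosen in the Envoy tracing config, and the `TRACE_HEADERS` name and `propagate_trace_headers` helper are hypothetical.

```python
from typing import Mapping

# Hypothetical list of trace-context headers to forward. The real list depends
# on which propagation format(s) the Envoy tracing configuration uses.
TRACE_HEADERS = (
    "traceparent",        # W3C Trace Context
    "tracestate",         # W3C Trace Context (vendor-specific state)
    "x-request-id",       # Envoy request ID
    "x-b3-traceid",       # B3 multi-header format
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "b3",                 # B3 single-header format
)


def propagate_trace_headers(incoming: Mapping[str, str]) -> dict[str, str]:
    """Return the trace-context headers from an incoming request that should be
    copied onto every outgoing request made while handling it.

    HTTP header names are case-insensitive, so names are lowercased before
    matching; the returned dict uses the lowercase names.
    """
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


if __name__ == "__main__":
    incoming = {
        "Traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
        "Accept": "text/html",
    }
    # Only the trace-context header is forwarded; unrelated headers are dropped.
    print(propagate_trace_headers(incoming))
```

The application only copies these values through unchanged; creating child spans and rewriting the parent span ID stays Envoy's job on the outgoing hop, which is what keeps the application-side change small.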