As per the title, we should make sure that metrics from otel-coll and jaeger are collected by Prometheus, and that the respective Grafana dashboards exist.
jaeger
- dashboard https://grafana.wikimedia.org/d/zLOi95xmk/jaeger adapted from https://grafana.com/grafana/dashboards/10001-jaeger/
- alerts https://gerrit.wikimedia.org/r/c/operations/alerts/+/959950
- Found an issue with alerts, reported upstream at https://github.com/jaegertracing/jaeger/issues/4771 (fixed)
otel-coll
- metrics https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/960056
- Some metrics are collected, but not all of those expected by the upstream dashboard below
- dashboard https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector
- alerts https://gerrit.wikimedia.org/r/c/operations/alerts/+/967143
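
To follow up on the otel-coll note about some dashboard metrics not being collected, here is a rough sketch of how the gap could be listed. It parses Prometheus text-format output and reports which expected metric names are absent. The `EXPECTED` set and the `SAMPLE` text are hypothetical placeholders and are not taken from the actual dashboard or from a real scrape. The real list would need to come from the dashboard's queries.

```python
import re

# Hypothetical list of metric names a dashboard might query (assumption,
# not extracted from the actual dashboard JSON).
EXPECTED = {
    "otelcol_receiver_accepted_spans",
    "otelcol_exporter_sent_spans",
    "otelcol_process_uptime",
}

# A metric name is the leading identifier on a sample line.
_NAME_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def exposed_metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = _NAME_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_metrics(expected, exposition_text):
    """Return the expected metric names that are absent, sorted."""
    return sorted(set(expected) - exposed_metric_names(exposition_text))


# Made-up sample of a /metrics response, for illustration only.
SAMPLE = """\
# HELP otelcol_process_uptime Uptime of the process
# TYPE otelcol_process_uptime counter
otelcol_process_uptime{service_instance_id="abc"} 1234.5
# TYPE otelcol_receiver_accepted_spans counter
otelcol_receiver_accepted_spans{receiver="otlp",transport="grpc"} 42
"""

if __name__ == "__main__":
    print(missing_metrics(EXPECTED, SAMPLE))
```

Matching here is by exact name, so a metric exposed with a different suffix (for example `_total`) would be reported as missing and would need to be checked by hand.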