Make sure we have observability for otel-coll and jaeger
Closed, Resolved, Public

Description

As per the title, we should make sure that metrics from otel-coll and jaeger are collected by Prometheus, and that the respective Grafana dashboards are in place.

jaeger
otel-coll
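
One way to sanity-check that a service's metrics are reaching Prometheus is to inspect a scrape in the text exposition format and look for the metric names you expect. The following is a minimal sketch of that idea, not what the patches in this task do. The metric names and the sample scrape are hypothetical placeholders, not the actual otel-coll or jaeger metrics.

```python
import re

# Matches the metric name at the start of a sample line in the
# Prometheus text exposition format.
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition: str) -> set[str]:
    """Return the set of metric names found in a text-format scrape."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_metrics(exposition: str, expected: set[str]) -> set[str]:
    """Return the expected metric names that are absent from the scrape."""
    return expected - metric_names(exposition)


# Hypothetical scrape output; these metric names are placeholders.
SAMPLE = """\
# HELP example_collector_uptime_seconds Uptime of the process
# TYPE example_collector_uptime_seconds counter
example_collector_uptime_seconds{service="otel-coll"} 1234.5
example_spans_received_total{service="jaeger"} 42
"""

# Hypothetical list of metrics we would want to see collected.
EXPECTED = {
    "example_collector_uptime_seconds",
    "example_spans_received_total",
    "example_spans_dropped_total",
}

print(sorted(missing_metrics(SAMPLE, EXPECTED)))
```

With the placeholder data above, only `example_spans_dropped_total` is reported as missing. In practice the scrape text would come from the service's metrics endpoint and the expected set from whatever the dashboards and alerts rely on.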

Event Timeline

Change 959950 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/alerts@master] sre: add jaeger query/collector alerts

https://gerrit.wikimedia.org/r/959950

Change 960056 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/deployment-charts@master] otel-coll: enable prometheus scraping

https://gerrit.wikimedia.org/r/960056

Change 959950 merged by Filippo Giunchedi:

[operations/alerts@master] sre: add jaeger query/collector alerts

https://gerrit.wikimedia.org/r/959950

Change 960056 merged by Filippo Giunchedi:

[operations/deployment-charts@master] otel-coll: enable prometheus scraping

https://gerrit.wikimedia.org/r/960056

Change 966514 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/deployment-charts@master] otel-coll: bump resource limits

https://gerrit.wikimedia.org/r/966514

Change 966514 merged by Filippo Giunchedi:

[operations/deployment-charts@master] otel-coll: bump resource limits

https://gerrit.wikimedia.org/r/966514

After bumping the memory limits, the collector is able to stay running, and the metrics and dashboards are working as expected!

Change 967143 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/alerts@master] sre: first iteration for otel-coll alerts

https://gerrit.wikimedia.org/r/967143

Change 967143 merged by Filippo Giunchedi:

[operations/alerts@master] sre: first iteration for otel-coll alerts

https://gerrit.wikimedia.org/r/967143

fgiunchedi claimed this task.
fgiunchedi updated the task description.

With the otel collector alerts in place, I'm calling this done!

Change 969703 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/alerts@master] sre: ignore pint promql/series checks for otel-coll

https://gerrit.wikimedia.org/r/969703

Change 969703 merged by Filippo Giunchedi:

[operations/alerts@master] sre: ignore pint promql/series checks for otel-coll

https://gerrit.wikimedia.org/r/969703