This should include:
- Retrieving the metrics on the Prometheus side (adding anything that is missing)
- Adding alerts for "down" events, each with a runbook
- Adding a basic Grafana board with the "up/down" metric, to link as the 'dashboard' on the alerts
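
As a rough illustration of the "down" alert item above, a Prometheus alerting rule could look something like the sketch below. Everything specific in it is an assumption rather than something taken from the actual patches: the group and alert names, the `job="jobs-api"` label value, the `for` duration, the severity, and the placeholder runbook and dashboard URLs are all hypothetical.

```yaml
# Sketch only: all names, labels, and URLs here are hypothetical placeholders.
groups:
  - name: jobs_api_availability        # hypothetical group name
    rules:
      - alert: JobsApiDown             # hypothetical alert name
        # `up` is set by Prometheus itself for every scrape target:
        # 1 if the last scrape succeeded, 0 if it failed.
        expr: up{job="jobs-api"} == 0  # job label value is an assumption
        for: 5m                        # assumed grace period before firing
        labels:
          severity: warning            # assumed severity
        annotations:
          summary: "jobs-api target {{ $labels.instance }} is down"
          runbook: "https://example.org/runbooks/jobs-api-down"    # placeholder
          dashboard: "https://example.org/grafana/jobs-api"        # placeholder
```

The `runbook` and `dashboard` annotations are where the runbook and the Grafana board from the list above would be linked, assuming the alerting setup reads annotations with those names.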
Change 840225 had a related patch set uploaded (by Majavah; author: Majavah):
[cloud/toolforge/jobs-framework-api@main] Configure prometheus flask exporter
Change 840225 merged by jenkins-bot:
[cloud/toolforge/jobs-framework-api@main] Configure prometheus flask exporter
Mentioned in SAL (#wikimedia-cloud-feed) [2022-10-10T08:41:57Z] <wm-bot2> deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api (afa90ed) (T320284) - cookbook ran by taavi@runko
Mentioned in SAL (#wikimedia-cloud-feed) [2022-10-10T08:44:51Z] <wm-bot2> deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api (afa90ed) (T320284) - cookbook ran by taavi@runko
Change 841033 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] P:toolforge::prometheus: scrape jobs-api
Change 841033 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] P:toolforge::prometheus: scrape jobs-api