[jobs-api,jobs-emailer] Prometheus monitoring toolforge-jobs server side components
Open, High, Public

Description

This should include:

  • Retrieve the metrics on the Prometheus side (if anything is still missing)
  • Add alerts for "down" events, each with a runbook
  • Add a basic Grafana board showing the "up/down" metric, to attach as the 'dashboard' on the alerts
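
As a rough illustration of what a "down" check could look at, here is a minimal Python sketch that picks the down targets out of a Prometheus instant-query result for the built-in `up` metric. It only assumes the standard JSON shape returned by the Prometheus HTTP API; the `job` label values, the instance names and the `down_targets` helper are hypothetical and not taken from the actual Toolforge configuration.

```python
import json

# Hypothetical sample of what an instant query for `up` might return.
# The job and instance label values are assumptions for illustration only.
SAMPLE_RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "up", "job": "jobs-api",
                  "instance": "api.example:8000"},
       "value": [1710167520.0, "1"]},
      {"metric": {"__name__": "up", "job": "jobs-emailer",
                  "instance": "emailer.example:9200"},
       "value": [1710167520.0, "0"]},
      {"metric": {"__name__": "up", "job": "unrelated",
                  "instance": "other.example:9100"},
       "value": [1710167520.0, "0"]}
    ]
  }
}
""")


def down_targets(response, jobs):
    """Return sorted (job, instance) pairs whose `up` sample is 0.

    `response` is a parsed instant-query result; `jobs` is the set of
    job label values to consider (hypothetical names here).
    """
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    down = []
    for sample in response["data"]["result"]:
        labels = sample["metric"]
        # Each instant-vector sample is [timestamp, "value-as-string"].
        _timestamp, value = sample["value"]
        if labels.get("job") in jobs and float(value) == 0.0:
            down.append((labels["job"], labels.get("instance", "")))
    return sorted(down)


if __name__ == "__main__":
    for job, instance in down_targets(SAMPLE_RESPONSE, {"jobs-api", "jobs-emailer"}):
        print(f"DOWN: {job} on {instance}")
```

In a real setup the same condition would more likely live in an alerting rule than in a script; the sketch is only meant to show which samples a "down" alert would fire on.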

Event Timeline

Change 840225 had a related patch set uploaded (by Majavah; author: Majavah):

[cloud/toolforge/jobs-framework-api@main] Configure prometheus flask exporter

https://gerrit.wikimedia.org/r/840225

Change 840225 merged by jenkins-bot:

[cloud/toolforge/jobs-framework-api@main] Configure prometheus flask exporter

https://gerrit.wikimedia.org/r/840225

Change 841033 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:toolforge::prometheus: scrape jobs-api

https://gerrit.wikimedia.org/r/841033

Change 841033 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] P:toolforge::prometheus: scrape jobs-api

https://gerrit.wikimedia.org/r/841033

dcaro renamed this task from Prometheus monitoring toolforge-jobs server side components to [jobs-api,jobs-emailer] Prometheus monitoring toolforge-jobs server side components. Mar 11 2024, 2:32 PM
dcaro triaged this task as High priority.
dcaro edited projects, added Toolforge; removed Toolforge Jobs framework.
dcaro updated the task description.
dcaro moved this task from Backlog to Ready to be worked on on the Toolforge board.