[jobs-api,jobs-emailer] Prometheus monitoring toolforge-jobs server side components
Open, In Progress, High, Public

Description

This should include:

  • Retrieve metrics on the Prometheus side (if anything is missing)
  • Add alerts for "down" events, with runbooks
  • Add a basic Grafana board with the "up/down" metric, to link as the 'dashboard' in the alerts
  • jobs-api (done)
  • jobs-emailer
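
As a rough illustration of the "down" check the alerts above would encode, here is a minimal Python sketch. It assumes each component is scraped under a Prometheus job label of the same name (`jobs-api`, `jobs-emailer`); those label values, the sample data, and the helper names `parse_up_samples` and `down_jobs` are hypothetical and not taken from the actual Toolforge configuration. A real deployment would express this as a Prometheus alerting rule rather than as Python.

```python
import re
from typing import Dict, List

# Matches sample lines such as: up{job="jobs-api"} 1
UP_SAMPLE = re.compile(r'^up\{[^}]*job="([^"]+)"[^}]*\}\s+(\S+)')

# Hypothetical samples written in Prometheus text exposition style.
SAMPLE = """\
# TYPE up gauge
up{job="jobs-api"} 1
up{job="jobs-emailer"} 0
"""


def parse_up_samples(text: str) -> Dict[str, float]:
    """Return a mapping of job label to the value of its `up` sample."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        match = UP_SAMPLE.match(line.strip())
        if match:
            samples[match.group(1)] = float(match.group(2))
    return samples


def down_jobs(samples: Dict[str, float], expected: List[str]) -> List[str]:
    """List expected jobs whose `up` sample is missing or not equal to 1."""
    return [job for job in expected if samples.get(job, 0.0) != 1.0]


if __name__ == "__main__":
    components = ["jobs-api", "jobs-emailer"]  # assumed job label values
    print(down_jobs(parse_up_samples(SAMPLE), components))
```

Treating a missing sample as "down" is a deliberate choice in this sketch: a component that is not being scraped at all should alert just like one whose scrape fails.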

Event Timeline

Change 840225 had a related patch set uploaded (by Majavah; author: Majavah):

[cloud/toolforge/jobs-framework-api@main] Configure prometheus flask exporter

https://gerrit.wikimedia.org/r/840225

Change 840225 merged by jenkins-bot:

[cloud/toolforge/jobs-framework-api@main] Configure prometheus flask exporter

https://gerrit.wikimedia.org/r/840225

Change 841033 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:toolforge::prometheus: scrape jobs-api

https://gerrit.wikimedia.org/r/841033

Change 841033 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] P:toolforge::prometheus: scrape jobs-api

https://gerrit.wikimedia.org/r/841033

dcaro renamed this task from "Prometheus monitoring toolforge-jobs server side components" to "[jobs-api,jobs-emailer] Prometheus monitoring toolforge-jobs server side components". Mar 11 2024, 2:32 PM
dcaro triaged this task as High priority.
dcaro edited projects, added Toolforge; removed Toolforge Jobs framework.
dcaro updated the task description.
dcaro moved this task from Backlog to Ready to be worked on on the Toolforge board.
dcaro changed the task status from Open to In Progress. Thu, Nov 14, 1:56 PM
dcaro moved this task from Next Up to In Progress on the Toolforge (Toolforge iteration 16) board.