
[jobs-api,jobs-emailer] Prometheus monitoring toolforge-jobs server side components
Open, In Progress, High, Public


This should include:

  • Retrieve metrics on the Prometheus side (adding anything that's missing)
  • Add alerts for "down" events, with runbooks
  • Add a basic Grafana board with the "up/down" metric, to link as the 'dashboard' in the alerts
  • jobs-api
    • gather stats
    • add alert
  • jobs-emailer
    • gather stats
    • add alerts
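The "up/down" signal these alerts rely on is Prometheus's built-in `up` metric, which is 1 when the last scrape of a target succeeded and 0 when it failed. A down alert typically evaluates an expression such as `up{job="jobs-api"} == 0` for some duration. The sketch below is only an illustration of that check using the Prometheus HTTP API (`/api/v1/query`); the job label value `jobs-api` and the base URL are assumptions, not the actual Toolforge configuration.

```python
from urllib.parse import urlencode

# Hypothetical scrape job label; the real job name in Toolforge's Prometheus may differ.
JOB = "jobs-api"


def up_query_url(prometheus_base: str, job: str = JOB) -> str:
    """Build an instant-query URL for the `up` metric of one scrape job."""
    query = f'up{{job="{job}"}}'
    return f"{prometheus_base}/api/v1/query?{urlencode({'query': query})}"


def down_instances(response: dict) -> list[str]:
    """Return the instances whose `up` sample is 0 (their last scrape failed)."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    down = []
    # An instant query returns a vector: one [timestamp, "value"] pair per series.
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]
        if float(value) == 0:
            down.append(series["metric"].get("instance", "<unknown>"))
    return down
```

In practice the alerting rule itself lives in Prometheus/Alertmanager configuration; this function only mirrors the condition such a rule would evaluate.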


Title | Reference | Author | Source Branch | Dest Branch
jobs-api: add alerts for it being down | repos/cloud/toolforge/alerts!20 | dcaro | add_jobs_api | main
webserver: add a minimal metrics endpoint | repos/cloud/toolforge/jobs-emailer!7 | dcaro | add_prometheus_stats | main
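For a component like jobs-emailer, a "minimal metrics endpoint" can be as small as an HTTP handler that serves `/metrics` in the Prometheus text exposition format. The following is a standard-library-only sketch under that assumption; it is not the code from the merge request, and the counter name `jobs_emailer_emails_sent_total` and port 8000 are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical counters; the metrics actually exposed by jobs-emailer may differ.
COUNTERS = {"jobs_emailer_emails_sent_total": 0}


def render_metrics(counters: dict[str, float]) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving /metrics on the given (hypothetical) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

A real service would more likely use the `prometheus_client` library, which handles metric types, labels and escaping; the hand-rolled version only shows how little an endpoint needs for Prometheus to scrape it and produce an `up` series.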

Event Timeline

Change 840225 had a related patch set uploaded (by Majavah; author: Majavah):

[cloud/toolforge/jobs-framework-api@main] Configure prometheus flask exporter

Change 840225 merged by jenkins-bot:

[cloud/toolforge/jobs-framework-api@main] Configure prometheus flask exporter

Change 841033 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:toolforge::prometheus: scrape jobs-api

Change 841033 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] P:toolforge::prometheus: scrape jobs-api
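Scraping jobs-api means Prometheus periodically issues an HTTP GET against the target's metrics path (by default `/metrics`, which the Flask exporter serves) and parses the text exposition format. The sketch below illustrates that step with the standard library; it is a simplified illustration, not how Prometheus is implemented, and any URL passed to `scrape` is hypothetical.

```python
import urllib.request


def parse_exposition(text: str) -> dict[str, float]:
    """Parse simple Prometheus text-format samples into {series: value}.

    Simplified: skips comment/metadata lines, keeps the label set as part of
    the key, ignores optional timestamps, and does not handle label values
    that contain spaces.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.split()[:2]
        samples[series] = float(value)
    return samples


def scrape(url: str, timeout: float = 5.0) -> dict[str, float]:
    """Fetch a metrics endpoint and parse it; a failure here is what `up == 0` reflects."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_exposition(resp.read().decode("utf-8"))
```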

dcaro renamed this task from Prometheus monitoring toolforge-jobs server side components to [jobs-api,jobs-emailer] Prometheus monitoring toolforge-jobs server side components. (Mar 11 2024, 2:32 PM)
dcaro triaged this task as High priority.
dcaro edited projects, added Toolforge; removed Toolforge Jobs framework.
dcaro updated the task description.
dcaro moved this task from Backlog to Ready to be worked on on the Toolforge board.
dcaro changed the task status from Open to In Progress. (Nov 14 2024, 1:56 PM)
dcaro moved this task from Next Up to In Progress on the Toolforge (Toolforge iteration 16) board.