tbs: user-story 9: Create an alert on metricsinfra for harbor being down on tools
As of writing this task, this can only be done directly in the DB. More info here:

Essentially, we have a Cloud VPS project, metricsinfra, that runs a Prometheus (Alertmanager) setup. Specifically, there are a couple of hosts:

These hosts generate the alerts for Prometheus from a DB that is hosted in Trove.

You have to log in to that DB (you can find the credentials and host in the config on the controller hosts, /etc/prometheus-manager/config.yaml).

There you will find the prometheusconfig database. Its alerts table is the one you have to update with the alerts that you want to add. An example row:

*************************** 1. row ***************************
         id: 1
 project_id: 12
       name: GridQueueProblem
       expr: sge_queueproblems{project="tools",state=~".*(e|E).*"}
   duration: 30m
   severity: warn
annotations: {"summary": "Grid queue {{ $labels.queue }}@{{ $ }} is in state {{ $labels.state }}", "runbook": ""}

The expr column holds the Prometheus expression that you want to monitor. You can explore, check and test expressions here:

As for the alert itself, it should also have an annotation called 'service' with the value 'toolforge,build_service'.
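
For illustration, here is a minimal Python sketch of what the new row could look like, following the columns of the example row and including the 'service' annotation. Several values are assumptions rather than anything confirmed in this task: the alert name HarborDown, the expression up{job="harbor"} == 0, the 5m duration and the warn severity are hypothetical placeholders, project_id 12 is simply copied from the example row, and id is left out on the assumption that the database assigns it. The helper build_alert_insert is also hypothetical. It only builds the statement and its parameters (using the %s placeholder style of common Python MySQL drivers) and does not connect to anything.

```python
import json

# Columns taken from the example row; `id` is omitted on the assumption
# that the database assigns it automatically.
COLUMNS = ("project_id", "name", "expr", "duration", "severity", "annotations")


def build_alert_insert(row):
    """Return (sql, params) for inserting one row into the alerts table."""
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    sql = f"INSERT INTO alerts ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    params = tuple(row[column] for column in COLUMNS)
    return sql, params


harbor_down = {
    "project_id": 12,                  # assumption: copied from the example row
    "name": "HarborDown",              # hypothetical alert name
    "expr": 'up{job="harbor"} == 0',   # hypothetical expression, verify before use
    "duration": "5m",                  # hypothetical
    "severity": "warn",                # hypothetical
    # The annotations column stores a JSON object, as in the example row.
    "annotations": json.dumps({
        "summary": "Harbor is down on tools",
        "runbook": "",
        "service": "toolforge,build_service",
    }),
}

sql, params = build_alert_insert(harbor_down)
print(sql)
print(params)
```

The printed statement and parameters can then be run by hand in the DB session or passed to a driver's execute() call. Keeping the values as parameters avoids having to escape the quotes inside the expression and the JSON.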