Created as prompted by @taavi: the Toolforge API gateway (api.svc.tools.eqiad1.wikimedia.cloud) does not appear to be monitored for uptime.
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
In Progress | | dcaro | T334240 [cloudceph] Slow operations - tracking task
Resolved | | taavi | T348634 ceph slow ops 2023-10-11
Resolved | | dcaro | T348633 [api-gateway] add alert for uptime
Event Timeline
You have to register the URL in the pingthing/blackbox monitoring config, the same way as these existing entries but for the new URL:
https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/toolforge/k8s/haproxy.pp#52
That adds it to the monitored list. Note that it also adds an alert by default, so that should be enough.
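For illustration only, here is a rough Python sketch of the kind of check an HTTP blackbox probe performs against a health endpoint. It is not the actual pingthing/blackbox configuration or the Puppet code linked above. The function name `probe_http` and its parameters are hypothetical, and any `/healthz` path used with it is an assumption based on the endpoint name, not a confirmed URL.

```python
import urllib.error
import urllib.request


def probe_http(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True if `url` answers over HTTPS with a 2xx status within `timeout`.

    `opener` is injectable so the check can be exercised without network access;
    it must accept (url, timeout=...) and return a context manager exposing `.status`.
    """
    # Only HTTPS URLs are accepted (an assumption of this sketch: plain HTTP counts as a failure).
    if not url.startswith("https://"):
        return False
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts, TLS failures and non-2xx HTTP errors all count as "down".
        return False
```

A call such as `probe_http("https://api.svc.tools.eqiad1.wikimedia.cloud/healthz")` would then report whether the gateway looks up from the caller's point of view, assuming that path and the default HTTPS port are the right ones to check.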
This task is related to (not the same as, not overridden by, just related to) T367389: [k8s,infra,alerting] improve HAproxy and k8s apiserver interaction
dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/53
api: expose the healthz endpoint too
Change #1093339 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] toolforge:haproxy: add api gateway health check
dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/53
api: expose the healthz endpoint too
group_203_bot_4866fc124f4b41659f667468a6115cf3 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/616
api-gateway: bump to 0.0.56-20241120144516-f10abf2a
dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/616
api-gateway: bump to 0.0.56-20241120144516-f10abf2a
Change #1093339 merged by David Caro:
[operations/puppet@production] toolforge:haproxy: add api gateway health check
Change #1093384 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] toolforge:haproxy: monitor the https port, not the internal one
Change #1093384 merged by David Caro:
[operations/puppet@production] toolforge:haproxy: monitor the https port, not the internal one
Change #1093395 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] toolforge:haproxy: use the external name and force tls
Change #1093395 merged by David Caro:
[operations/puppet@production] toolforge:haproxy: use the external name and ip and force tls