
tbs: user-story 9: Create an alert on metricsinfra for harbor being down on toolsbeta
Closed, Resolved | Public | 8 Estimated Story Points

Description

This can be achieved by reusing the blackbox probe that is already used in puppet, with something like:

prometheus::blackbox::check::http { $static_domain:
    port                => 80,
    # this should always exist
    path                => '/admin/fingerprints/',
    ip_families         => ['ip4'],
    prometheus_instance => 'tools',
    team                => 'wmcs',
    severity            => 'warning',
}

Note that this should be attached to the harbor profile, and the path should be changed to one that harbor answers with a 2xx response when it is up.
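As a rough sketch of what that note describes, the same check adapted for harbor might look like the following. The `$harbor_domain` variable is hypothetical, and `/api/v2.0/ping` is an assumption about which Harbor endpoint returns 2xx when the service is healthy; it is not taken from this task. The remaining parameters are copied unchanged from the example above and would likely need adjusting for toolsbeta.

```puppet
# Sketch only: would live in the harbor profile, per the note above.
# $harbor_domain is a hypothetical variable holding the harbor hostname.
prometheus::blackbox::check::http { $harbor_domain:
    port                => 80,
    # Assumed health endpoint; use any path harbor answers with 2xx when up
    path                => '/api/v2.0/ping',
    ip_families         => ['ip4'],
    # Copied from the example above; the toolsbeta instance name may differ
    prometheus_instance => 'tools',
    team                => 'wmcs',
    severity            => 'warning',
}
```

The only intended differences from the original snippet are the resource title and the probed path.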

Related Objects

Status    Subtype  Assigned         Task
Resolved           LucasWerkmeister
Resolved           matmarex
Resolved           Legoktm
Resolved           Legoktm
Open               dcaro
Resolved           dcaro
Open               None
Open               None
Resolved           dcaro
Resolved           dcaro
Resolved           Raymond_Ndibe
Resolved           Raymond_Ndibe
Resolved           Raymond_Ndibe
Resolved           dcaro
Resolved           dcaro

Event Timeline

dcaro triaged this task as High priority. Dec 14 2022, 2:02 PM
dcaro created this task.
dcaro added a project: Toolforge Build Service.
dcaro removed the point value for this task.
dcaro raised the priority of this task from High to Needs Triage. Mar 6 2023, 3:03 PM

Change 910798 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[operations/puppet@production] profile:toolforge:harbor: setup blackbox monitoring

https://gerrit.wikimedia.org/r/910798

dcaro changed the task status from Open to In Progress. Apr 25 2023, 3:19 PM

Change 910798 abandoned by Raymond Ndibe:

[operations/puppet@production] profile:toolforge:harbor: setup blackbox monitoring

Reason:

Issue was fixed using a different method here https://gerrit.wikimedia.org/r/c/operations/puppet/+/912844

https://gerrit.wikimedia.org/r/910798

Raymond_Ndibe changed the task status from In Progress to Stalled. May 4 2023, 5:20 PM