Alert in need of triage: PybalBackendDown (instance elastic2090:0)
Closed, Resolved (Public)

Description

The alert PybalBackendDown started firing 1 month ago.

Labels
alertname=PybalBackendDown
instance=elastic2090:0
prometheus=ops
service=search-psi-https_9643
severity=warning
site=codfw
source=prometheus
team=sre
Annotations
Name: Content
dashboard: TODO
description: Pybal has been failing health checks for elastic2090:0 for a long time.
runbook: TODO
summary: Pybal backend elastic2090:0 is down (search-psi-https_9643)
Links

Triage metadata. Do not delete.
fingerprint=e52a94a8354dc4d3

Event Timeline

When I brought this host online a few weeks back, I accidentally added it to the psi pool. I've since fixed this, but my fix seems to be incomplete. I'll finish it when I can, but I wanted to share the reason.

Change #1032784 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] elasticsearch: add elastic2090 to correct pybal pool

https://gerrit.wikimedia.org/r/1032784

Change #1032784 merged by Bking:

[operations/puppet@production] elasticsearch: add elastic2090 to correct pybal pool

https://gerrit.wikimedia.org/r/1032784

The above Prometheus query no longer returns any results, so I believe this is fixed. Closing...
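
For reference, a check like the one described in the closing comment could be sketched roughly as below. This is only an illustration under assumptions: `PROMETHEUS_URL` is a placeholder, the helper names are hypothetical, and the selector over Prometheus's built-in `ALERTS` series is a stand-in, not necessarily the exact query referenced in this task. It uses the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and treats an empty result vector as "alert no longer firing".

```python
import json
import urllib.parse
import urllib.request

# Placeholder base URL (assumption); the real endpoint is not given in this task.
PROMETHEUS_URL = "http://prometheus.example.org/ops"


def build_alert_query(alertname: str, instance: str) -> str:
    """Build a PromQL selector over the built-in ALERTS series."""
    return (
        f'ALERTS{{alertname="{alertname}", '
        f'instance="{instance}", alertstate="firing"}}'
    )


def is_alert_cleared(response: dict) -> bool:
    """Return True when an instant-query response contains no series."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response}")
    return len(response["data"]["result"]) == 0


def fetch(query: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Run an instant query against the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Label values taken from the alert body in this task.
    query = build_alert_query("PybalBackendDown", "elastic2090:0")
    print("cleared" if is_alert_cleared(fetch(query)) else "still firing")
```

An empty result only shows that the alert is not firing at query time; it does not by itself confirm that the host is in the intended pool.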