SystemdUnitDownForLong cloudbackup1003:9100 Unit backup_vms.service on node cloudbackup1003 has been down for long.
Closed, Resolved · Public

Authored By

	phaultfinder
	Mar 15 2023, 7:06 AM

Description

dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003
description: Unit backup_vms.service on node cloudbackup1003 has been down for long.
runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDownForLong
summary: The systemd unit backup_vms.service on node cloudbackup1003 has been failing for more than two hours.
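The summary describes a duration condition: the unit must be failed continuously, not just at a single point in time. The sketch below shows that condition in rough form. It assumes the state comes from node_exporter's `node_systemd_unit_state` series with `state="failed"`, which is consistent with the `job: node` and `:9100` labels but is not confirmed by this ticket. The two-hour threshold is taken from the summary. The helper `down_for_long` is hypothetical and only loosely mirrors how an alerting rule's `for` duration behaves.

```python
from datetime import datetime, timedelta

# Threshold taken from the alert summary ("failing for more than two hours").
DOWN_FOR_LONG = timedelta(hours=2)


def down_for_long(samples, threshold=DOWN_FOR_LONG):
    """Hypothetical helper: return True if the unit has been continuously
    failed for longer than `threshold`, judged at the most recent sample.

    `samples` is a time-ordered list of (timestamp, value) pairs. The value is
    assumed to be 1 while the unit reports state="failed" and 0 otherwise,
    as node_systemd_unit_state{state="failed"} would.
    """
    # No data, or the unit is not failed right now: nothing to alert on.
    if not samples or samples[-1][1] != 1:
        return False
    # Walk backwards to find where the current failed streak started.
    streak_start = samples[-1][0]
    for ts, value in reversed(samples):
        if value != 1:
            break
        streak_start = ts
    return samples[-1][0] - streak_start > threshold


# Example: failed samples every 30 minutes for 2.5 hours.
t0 = datetime(2023, 3, 15, 4, 0)
samples = [(t0 + timedelta(minutes=30 * i), 1) for i in range(6)]
print(down_for_long(samples))
```

A single recovered sample resets the streak, so a unit that flaps between failed and active would not satisfy this check.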

alertname: SystemdUnitDownForLong
cluster: wmcs
instance: cloudbackup1003:9100
job: node
name: backup_vms.service
prometheus: ops
severity: task
site: eqiad
source: prometheus
state: failed
team: wmcs
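The labels above can be turned back into a query that shows the underlying series. This is useful when following the runbook. The sketch below parses the `key: value` lines and builds a PromQL selector from them. The metric name `node_systemd_unit_state` is an assumption: it is node_exporter's systemd unit metric, but the ticket does not name the metric. The sketch also assumes that labels such as `severity`, `team`, `cluster`, and `site` are alert or external labels rather than labels on the scraped series, so it keeps only `instance`, `job`, `name`, and `state`. `parse_labels` and `unit_state_selector` are hypothetical helpers.

```python
# Label block copied from this ticket.
LABEL_TEXT = """\
alertname: SystemdUnitDownForLong
cluster: wmcs
instance: cloudbackup1003:9100
job: node
name: backup_vms.service
prometheus: ops
severity: task
site: eqiad
source: prometheus
state: failed
team: wmcs
"""


def parse_labels(text):
    """Parse 'key: value' lines into a dict, skipping lines without a separator."""
    labels = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            labels[key.strip()] = value.strip()
    return labels


def unit_state_selector(labels, metric="node_systemd_unit_state"):
    """Build a PromQL selector from the labels assumed to identify the series."""
    keys = ("instance", "job", "name", "state")
    matchers = ",".join(f'{k}="{labels[k]}"' for k in keys if k in labels)
    return f"{metric}{{{matchers}}}"


print(unit_state_selector(parse_labels(LABEL_TEXT)))
```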

Mentioned Here: T333315: WMCS: hundred of phabricator tickets were created for some alerts

Restricted Application added subscribers: dcaro, Aklapper.