
SystemdUnitFailed - lists2001 - sync-list-members
Closed, Resolved, Public

Description

Common information

  • alertname: SystemdUnitFailed
  • instance: lists2001:9100
  • prometheus: ops
  • severity: critical
  • site: codfw
  • source: prometheus
  • team: collaboration-services

Firing alerts





Event Timeline

Jelto renamed this task from SystemdUnitFailed to SystemdUnitFailed (lists2001). Jul 16 2024, 7:38 AM
LSobanski renamed this task from SystemdUnitFailed (lists2001) to SystemdUnitFailed - lists2001 - sync-list-members. Jul 16 2024, 9:58 AM

This comes from https://gerrit.wikimedia.org/r/c/operations/puppet/+/1053399

It works fine on lists1004, but it needs to be handled on the inactive list server.

We need to add code to ensure it's only enabled on the active server.
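As a rough illustration of the "only on the active server" idea, here is a minimal shell sketch. It is not the change that was merged for this task. The function names and the idea of passing the active host as an argument are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical guard: run a job only on the active list server.
# Nothing here is taken from the Puppet change; all names are illustrative.

is_active_host() {
    # $1: this host's short name, $2: the host designated as active
    [ "$1" = "$2" ]
}

run_if_active() {
    # $1: this host, $2: active host, remaining args: command to run
    this_host=$1
    active_host=$2
    shift 2
    if is_active_host "$this_host" "$active_host"; then
        "$@"
    else
        # Exit status stays 0 so the unit does not end up in the failed state.
        echo "skipping: $this_host is not the active list server ($active_host)"
    fi
}
```

A call such as `run_if_active "$(hostname -s)" lists1004 <sync command>` would then be a no-op on lists2001. Not installing or enabling the unit at all on the inactive host, via configuration management, is likely the cleaner approach. The sketch only shows the condition itself.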

Dzahn triaged this task as Medium priority. Jul 16 2024, 2:50 PM

Mentioned in SAL (#wikimedia-operations) [2024-07-16T17:00:35Z] <mutante> lists2001 - systemctl reset-failed after gerrit:1054610 to fix T370098

[lists2001:~] $ sudo systemctl list-units --state=failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed.