
NovafullstackSustainedFailures cloudcontrol1005:9100 The automated tests were unable to create, provision and decommission a VM in the last 5h
Closed, Resolved
Public

Description

Common information

  • alertname: NovafullstackSustainedFailures
  • cluster: wmcs
  • instance: cloudcontrol1005:9100
  • job: node
  • prometheus: ops
  • service: openstack,cloudvps,novafullstack
  • severity: task
  • site: eqiad
  • source: prometheus
  • team: wmcs

Firing alerts


Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2022-12-16T08:46:39Z] <dcaro> restart designate-sink on both cloudservice hosts (T322279)

Change 868618 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] novafullstack: don't crash if got error cleaning up some VMs

https://gerrit.wikimedia.org/r/868618

Change 868618 merged by David Caro:

[operations/puppet@production] novafullstack: don't crash if got error cleaning up some VMs

https://gerrit.wikimedia.org/r/868618
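
The patch title suggests that the cleanup step should tolerate errors on individual VMs rather than abort the whole test run. The following is a minimal sketch of that general pattern, not the merged change itself; `cleanup_leaked_vms` and the `delete_vm` callable are hypothetical names introduced here for illustration.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger("fullstack-sketch")


def cleanup_leaked_vms(
    vm_ids: Iterable[str], delete_vm: Callable[[str], None]
) -> List[str]:
    """Try to delete every VM and return the IDs whose deletion failed.

    `delete_vm` is a hypothetical callable standing in for whatever
    the real test script uses to remove a VM.
    """
    failed: List[str] = []
    for vm_id in vm_ids:
        try:
            delete_vm(vm_id)
        except Exception:
            # Log and continue so that one bad VM does not crash the run.
            logger.exception("Failed to clean up VM %s, skipping", vm_id)
            failed.append(vm_id)
    return failed
```

Returning the failed IDs rather than raising lets the caller decide whether to retry them or report them, while the remaining VMs are still cleaned up.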

taavi claimed this task.