
cloudvirt1013 apparent power loss
Closed, Duplicate · Public

Description

A few minutes ago (Sun Dec 22 09:30:15 UTC 2019) icinga reported "Host DOWN alert for cloudvirt1013."

10 minutes later (Dec 22 09:40:39) the system booted back up and all guest VMs started back up. It looks fine now; there's nothing suspicious in dmesg, and syslog looks like a sudden loss of power:

Dec 22 09:20:01 cloudvirt1013 systemd[1]: Starting Collect apt metrics for prometheus-node-exporter...
Dec 22 09:20:02 cloudvirt1013 systemd[1]: Started Collect apt metrics for prometheus-node-exporter.
Dec 22 09:21:01 cloudvirt1013 CRON[41868]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 22 09:22:01 cloudvirt1013 CRON[42427]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 22 09:23:01 cloudvirt1013 CRON[43167]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 22 09:24:01 cloudvirt1013 CRON[43773]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 22 09:25:01 cloudvirt1013 CRON[44356]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 22 09:25:01 cloudvirt1013 CRON[44357]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 22 09:26:01 cloudvirt1013 CRON[45148]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 22 09:27:01 cloudvirt1013 CRON[45739]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 22 09:40:39 cloudvirt1013 systemd-modules-load[712]: Inserted module 'br_netfilter'
Dec 22 09:40:39 cloudvirt1013 systemd-modules-load[712]: Inserted module 'ipmi_devintf'
Dec 22 09:40:39 cloudvirt1013 lvm[713]:   1 logical volume(s) in volume group "tank" monitored
Dec 22 09:40:39 cloudvirt1013 systemd-modules-load[712]: Inserted module 'nbd'
Dec 22 09:40:39 cloudvirt1013 systemd-modules-load[712]: Inserted module 'iscsi_tcp'
Dec 22 09:40:39 cloudvirt1013 systemd-modules-load[712]: Inserted module 'ib_iser'
Dec 22 09:40:39 cloudvirt1013 systemd-sysctl[765]: Couldn't write '262144' to 'net/netfilter/nf_conntrack_max', ignoring: No such file or directory
Dec 22 09:40:39 cloudvirt1013 systemd-sysctl[765]: Couldn't write '65' to 'net/netfilter/nf_conntrack_tcp_timeout_time_wait', ignoring: No such file or directory
Dec 22 09:40:39 cloudvirt1013 systemd[1]: Started udev Kernel Device Manager.
Dec 22 09:40:39 cloudvirt1013 systemd[1]: Starting Flush Journal to Persistent Storage...
Dec 22 09:40:39 cloudvirt1013 systemd[1]: Started Flush Journal to Persistent Storage.
Dec 22 09:40:39 cloudvirt1013 systemd[1]: Started udev Coldplug all Devices.
Dec 22 09:40:39 cloudvirt1013 systemd[1]: Found device NetXtreme BCM5719 Gigabit Ethernet PCIe (Ethernet 1Gb 4-port 331i Adapter).
Dec 22 09:40:39 cloudvirt1013 systemd[1]: Found device LOGICAL_VOLUME 2.
Dec 22 09:40:39 cloudvirt1013 systemd[1]: Found device LOGICAL_VOLUME 2.
Dec 22 09:40:39 cloudvirt1013 systemd[1]: Found device /dev/ttyS1.
Dec 22 09:40:39 cloudvirt1013 systemd[1]: Activating swap Swap Partition...
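The giveaway in the excerpt above is the ~13-minute silence between the last cron entry (09:27:01) and the first boot-time message (09:40:39), with no shutdown messages in between. As a rough illustration (not anything we run in production), a gap like that can be spotted mechanically by scanning syslog timestamps; the 5-minute threshold and the classic BSD-syslog timestamp format are assumptions:

```python
from datetime import datetime, timedelta

def find_gaps(lines, threshold=timedelta(minutes=5), year=2019):
    """Yield (before, after) pairs of syslog lines whose timestamps are
    further apart than `threshold` -- a hint of a crash or power loss.
    Assumes classic syslog timestamps, e.g. "Dec 22 09:27:01 host ..."
    (no year in the line, so the year is supplied separately)."""
    prev = None
    for line in lines:
        # The timestamp is the first 15 characters of the line.
        ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if prev is not None and ts - prev[0] > threshold:
            yield prev[1], line
        prev = (ts, line)

# Sample lines mirroring the excerpt above:
log = [
    "Dec 22 09:26:01 cloudvirt1013 CRON[45148]: (prometheus) ...",
    "Dec 22 09:27:01 cloudvirt1013 CRON[45739]: (prometheus) ...",
    "Dec 22 09:40:39 cloudvirt1013 systemd-modules-load[712]: ...",
]
for before, after in find_gaps(log):
    print("gap between:", before[:15], "and", after[:15])
# → gap between: Dec 22 09:27:01 and Dec 22 09:40:39
```

A clean shutdown would instead leave "Stopping ..." / "Reached target Shutdown" messages before the gap, so a silent gap followed directly by boot-time module loading is consistent with abrupt power loss.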

Event Timeline

Andrew added subscribers: Jclark-ctr, Cmjohnson.

@Cmjohnson or @Jclark-ctr, I don't want to prompt another reboot, but could you give the power cables on cloudvirt1013 an extra push and see if you notice any other reason for a power loss?