
cloudvirt1022/Check unit status of backup_vms is CRITICAL
Closed, Resolved (Public)

Description

We recently rebuilt cloudvirt1022 and wiped out its existing backups. On a subsequent backup pass (either the first or the second after the drive was cleared), this alert fired:

Mar 15 18:18:53 cloudvirt1022 wmcs-backup[599040]: [37B blob data]
Mar 15 18:18:53 cloudvirt1022 wmcs-backup[567639]: INFO:[2022-03-15 18:18:53,986] Cleaning up expired backups for VM tools.tools-puppetmaster-02(e4a34e65-c079-4942-a306-6e13cc2a226>
Mar 15 18:18:53 cloudvirt1022 wmcs-backup[567639]: INFO:[2022-03-15 18:18:53,986] Cleaned up 0 expired backups for VM tools.tools-puppetmaster-02(e4a34e65-c079-4942-a306-6e13cc2a22>
Mar 15 18:18:53 cloudvirt1022 wmcs-backup[567639]: INFO:[2022-03-15 18:18:53,987] Creating backup for vm tools.tools-prometheus-03(b24e29d7-a468-4882-9652-9863c8acfb88)
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]: WARNING:[2022-03-15 18:18:54,076] Got an error trying to backup tools-prometheus-03, try n#0 of 3: 'VMBackup' object has no attri>
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]: INFO:[2022-03-15 18:18:54,076] Creating backup for vm tools.tools-prometheus-03(b24e29d7-a468-4882-9652-9863c8acfb88)
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]: WARNING:[2022-03-15 18:18:54,158] Got an error trying to backup tools-prometheus-03, try n#1 of 3: 'VMBackup' object has no attri>
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]: INFO:[2022-03-15 18:18:54,158] Creating backup for vm tools.tools-prometheus-03(b24e29d7-a468-4882-9652-9863c8acfb88)
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]: WARNING:[2022-03-15 18:18:54,238] Got an error trying to backup tools-prometheus-03, try n#2 of 3: 'VMBackup' object has no attri>
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]: Traceback (most recent call last):
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]:   File "/usr/local/sbin/wmcs-backup", line 2037, in <module>
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]:     args.func()
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]:   File "/usr/local/sbin/wmcs-backup", line 1870, in <lambda>
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]:     func=lambda: get_current_instances_state(
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]:   File "/usr/local/sbin/wmcs-backup", line 1313, in backup_assigned_vms
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]:     self.create_vm_backup(
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]:   File "/usr/local/sbin/wmcs-backup", line 1118, in create_vm_backup
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]:     new_backup = self.projects_backups[project_name].create_vm_backup(
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]:   File "/usr/local/sbin/wmcs-backup", line 1028, in create_vm_backup
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]:     new_backup = self.vms_backups[vm_id].create_next_backup(noop=noop)
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]:   File "/usr/local/sbin/wmcs-backup", line 690, in create_next_backup
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]:     and not last_backup_with_snapshot.valid
Mar 15 18:18:54 cloudvirt1022 wmcs-backup[567639]: AttributeError: 'VMBackup' object has no attribute 'valid'
Mar 15 18:18:54 cloudvirt1022 systemd[1]: backup_vms.service: Main process exited, code=exited, status=1/FAILURE
Mar 15 18:18:54 cloudvirt1022 systemd[1]: backup_vms.service: Failed with result 'exit-code'.
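
The log above shows three retries that all fail on the same deterministic `AttributeError`, after which the exception propagates and the unit exits. Below is a minimal sketch of that failure mode. It is not the code from `wmcs-backup` or from the merged patch: the `VMBackup` stand-in, both `needs_full_backup_*` helpers, and `run_with_retries` are hypothetical names for illustration only. The guarded variant assumes that a backup object with no `valid` attribute should be treated as invalid, which may not match the real fix.

```python
import logging

logging.basicConfig(level=logging.INFO)


class VMBackup:
    """Hypothetical stand-in; the real class lives in wmcs-backup."""

    def __init__(self, name: str) -> None:
        self.name = name  # deliberately defines no `valid` attribute


def needs_full_backup_unsafe(last_backup: VMBackup) -> bool:
    # Mirrors the failing expression in the traceback: `not <obj>.valid`.
    return not last_backup.valid


def needs_full_backup_guarded(last_backup: VMBackup) -> bool:
    # Assumption: a missing `valid` attribute means "not valid".
    return not getattr(last_backup, "valid", False)


def run_with_retries(func, arg, tries: int = 3):
    """Call func(arg) up to `tries` times, re-raising the last error."""
    if tries < 1:
        raise ValueError("tries must be at least 1")
    last_error = None
    for attempt in range(tries):
        try:
            return func(arg)
        except Exception as error:
            # Retrying does not help when the error is deterministic.
            logging.warning("try n#%d of %d: %s", attempt, tries, error)
            last_error = error
    raise last_error
```

Calling `run_with_retries(needs_full_backup_unsafe, VMBackup("tools-prometheus-03"))` logs three warnings and then raises `AttributeError`, matching the shape of the log; the guarded helper returns without raising.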

Event Timeline

Change 770999 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Fix invalid ref to last_backup_with_snapshot.valid

https://gerrit.wikimedia.org/r/770999

Change 770999 merged by Andrew Bogott:

[operations/puppet@production] Fix invalid ref to last_backup_with_snapshot.valid

https://gerrit.wikimedia.org/r/770999

dcaro changed the task status from Open to In Progress. Mar 22 2022, 2:42 PM
dcaro claimed this task.
dcaro moved this task from Today to Doing on the User-dcaro board.

This is fixed now.