From an alert:
Icinga/Check unit status of backup_vms summary: CRITICAL: Status of the systemd unit backup_vms 1 hour ago instance: cloudvirt1024 team: wmcs @cluster: wikimedia.org runbook
Looking at the logs it seems it was unable to find a specific image:
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: INFO:[2021-10-27 06:54:39,714] Creating backup for vm wikisp.ceres-01(5b62f768-ba2a-4c0d-8ee2-0bffba99222f)
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: INFO:[2021-10-27 06:54:39,714] Forcing a full backup as there's no previous one with a snapshot.
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: INFO:[2021-10-27 06:54:39,714] Creating full backup of pool:eqiad1-compute, image_name:5b62f768-ba2a-4c0d-8ee2-0bffba99222f_disk, snapshot_name:2021-10-27T06:54:39_cloudvirt1024
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: rbd: error opening image 5b62f768-ba2a-4c0d-8ee2-0bffba99222f_disk: (2) No such file or directory
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: WARNING:[2021-10-27 06:54:39,810] Got an error trying to backup ceres-01, try n#0 of 3: Command '['/usr/bin/rbd', 'snap', 'create', 'eqiad1-compute/5b62f768-ba2a-4c0d-8ee2-0bffba99222f_disk@2021-10-27T06:54:39_cloudvirt1024']' returned non-zero exit status 2.
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: INFO:[2021-10-27 06:54:39,827] Creating backup for vm wikisp.ceres-01(5b62f768-ba2a-4c0d-8ee2-0bffba99222f)
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: INFO:[2021-10-27 06:54:39,827] Forcing a full backup as there's no previous one with a snapshot.
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: INFO:[2021-10-27 06:54:39,828] Creating full backup of pool:eqiad1-compute, image_name:5b62f768-ba2a-4c0d-8ee2-0bffba99222f_disk, snapshot_name:2021-10-27T06:54:39_cloudvirt1024
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: rbd: error opening image 5b62f768-ba2a-4c0d-8ee2-0bffba99222f_disk: (2) No such file or directory
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: WARNING:[2021-10-27 06:54:39,911] Got an error trying to backup ceres-01, try n#1 of 3: Command '['/usr/bin/rbd', 'snap', 'create', 'eqiad1-compute/5b62f768-ba2a-4c0d-8ee2-0bffba99222f_disk@2021-10-27T06:54:39_cloudvirt1024']' returned non-zero exit status 2.
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: INFO:[2021-10-27 06:54:39,912] Creating backup for vm wikisp.ceres-01(5b62f768-ba2a-4c0d-8ee2-0bffba99222f)
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: INFO:[2021-10-27 06:54:39,912] Forcing a full backup as there's no previous one with a snapshot.
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: INFO:[2021-10-27 06:54:39,912] Creating full backup of pool:eqiad1-compute, image_name:5b62f768-ba2a-4c0d-8ee2-0bffba99222f_disk, snapshot_name:2021-10-27T06:54:39_cloudvirt1024
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: rbd: error opening image 5b62f768-ba2a-4c0d-8ee2-0bffba99222f_disk: (2) No such file or directory
Oct 27 06:54:39 cloudvirt1024 wmcs-backup[51353]: WARNING:[2021-10-27 06:54:39,996] Got an error trying to backup ceres-01, try n#2 of 3: Command '['/usr/bin/rbd', 'snap', 'create', 'eqiad1-compute/5b62f768-ba2a-4c0d-8ee2-0bffba99222f_disk@2021-10-27T06:54:39_cloudvirt1024']' returned non-zero exit status 2.
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: Traceback (most recent call last):
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: File "/usr/local/sbin/wmcs-backup", line 2028, in <module>
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: args.func()
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: File "/usr/local/sbin/wmcs-backup", line 1863, in <lambda>
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: ).backup_assigned_vms(from_cache=args.from_cache, noop=args.noop)
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: File "/usr/local/sbin/wmcs-backup", line 1318, in backup_assigned_vms
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: noop=noop,
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: File "/usr/local/sbin/wmcs-backup", line 1119, in create_vm_backup
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: vm_info=vm_info, noop=noop
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: File "/usr/local/sbin/wmcs-backup", line 1028, in create_vm_backup
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: new_backup = self.vms_backups[vm_id].create_next_backup(noop=noop)
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: File "/usr/local/sbin/wmcs-backup", line 707, in create_next_backup
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: noop=noop,
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: File "/usr/local/sbin/wmcs-backup", line 589, in create_full_backup
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: noop=noop,
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: File "/usr/lib/python3/dist-packages/rbd2backy2.py", line 190, in create_full_backup
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: noop=noop,
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: File "/usr/lib/python3/dist-packages/rbd2backy2.py", line 91, in create
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: run_command([RBD, "snap", "create", str(new_snapshot)], noop=noop)
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: File "/usr/lib/python3/dist-packages/rbd2backy2.py", line 31, in run_command
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: output = subprocess.check_output(args).decode("utf8")
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: File "/usr/lib/python3.7/subprocess.py", line 395, in check_output
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: **kwargs).stdout
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: File "/usr/lib/python3.7/subprocess.py", line 487, in run
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: output=stdout, stderr=stderr)
Oct 27 06:54:40 cloudvirt1024 wmcs-backup[51353]: subprocess.CalledProcessError: Command '['/usr/bin/rbd', 'snap', 'create', 'eqiad1-compute/5b62f768-ba2a-4c0d-8ee2-0bffba99222f_disk@2021-10-27T06:54:39_cloudvirt1024']' returned non-zero exit status 2.
Oct 27 06:54:40 cloudvirt1024 systemd[1]: backup_vms.service: Main process exited, code=exited, status=1/FAILURE