We got some wmcs-backups failures on cloudvirt1024 today, log:
Feb 04 06:44:21 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:21,322] Cleaned up 1 expired backups for VM traffic.traffic-ncredir(0478d505-62dd-4dad-9fd3-5ba84b9ba52b)
Feb 04 06:44:21 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:21,323] Creating backup for vm traffic.diffscan(54e36627-6f0c-46e7-a035-994a2526a0ab)
Feb 04 06:44:21 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:21,508] Creating differential backup of pool:eqiad1-compute image_name:54e36627-6f0c-46e7-a035-994a2526a0ab_disk from rbd snapshot eqiad1-compute/54e36627-6f0c-46e7-a035-994a2526a0ab_disk@2022-02-03T06:53:00_cloudvirt1024 and backy2 version ee635e2e-84bd-11ec-95fb-b02628295df0
Feb 04 06:44:41 cloudvirt1024 python3.7[39042]: 2022-02-04T06:44:41.689+0000 7f367ebc8700 -1 asok(0x7f3678001400) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.eqiad1-compute.39042.139871918246072.asok': (13) Permission denied
Feb 04 06:44:43 cloudvirt1024 python3.7[39066]: 2022-02-04T06:44:43.373+0000 7fe5e3fff700 -1 asok(0x7fe5e4001400) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.eqiad1-compute.39066.140625349462200.asok': (13) Permission denied
Feb 04 06:44:47 cloudvirt1024 wmcs-backup[4882]: [37B blob data]
Feb 04 06:44:47 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:47,943] Cleaning up expired backups for VM traffic.diffscan(54e36627-6f0c-46e7-a035-994a2526a0ab)
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:49,497] Cleaned up 1 expired backups for VM traffic.diffscan(54e36627-6f0c-46e7-a035-994a2526a0ab)
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:49,534] Creating backup for vm trove.maps-postgres(6f43781a-bdf9-4f72-8c65-1bee29f7ef6c)
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:49,535] Forcing a full backup as there's no previous one with a snapshot.
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:49,535] Creating full backup of pool:eqiad1-compute, image_name:6f43781a-bdf9-4f72-8c65-1bee29f7ef6c_disk, snapshot_name:2022-02-04T06:44:49_cloudvirt1024
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: rbd: error opening image 6f43781a-bdf9-4f72-8c65-1bee29f7ef6c_disk: (2) No such file or directory
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: WARNING:[2022-02-04 06:44:49,615] Got an error trying to backup maps-postgres, try n#0 of 3: Command '['/usr/bin/rbd', 'snap', 'create', 'eqiad1-compute/6f43781a-bdf9-4f72-8c65-1bee29f7ef6c_disk@2022-02-04T06:44:49_cloudvirt1024']' returned non-zero exit status 2.
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:49,632] Creating backup for vm trove.maps-postgres(6f43781a-bdf9-4f72-8c65-1bee29f7ef6c)
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:49,632] Forcing a full backup as there's no previous one with a snapshot.
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:49,633] Creating full backup of pool:eqiad1-compute, image_name:6f43781a-bdf9-4f72-8c65-1bee29f7ef6c_disk, snapshot_name:2022-02-04T06:44:49_cloudvirt1024
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: rbd: error opening image 6f43781a-bdf9-4f72-8c65-1bee29f7ef6c_disk: (2) No such file or directory
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: WARNING:[2022-02-04 06:44:49,705] Got an error trying to backup maps-postgres, try n#1 of 3: Command '['/usr/bin/rbd', 'snap', 'create', 'eqiad1-compute/6f43781a-bdf9-4f72-8c65-1bee29f7ef6c_disk@2022-02-04T06:44:49_cloudvirt1024']' returned non-zero exit status 2.
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:49,706] Creating backup for vm trove.maps-postgres(6f43781a-bdf9-4f72-8c65-1bee29f7ef6c)
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:49,706] Forcing a full backup as there's no previous one with a snapshot.
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: INFO:[2022-02-04 06:44:49,706] Creating full backup of pool:eqiad1-compute, image_name:6f43781a-bdf9-4f72-8c65-1bee29f7ef6c_disk, snapshot_name:2022-02-04T06:44:49_cloudvirt1024
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: rbd: error opening image 6f43781a-bdf9-4f72-8c65-1bee29f7ef6c_disk: (2) No such file or directory
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: WARNING:[2022-02-04 06:44:49,783] Got an error trying to backup maps-postgres, try n#2 of 3: Command '['/usr/bin/rbd', 'snap', 'create', 'eqiad1-compute/6f43781a-bdf9-4f72-8c65-1bee29f7ef6c_disk@2022-02-04T06:44:49_cloudvirt1024']' returned non-zero exit status 2.
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: Traceback (most recent call last):
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: File "/usr/local/sbin/wmcs-backup", line 2028, in <module>
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: args.func()
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: File "/usr/local/sbin/wmcs-backup", line 1863, in <lambda>
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: ).backup_assigned_vms(from_cache=args.from_cache, noop=args.noop)
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: File "/usr/local/sbin/wmcs-backup", line 1318, in backup_assigned_vms
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: noop=noop,
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: File "/usr/local/sbin/wmcs-backup", line 1119, in create_vm_backup
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: vm_info=vm_info, noop=noop
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: File "/usr/local/sbin/wmcs-backup", line 1028, in create_vm_backup
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: new_backup = self.vms_backups[vm_id].create_next_backup(noop=noop)
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: File "/usr/local/sbin/wmcs-backup", line 707, in create_next_backup
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: noop=noop,
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: File "/usr/local/sbin/wmcs-backup", line 589, in create_full_backup
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: noop=noop,
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: File "/usr/lib/python3/dist-packages/rbd2backy2.py", line 190, in create_full_backup
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: noop=noop,
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: File "/usr/lib/python3/dist-packages/rbd2backy2.py", line 91, in create
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: run_command([RBD, "snap", "create", str(new_snapshot)], noop=noop)
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: File "/usr/lib/python3/dist-packages/rbd2backy2.py", line 31, in run_command
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: output = subprocess.check_output(args).decode("utf8")
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: File "/usr/lib/python3.7/subprocess.py", line 395, in check_output
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: **kwargs).stdout
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: File "/usr/lib/python3.7/subprocess.py", line 487, in run
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: output=stdout, stderr=stderr)
Feb 04 06:44:49 cloudvirt1024 wmcs-backup[4882]: subprocess.CalledProcessError: Command '['/usr/bin/rbd', 'snap', 'create', 'eqiad1-compute/6f43781a-bdf9-4f72-8c65-1bee29f7ef6c_disk@2022-02-04T06:44:49_cloudvirt1024']' returned non-zero exit status 2.
Feb 04 06:44:50 cloudvirt1024 systemd[1]: backup_vms.service: Main process exited, code=exited, status=1/FAILURE
Feb 04 06:44:50 cloudvirt1024 systemd[1]: backup_vms.service: Failed with result 'exit-code'.I have the feeling this is just a transient error, but I'm creating a task anyway so we can track it.