
[wmcs-backup] Backup snapshots of deleted volumes are never cleaned up
Open, Stalled, Medium, Public, Bug Report

Description

Steps to replicate the issue (include links if applicable):

  • create a new volume (using Horizon or the OpenStack CLI)
  • wait for the wmcs-backup script to create a snapshot and a backup
  • delete the volume (using Horizon or the OpenStack CLI): Cinder cannot delete the RBD image because it still has a snapshot, so it moves the image to the RBD trash instead

What happens?:

  • wmcs-backup should clean up the backup AND the snapshot, but it only cleans up the backup

Details

Event Timeline

fnegri changed the task status from Open to In Progress. Feb 29 2024, 4:30 PM
fnegri claimed this task.
fnegri triaged this task as Medium priority.

The problem is in the ImageBackup.remove method, which tries to find the snapshots to delete. If the volume is in the trash, self.backup_entry.get_snapshot fails with an exception, which the remove method silently ignores, so the snapshot is never removed.
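
To make the failure mode concrete, here is a minimal, self-contained sketch. `FakeBackupEntry`, `SnapshotNotFound`, `remove_swallowing` and `remove_reporting` are hypothetical stand-ins, not the actual wmcs-backup classes or methods. The first function ignores the lookup error the way the report describes, so the snapshot silently survives; the second records the leaked image so it could be cleaned up later.

```python
import logging
from typing import List, Optional


class SnapshotNotFound(Exception):
    """Hypothetical error raised when an image's snapshot cannot be looked up."""


class FakeBackupEntry:
    """Hypothetical stand-in for the real backup entry (not the wmcs-backup class)."""

    def __init__(self, image_name: str, in_trash: bool):
        self.image_name = image_name
        self.in_trash = in_trash

    def get_snapshot(self) -> str:
        if self.in_trash:
            # A trashed image can no longer be opened by name, so the lookup fails.
            raise SnapshotNotFound(self.image_name)
        return f"{self.image_name}@snap"


def remove_swallowing(entry: FakeBackupEntry) -> Optional[str]:
    """Mirrors the reported bug: the error is ignored and nothing is cleaned up."""
    try:
        return entry.get_snapshot()  # the snapshot would be deleted here
    except SnapshotNotFound:
        return None  # silently skipped: the snapshot leaks


def remove_reporting(entry: FakeBackupEntry, leaked: List[str]) -> Optional[str]:
    """Same lookup, but a failure is logged and the image recorded for later cleanup."""
    try:
        return entry.get_snapshot()
    except SnapshotNotFound:
        logging.warning("snapshot lookup failed for %s (image in trash?)", entry.image_name)
        leaked.append(entry.image_name)
        return None
```

Recording the leak does not delete anything by itself; it only makes the problem visible instead of hiding it.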

One possible solution is to modify the remove method to do the following:

  • check whether the image associated with the backup is in the trash
  • if so, take it out of the trash (rbd trash restore)
  • proceed with the snapshot search and removal
  • put the image back in the trash (or delete it entirely?)
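
The steps above can be sketched as a small command builder. This is a minimal sketch, not the wmcs-backup implementation: `restore_purge_commands` and its parameters are hypothetical, it only builds `rbd` argument lists (it does not run them), and it assumes the image is restored under its original name, which is what `rbd trash restore` does when no new name is given.

```python
from typing import List, Sequence


def restore_purge_commands(
    pool: str,
    image_id: str,
    image_name: str,
    snapshots: Sequence[str],
    delete_image: bool = False,
) -> List[List[str]]:
    """Build (but do not run) the rbd commands for the restore/purge/re-trash approach."""
    # 1. Take the image out of the trash; it comes back under its original name.
    cmds = [["rbd", "trash", "restore", f"{pool}/{image_id}"]]
    # 2. Remove each backup snapshot now that the image is addressable by name again.
    for snap in snapshots:
        cmds.append(["rbd", "snap", "rm", f"{pool}/{image_name}@{snap}"])
    # 3. Either move the image back to the trash or delete it outright
    #    (the open question in the last step above).
    if delete_image:
        cmds.append(["rbd", "rm", f"{pool}/{image_name}"])
    else:
        cmds.append(["rbd", "trash", "mv", f"{pool}/{image_name}"])
    return cmds
```

Each argument list could then be passed to `subprocess.run(cmd, check=True)`. Building the full list first makes it easy to review or dry-run the plan before touching the pool.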

I found a way to delete the snapshot while the image is still in the trash: use --image-id and pass the ID that the image has in the trash (as listed by rbd trash ls):

fnegri@cloudcontrol1005:~$ sudo rbd trash ls eqiad1-cinder --all --long |grep 478c4b988cf336
478c4b988cf336  volume-8f14e78f-8c95-4bf4-b84f-f702d78ca014  USER    Thu Feb 29 14:56:08 2024  expired at Thu Feb 29 14:56:08 2024

fnegri@cloudcontrol1005:~$ sudo rbd trash rm eqiad1-cinder/478c4b988cf336
rbd: image has snapshots - these must be deleted with 'rbd snap purge' before the image can be removed.
Removing image: 0% complete...failed.

fnegri@cloudcontrol1005:~$ sudo rbd snap ls --pool eqiad1-cinder --image-id 478c4b988cf336
SNAPID  NAME                                 SIZE   PROTECTED  TIMESTAMP
 90169  2024-02-29T07:00:15_cloudbackup2002  1 GiB             Thu Feb 29 07:00:17 2024

fnegri@cloudcontrol1005:~$ sudo rbd snap rm --image-id 478c4b988cf336 --pool eqiad1-cinder --snap 2024-02-29T07:00:15_cloudbackup2002
Removing snap: 100% complete...done.

fnegri@cloudcontrol1005:~$ sudo rbd trash rm eqiad1-cinder/478c4b988cf336
Removing image: 100% complete...done.
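
The manual workaround in the transcript above could be automated roughly as follows. This is a minimal sketch under stated assumptions: `find_trash_id`, `snapshot_names` and `purge_trashed_image_commands` are hypothetical helpers, the parsers assume the plain-text column layout shown in the transcript (ID then NAME for `rbd trash ls --all --long`; SNAPID then NAME for `rbd snap ls`), and the commands are only built, not executed.

```python
from typing import List, Optional


def find_trash_id(trash_ls_output: str, volume_name: str) -> Optional[str]:
    """Return the trash ID of `volume_name` from `rbd trash ls --all --long` output."""
    for line in trash_ls_output.splitlines():
        fields = line.split()
        # Assumed layout, as in the transcript: ID, NAME, SOURCE, then timestamps.
        if len(fields) >= 2 and fields[1] == volume_name:
            return fields[0]
    return None


def snapshot_names(snap_ls_output: str) -> List[str]:
    """Return snapshot names from `rbd snap ls` output, skipping the header row."""
    names = []
    for line in snap_ls_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric SNAPID; the header starts with "SNAPID".
        if len(fields) >= 2 and fields[0].isdigit():
            names.append(fields[1])
    return names


def purge_trashed_image_commands(pool: str, image_id: str, snapshots: List[str]) -> List[List[str]]:
    """Build the same rbd commands used in the transcript: remove snapshots, then the image."""
    cmds = [
        ["rbd", "snap", "rm", "--image-id", image_id, "--pool", pool, "--snap", snap]
        for snap in snapshots
    ]
    # Only after all snapshots are gone can the trashed image itself be removed.
    cmds.append(["rbd", "trash", "rm", f"{pool}/{image_id}"])
    return cmds
```

Parsing the human-readable output is fragile; it is used here only because that is the format shown in the transcript.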

fnegri changed the task status from In Progress to Stalled. Jul 4 2024, 2:15 PM

Setting this to "Stalled" as I have too many things on my plate and this is not super urgent. I'd like to get back to it at some point, but feel free to claim it if you are interested.

Change #1060784 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] wmcs-backups: add empty_trash command

https://gerrit.wikimedia.org/r/1060784

Change #1060784 merged by David Caro:

[operations/puppet@production] wmcs-backups: add empty_trash command

https://gerrit.wikimedia.org/r/1060784