Practice restoring ceph backups
Closed, ResolvedPublic

Description

I'd like all the wmcs SREs to have the experience of restoring a backed up ceph VM.

We can use ceph-restore-practice.testlabs.eqiad1.wikimedia.cloud as a test subject. Restoration docs are at https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Instance_backups#Restoring

You'll know that the VM is back up if you can

$ ssh ceph-restore-practice.testlabs.eqiad1.wikimedia.cloud cat /proofoflife.txt

Once Brooke and Arturo have confirmed a successful restore (and amended the docs as needed) this task can be closed.

Event Timeline

Andrew added a subscriber: nskaggs.
bstorm@ceph-restore-practice:~$ cat /proofoflife.txt
You did it!  This is the VM that you were trying to restore.

Ok! The process worked. I'll add a note about the snapshots since they have to be removed to delete a volume, apparently.

Mentioned in SAL (#wikimedia-cloud) [2020-09-07T15:24:07Z] <arturo> practicing disk restoration on ceph-restore-practice (T260941)

Before starting:

aborrero@ceph-restore-practice:~$ cat /proofoflife.txt 
You did it!  This is the VM that you were trying to restore.
aborrero@ceph-restore-practice:~$ sudo rm /proofoflife.txt 
aborrero@ceph-restore-practice:~$ cat /proofoflife.txt 
cat: /proofoflife.txt: No such file or directory
aborrero@ceph-restore-practice:~$ sudo poweroff

Following the restoration workflow:

root@cloudvirt1024:~# backy2 ls 5624cd3e-eeb8-421c-b8e4-7b1056a61550_disk
    INFO: [backy2.logging] $ /usr/bin/backy2 ls 5624cd3e-eeb8-421c-b8e4-7b1056a61550_disk
+---------------------+-------------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
|         date        | name                                      | snapshot_name       | size |  size_bytes |                 uid                  | valid | protected | tags                       |        expire       |
+---------------------+-------------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
[..]
| 2020-09-07 02:00:53 | 5624cd3e-eeb8-421c-b8e4-7b1056a61550_disk | 2020-09-07T02:00:50 | 5120 | 21474836480 | f5eec49e-f0ad-11ea-8511-b02628295df0 |   1   |     0     | b_daily                    | 2020-09-14 00:00:00 |
+---------------------+-------------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
    INFO: [backy2.logging] Backy complete.

root@cloudvirt1024:~# backy2 restore f5eec49e-f0ad-11ea-8511-b02628295df0 rbd://eqiad1-compute/5624cd3e-eeb8-421c-b8e4-7b1056a61550_disk
    INFO: [backy2.logging] $ /usr/bin/backy2 restore f5eec49e-f0ad-11ea-8511-b02628295df0 rbd://eqiad1-compute/5624cd3e-eeb8-421c-b8e4-7b1056a61550_disk
   ERROR: [backy2.logging] Image already exists: 5624cd3e-eeb8-421c-b8e4-7b1056a61550_disk
Error opening restore target.
root@cloudvirt1024:~# rbd rm eqiad1-compute/5624cd3e-eeb8-421c-b8e4-7b1056a61550_disk
2020-09-07 15:21:55.468 7f3858ff9700 -1 librbd::image::PreRemoveRequest: 0x55b576311dd0 check_image_snaps: image has snapshots - not removing
Removing image: 0% complete...failed.
rbd: image has snapshots - these must be deleted with 'rbd snap purge' before the image can be removed.
root@cloudvirt1024:~# rbd --pool eqiad1-compute snap purge 5624cd3e-eeb8-421c-b8e4-7b1056a61550_disk
Removing all snapshots: 100% complete...done.
root@cloudvirt1024:~# rbd --pool eqiad1-compute snap ls eqiad1-compute/5624cd3e-eeb8-421c-b8e4-7b1056a61550_disk
root@cloudvirt1024:~# rbd rm eqiad1-compute/5624cd3e-eeb8-421c-b8e4-7b1056a61550_disk
Removing image: 100% complete...done.
root@cloudvirt1024:~# backy2 restore f5eec49e-f0ad-11ea-8511-b02628295df0 rbd://eqiad1-compute/5624cd3e-eeb8-421c-b8e4-7b1056a61550_disk
    INFO: [backy2.logging] $ /usr/bin/backy2 restore f5eec49e-f0ad-11ea-8511-b02628295df0 rbd://eqiad1-compute/5624cd3e-eeb8-421c-b8e4-7b1056a61550_disk
[..]
    INFO: [backy2.logging] Backy complete.

After the restoration:

root@cloudcontrol1004:~# openstack server start 5624cd3e-eeb8-421c-b8e4-7b1056a61550
aborrero@ceph-restore-practice:~$ cat /proofoflife.txt 
You did it!  This is the VM that you were trying to restore.

This procedure works just fine. I guess it is also a prime candidate for automation (a script) if we ever find ourselves using it often.
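
As a rough sketch of what such a script might look like, assuming the same three steps as in the transcript above (purge the image's snapshots, remove the image, restore the chosen backy2 version). The names `restore_steps`, `run` and `DRY_RUN` are hypothetical and only for illustration; the pool name is the one from the transcript. It assumes the VM has already been powered off and that the version uid was picked from `backy2 ls`.

```shell
#!/bin/bash
# Sketch only: wraps the manual restore steps from the transcript.
# restore_steps, run and DRY_RUN are hypothetical names, not part of rbd or backy2.

POOL="eqiad1-compute"      # pool name as seen in the transcript
DRY_RUN="${DRY_RUN:-1}"    # default: only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: restore_steps <image-name> <backy2-version-uid>
restore_steps() {
    local image="$1"         # e.g. <instance-id>_disk
    local version_uid="$2"   # "uid" column from `backy2 ls <image-name>`

    # Snapshots must be purged before the image can be removed.
    run rbd --pool "$POOL" snap purge "$image" || return 1
    # backy2 refuses to restore onto an existing image, so remove it first.
    run rbd rm "$POOL/$image" || return 1
    # Restore the selected backup version into a fresh image.
    run backy2 restore "$version_uid" "rbd://$POOL/$image"
}
```

With the default `DRY_RUN=1`, calling `restore_steps 5624cd3e-eeb8-421c-b8e4-7b1056a61550_disk f5eec49e-f0ad-11ea-8511-b02628295df0` only prints the three commands so they can be reviewed; setting `DRY_RUN=0` would execute them. Starting the VM afterwards (`openstack server start`) is left as a manual step, as in the transcript.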

Thanks @Andrew and @Bstorm, the docs are really nice!

Thanks @aborrero! It's nice to know that backups still exist after I've ignored the system for a few weeks.