cinder-backups: figure out automation
Closed, Duplicate (Public)

Description

The cinder-backups service provides a powerful API to manage backups for cinder volumes.

Example commands using the openstack CLI to back up an offline volume:

root@cloudcontrol2001-dev:~# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+---------------------------------------------------------------+
| ID                                   | Name                                       | Status    | Size | Attached to                                                   |
+--------------------------------------+--------------------------------------------+-----------+------+---------------------------------------------------------------+
| 74bf4553-c92e-4fd5-88ef-33fb789ab07a | tlsvol                                     | available |    3 |                                                               |
| bcde703e-1ad9-40c5-badf-5f5eeae18508 | trove-58ec6fd7-0822-440b-beb5-2581e0edf98f | in-use    |    2 | Attached to 3e2b42b3-7b92-4805-9be2-00b2ab5d349b on /dev/vdb  |
| 468cf670-3f23-483b-9309-2f98d289c5dc | bleh                                       | available |    1 |                                                               |
| 4a4f04b1-7c27-4d30-9446-479390b29526 | ussurivol                                  | available |    3 |                                                               |
| 3c82177d-4272-4d63-bef0-edfa3f4a38a5 |                                            | available |   20 |                                                               |
| fbecb639-216c-4d92-a91f-ace4b87e2b0b | testvolume                                 | available |    8 |                                                               |
+--------------------------------------+--------------------------------------------+-----------+------+---------------------------------------------------------------+
root@cloudcontrol2001-dev:~# openstack volume backup create 468cf670-3f23-483b-9309-2f98d289c5dc --name "test backup"
+-------+--------------------------------------+
| Field | Value                                |
+-------+--------------------------------------+
| id    | dd42d300-e1c4-442c-9c6f-fae352e6df9c |
| name  | test backup                          |
+-------+--------------------------------------+
root@cloudcontrol2001-dev:~# openstack volume backup list
+--------------------------------------+-------------+-------------+----------+------+
| ID                                   | Name        | Description | Status   | Size |
+--------------------------------------+-------------+-------------+----------+------+
| dd42d300-e1c4-442c-9c6f-fae352e6df9c | test backup | None        | creating |    1 |
+--------------------------------------+-------------+-------------+----------+------+
root@cloudcontrol2001-dev:~# openstack volume backup list
+--------------------------------------+-------------+-------------+-----------+------+
| ID                                   | Name        | Description | Status    | Size |
+--------------------------------------+-------------+-------------+-----------+------+
| dd42d300-e1c4-442c-9c6f-fae352e6df9c | test backup | None        | available |    1 |
+--------------------------------------+-------------+-------------+-----------+------+
root@cloudcontrol2001-dev:~# openstack volume backup show dd42d300-e1c4-442c-9c6f-fae352e6df9c
+-----------------------+--------------------------------------------+
| Field                 | Value                                      |
+-----------------------+--------------------------------------------+
| availability_zone     | None                                       |
| container             | dd/42/dd42d300-e1c4-442c-9c6f-fae352e6df9c |
| created_at            | 2021-10-27T10:57:59.000000                 |
| data_timestamp        | 2021-10-27T10:57:59.000000                 |
| description           | None                                       |
| fail_reason           | None                                       |
| has_dependent_backups | False                                      |
| id                    | dd42d300-e1c4-442c-9c6f-fae352e6df9c       |
| is_incremental        | False                                      |
| name                  | test backup                                |
| object_count          | 1                                          |
| size                  | 1                                          |
| snapshot_id           | None                                       |
| status                | available                                  |
| updated_at            | 2021-10-27T10:58:21.000000                 |
| volume_id             | 468cf670-3f23-483b-9309-2f98d289c5dc       |
+-----------------------+--------------------------------------------+

Example commands to create a backup from a volume snapshot:

root@cloudcontrol2001-dev:~# openstack volume snapshot create --volume bcde703e-1ad9-40c5-badf-5f5eeae18508 trove-volume-snapshot --force
+-------------+--------------------------------------+
| Field       | Value                                |
+-------------+--------------------------------------+
| created_at  | 2021-10-27T11:24:17.089371           |
| description | None                                 |
| id          | 0d223da6-b5ff-4cb2-a808-3fa089dd49dc |
| name        | trove-volume-snapshot                |
| properties  |                                      |
| size        | 2                                    |
| status      | creating                             |
| updated_at  | None                                 |
| volume_id   | bcde703e-1ad9-40c5-badf-5f5eeae18508 |
+-------------+--------------------------------------+
root@cloudcontrol2001-dev:~# openstack volume snapshot list
+--------------------------------------+-----------------------+-------------+-----------+------+
| ID                                   | Name                  | Description | Status    | Size |
+--------------------------------------+-----------------------+-------------+-----------+------+
| 0d223da6-b5ff-4cb2-a808-3fa089dd49dc | trove-volume-snapshot | None        | available |    2 |
+--------------------------------------+-----------------------+-------------+-----------+------+
root@cloudcontrol2001-dev:~# openstack volume backup create --snapshot 0d223da6-b5ff-4cb2-a808-3fa089dd49dc --name "trove-volume-snapshot-backup" bcde703e-1ad9-40c5-badf-5f5eeae18508
+-------+--------------------------------------+
| Field | Value                                |
+-------+--------------------------------------+
| id    | 10f85d0a-c403-4352-bcda-b3d4176d2218 |
| name  | trove-volume-snapshot-backup         |
+-------+--------------------------------------+
root@cloudcontrol2001-dev:~# openstack volume backup list
+--------------------------------------+------------------------------+-------------+-----------+------+
| ID                                   | Name                         | Description | Status    | Size |
+--------------------------------------+------------------------------+-------------+-----------+------+
| 10f85d0a-c403-4352-bcda-b3d4176d2218 | trove-volume-snapshot-backup | None        | available |    2 |
| 74090b3d-f117-4d05-9e08-826d1b3c7e89 | test backup                  | None        | available |   20 |
| dd42d300-e1c4-442c-9c6f-fae352e6df9c | test backup                  | None        | available |    1 |
+--------------------------------------+------------------------------+-------------+-----------+------+

Example commands to restore a backup:

root@cloudcontrol2001-dev:~# openstack volume backup show dd42d300-e1c4-442c-9c6f-fae352e6df9c
+-----------------------+--------------------------------------------+
| Field                 | Value                                      |
+-----------------------+--------------------------------------------+
| availability_zone     | None                                       |
| container             | dd/42/dd42d300-e1c4-442c-9c6f-fae352e6df9c |
| created_at            | 2021-10-27T10:57:59.000000                 |
| data_timestamp        | 2021-10-27T10:57:59.000000                 |
| description           | None                                       |
| fail_reason           | None                                       |
| has_dependent_backups | False                                      |
| id                    | dd42d300-e1c4-442c-9c6f-fae352e6df9c       |
| is_incremental        | False                                      |
| name                  | test backup                                |
| object_count          | 1                                          |
| size                  | 1                                          |
| snapshot_id           | None                                       |
| status                | available                                  |
| updated_at            | 2021-10-27T10:58:21.000000                 |
| volume_id             | 468cf670-3f23-483b-9309-2f98d289c5dc       |
+-----------------------+--------------------------------------------+
root@cloudcontrol2001-dev:~# openstack volume backup restore dd42d300-e1c4-442c-9c6f-fae352e6df9c 468cf670-3f23-483b-9309-2f98d289c5dc
+-------------+--------------------------------------+
| Field       | Value                                |
+-------------+--------------------------------------+
| backup_id   | dd42d300-e1c4-442c-9c6f-fae352e6df9c |
| volume_id   | 468cf670-3f23-483b-9309-2f98d289c5dc |
| volume_name | bleh                                 |
+-------------+--------------------------------------+

PROPOSAL

Create a script wmcs-cinder-volume-backup that does the following:

  • takes a cinder volume id as argument
  • creates a snapshot of the volume
  • creates a backup of the snapshot
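
A minimal sketch of the snapshot-then-backup flow listed above, not the merged script. It reuses the CLI commands shown earlier in this task and assumes the standard `-f json` output formatter. The helper names (`run_json`, `backup_volume`), the snapshot/backup naming scheme, and the polling interval are hypothetical.

```python
import json
import subprocess
import time


def run_json(argv):
    """Run an openstack CLI command and parse its JSON output."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def snapshot_create_cmd(volume_id, name):
    # --force allows snapshotting a volume that is attached (in-use).
    return ["openstack", "volume", "snapshot", "create",
            "--volume", volume_id, "--force", "-f", "json", name]


def snapshot_show_cmd(snapshot_id):
    return ["openstack", "volume", "snapshot", "show", snapshot_id, "-f", "json"]


def backup_create_cmd(volume_id, snapshot_id, name):
    return ["openstack", "volume", "backup", "create",
            "--snapshot", snapshot_id, "--name", name, "-f", "json", volume_id]


def backup_volume(volume_id, run=run_json, sleep=time.sleep, max_polls=60):
    """Snapshot the volume, wait for the snapshot, then back it up.

    Returns the id of the new backup. `run` and `sleep` are injectable
    so the flow can be exercised without a real cloud.
    """
    snap = run(snapshot_create_cmd(volume_id, f"{volume_id}-snapshot"))
    for _ in range(max_polls):
        if run(snapshot_show_cmd(snap["id"]))["status"] == "available":
            break
        sleep(5)  # hypothetical polling interval
    else:
        raise TimeoutError(f"snapshot {snap['id']} never became available")
    backup = run(backup_create_cmd(volume_id, snap["id"], f"{volume_id}-backup"))
    return backup["id"]
```

The sketch leaves the snapshot in place after the backup is created; whether and when to delete it is the cleanup question raised in the discussion below.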

Create a script wmcs-cinder-volume-restore that does the following:

  • takes both a cinder volume id and a backup id as arguments
  • restores the backup to the volume (or to a snapshot?) TODO: figure this out
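
A rough sketch of the restore-to-volume variant, using the `openstack volume backup restore` command shown earlier. The pre-flight check (`restore_problems`) is an assumption about what such a script might want to verify, not something decided in this task; it reads the `status` and `volume_id` fields that `openstack volume backup show` prints. All function names are hypothetical.

```python
import json
import subprocess


def run_json(argv):
    """Run an openstack CLI command and parse its JSON output."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def backup_show_cmd(backup_id):
    return ["openstack", "volume", "backup", "show", backup_id, "-f", "json"]


def restore_cmd(backup_id, volume_id):
    return ["openstack", "volume", "backup", "restore", backup_id, volume_id]


def restore_problems(backup, volume_id):
    """Return a list of reasons not to restore this backup onto volume_id."""
    problems = []
    if backup.get("status") != "available":
        problems.append(f"backup status is {backup.get('status')!r}")
    # Assumption: restoring onto a volume other than the source is
    # treated as a mistake. A real script might allow it explicitly.
    if backup.get("volume_id") != volume_id:
        problems.append("backup was taken from a different volume")
    return problems


def restore(backup_id, volume_id, run=run_json):
    """Restore the backup onto the volume after the pre-flight check."""
    problems = restore_problems(run(backup_show_cmd(backup_id)), volume_id)
    if problems:
        raise ValueError("; ".join(problems))
    subprocess.run(restore_cmd(backup_id, volume_id), check=True)
```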

Create a daemon wmcs-cinder-backup-manager that does the following:

  • reads a YAML config file, which includes a list of volumes and potentially a max quota/retention policy for the copies to store
  • does proper quota validation
  • runs wmcs-cinder-volume-backup as required
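
One way the quota validation step could look, as a sketch under stated assumptions. The config is shown as the dict a YAML loader would produce; the keys `max_backup_gigabytes`, `max_copies` and `volumes` are hypothetical, as is the worst-case rule that every retained copy costs the full volume size.

```python
def plan_backups(config, volume_sizes_gb):
    """Split configured volumes into (accepted, rejected) under a size quota.

    config: parsed config dict (hypothetical schema, see lead-in).
    volume_sizes_gb: mapping of volume id to size in GB, e.g. taken from
    `openstack volume list`.
    """
    budget = config["max_backup_gigabytes"]
    copies = config.get("max_copies", 1)
    accepted, rejected = [], []
    for volume_id in config["volumes"]:
        size = volume_sizes_gb.get(volume_id)
        # Worst case: every retained copy is a full backup of the volume.
        cost = None if size is None else size * copies
        if cost is not None and cost <= budget:
            budget -= cost
            accepted.append(volume_id)
        else:
            # Unknown volume, or not enough quota left for it.
            rejected.append(volume_id)
    return accepted, rejected


if __name__ == "__main__":
    example_config = {
        "max_backup_gigabytes": 30,
        "max_copies": 2,
        "volumes": [
            "468cf670-3f23-483b-9309-2f98d289c5dc",
            "3c82177d-4272-4d63-bef0-edfa3f4a38a5",
        ],
    }
    sizes = {
        "468cf670-3f23-483b-9309-2f98d289c5dc": 1,
        "3c82177d-4272-4d63-bef0-edfa3f4a38a5": 20,
    }
    print(plan_backups(example_config, sizes))
```

Incremental backups would normally use far less space than this estimate, so the rule is deliberately pessimistic.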

The daemon can be run on one of the cloudcontrol nodes per deployment. The daemon yaml config file can be managed using puppet.

NOTE: additional logic will be required to support trove volumes, since those are dynamically managed by trove itself. The proposal above mostly works assuming we only want backups for NFS volumes (i.e., volumes created by hand by us).

Event Timeline

I'd like to dig a bit deeper into the snapshots part, specifically how it is reflected on the Ceph side (rbd) and how the snapshots are cleaned up (we had issues in the past where we created too many rbd snapshots and ended up making the cluster underperform considerably).

fair!

This is currently held up because I'm seeing periodic failed backups when trying to do chains of incremental backups. I suspect it's due to failed coordination between the two backend agents.

Change 745917 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Add initial script to manage/automate cinder backups

https://gerrit.wikimedia.org/r/745917

Change 745926 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Add simple script to backup cinder volumes according to yaml config

https://gerrit.wikimedia.org/r/745926

Change 747937 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Cinder: add backup job timer to one cloudcontrol in each cluster

https://gerrit.wikimedia.org/r/747937

Change 745917 merged by Andrew Bogott:

[operations/puppet@production] Add initial script to manage/automate cinder backups

https://gerrit.wikimedia.org/r/745917

I'm about to merge a timer that does daily backups in codfw1dev; we'll see how it holds up over the next few days.

I have NOT done anything about backup invalidation or retention time. I'm actually not 100% sure how to handle that with incremental backups; we can't just throw away the oldest because that would invalidate all the incrementals that depend on it...

I'll try to read up on what good algorithms are for this. Probably something like 'do a new, full backup every week, and discard old backups weekly, a whole week's worth at a time'?
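
The "discard a whole chain at a time" idea can be sketched as below. This is an illustration of the approach floated above, not the purge logic that was eventually merged. It assumes each backup is a dict with the `id`, `created_at` and `is_incremental` fields shown by `openstack volume backup show`, and that all backups belong to one volume; the function names and the `keep_days` parameter are hypothetical.

```python
from datetime import datetime, timedelta


def group_into_chains(backups):
    """Split one volume's backups into chains: a full backup followed by
    the incrementals taken after it, ordered by creation time."""
    chains = []
    for backup in sorted(backups, key=lambda b: b["created_at"]):
        if not backup["is_incremental"] or not chains:
            chains.append([])
        chains[-1].append(backup)
    return chains


def purgeable_backups(backups, now, keep_days):
    """Return ids of backups that can be deleted, in a safe deletion order."""
    cutoff = now - timedelta(days=keep_days)
    doomed = []
    # Never touch the most recent chain: new incrementals depend on it.
    for chain in group_into_chains(backups)[:-1]:
        newest = datetime.fromisoformat(chain[-1]["created_at"])
        if newest < cutoff:
            # Newest first, so nothing is deleted while an incremental
            # still depends on it.
            doomed.extend(b["id"] for b in reversed(chain))
    return doomed
```

A chain is only purged once its newest member has aged out, so every backup younger than `keep_days` stays restorable.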

Change 745926 merged by Andrew Bogott:

[operations/puppet@production] Add simple script to backup cinder volumes according to yaml config

https://gerrit.wikimedia.org/r/745926

Change 747937 merged by Andrew Bogott:

[operations/puppet@production] Cinder: add backup job timer to one cloudcontrol in each cluster

https://gerrit.wikimedia.org/r/747937

Change 748140 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cinder backups: dump backup config to yaml

https://gerrit.wikimedia.org/r/748140

Change 748140 merged by Andrew Bogott:

[operations/puppet@production] cinder backups: dump backup config to yaml

https://gerrit.wikimedia.org/r/748140

Change 748149 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] wmcs-cinder-volume-backup: get openstack creds from novaadmin.yaml

https://gerrit.wikimedia.org/r/748149

Change 748149 merged by Andrew Bogott:

[operations/puppet@production] wmcs-cinder-volume-backup: get openstack creds from novaadmin.yaml

https://gerrit.wikimedia.org/r/748149

Change 748158 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] wmcs-cinder-volume-backup: support user-requested full backups

https://gerrit.wikimedia.org/r/748158

Change 748158 merged by Andrew Bogott:

[operations/puppet@production] wmcs-cinder-volume-backup: support user-requested full backups

https://gerrit.wikimedia.org/r/748158

Change 748206 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] wmcs-cinder-volume-backup: Add --purge-older-than option

https://gerrit.wikimedia.org/r/748206

Change 748206 merged by Andrew Bogott:

[operations/puppet@production] wmcs-cinder-volume-backup: Add --purge-older-than option

https://gerrit.wikimedia.org/r/748206

Change 748211 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] wmcs-cinder-backup-manager: Support periodic full backups and purges

https://gerrit.wikimedia.org/r/748211

Change 748211 merged by Andrew Bogott:

[operations/puppet@production] wmcs-cinder-backup-manager: Support periodic full backups and purges

https://gerrit.wikimedia.org/r/748211

Change 748768 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] wmcs-cinder-backup-manager.py: get openstack creds from novaadmin.yaml

https://gerrit.wikimedia.org/r/748768

Change 748768 merged by Andrew Bogott:

[operations/puppet@production] wmcs-cinder-backup-manager.py: get openstack creds from novaadmin.yaml

https://gerrit.wikimedia.org/r/748768

Change 763953 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] wmcs-cinder-backup-manager.py: offset maps full upgrade

https://gerrit.wikimedia.org/r/763953

Change 763953 merged by Andrew Bogott:

[operations/puppet@production] wmcs-cinder-backup-manager.py: offset maps full upgrade

https://gerrit.wikimedia.org/r/763953

Change 829253 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cinder: set backup_use_same_host=True

https://gerrit.wikimedia.org/r/829253

Change 829256 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Switch on a second cinder backup host for codfw1dev

https://gerrit.wikimedia.org/r/829256

Change 829253 merged by Andrew Bogott:

[operations/puppet@production] cinder: set backup_use_same_host=True

https://gerrit.wikimedia.org/r/829253

Change 829256 merged by Andrew Bogott:

[operations/puppet@production] Switch on a second cinder backup host for codfw1dev

https://gerrit.wikimedia.org/r/829256