We plan to store cloud NFS data on cinder volumes. For that we need to figure out how to back up the volumes out of ceph.
Event Timeline
Change 730769 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] openstack: cinder backups: use per-deployment rabbit pass
Change 730769 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] openstack: cinder backups: use per-deployment rabbit pass
Change 730771 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] hieradata: cloudbackup2002: fix typo in LVM volue group name
Change 730771 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hieradata: cloudbackup2002: fix typo in LVM volue group name
Change 730776 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] openstack: cinder backups: create directory for mount
Change 730776 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] openstack: cinder backups: create directory for mount
Mentioned in SAL (#wikimedia-cloud) [2021-10-14T12:28:37Z] <arturo> [codfw1dev] add DB grants for cloudbackup2002.codfw.wmnet IP address to the cinder DB (T292546)
Change 730779 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] openstack: cinder: allow backup API actions
Change 730779 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] openstack: cinder: allow backup API actions
Change 730782 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] openstack: galera: allow DB access to cinder-backup nodes
Change 730782 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] openstack: galera: allow DB access to cinder-backup nodes
Change 730784 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] openstack: cinder.conf: specify lock path
Change 730784 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] openstack: cinder.conf: specify lock path
Change 730829 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] openstack: cinder backups: introduce ceph client config
Change 730829 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] openstack: cinder backups: introduce ceph client config
Change 731370 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] hieradata: openstack: cinder-backups: fix ceph keyring file name
Change 731370 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hieradata: openstack: cinder-backups: fix ceph keyring file name
Change 731375 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] hieradata: openstack: cinder-backups: fix permissions of ceph keyring file
Change 731375 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hieradata: openstack: cinder-backups: fix permissions of ceph keyring file
Current status:
A bit of a hiera mess in puppet prevents the cinder-backup service (running on cloudbackup2002.codfw.wmnet) from getting the right ceph credentials, as can be seen in /var/log/cinder/cinder-backup.log when triggering a backup action on cloudcontrol2001-dev.wikimedia.org.
So I had a hunch today. We haven't yet fully tested that cinder-backup can actually fetch information from the ceph cluster (because we found T293752: cloud ceph: refactor rbd client puppet profiles and are blocked on it).
I decided to work around this today to verify whether cinder-backup works with ceph as intended.
Surprise: it doesn't.
It shows this log line:
2021-10-21 12:32:36.677 22312 DEBUG os_brick.initiator.linuxrbd [req-46594b0e-9032-4497-836e-016d97a44a40 novaadmin admin - - -] opening connection to ceph cluster (timeout=-1). connect /usr/lib/python3/dist-packages/os_brick/initiator/linuxrbd.py:70
There is some traffic going on between cloudbackup2002 and the mons:
aborrero@cloudbackup2002:~ $ sudo tcpdump -i any tcp port 3300 or tcp port 6789
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
12:28:09.980792 IP cloudbackup2002.codfw.wmnet.34228 > cloudcephmon2004-dev.codfw.wmnet.6789: Flags [S], seq 994059222, win 42340, options [mss 1460,sackOK,TS val 2245731376 ecr 0,nop,wscale 9], length 0
12:28:09.980852 IP cloudbackup2002.codfw.wmnet.37236 > cloudcephmon2003-dev.codfw.wmnet.6789: Flags [S], seq 768127409, win 42340, options [mss 1460,sackOK,TS val 2666147774 ecr 0,nop,wscale 9], length 0
12:28:09.980863 IP cloudbackup2002.codfw.wmnet.50994 > cloudcephmon2002-dev.codfw.wmnet.6789: Flags [S], seq 4087198713, win 42340, options [mss 1460,sackOK,TS val 3124464738 ecr 0,nop,wscale 9], length 0
12:28:09.981021 IP cloudcephmon2004-dev.codfw.wmnet.6789 > cloudbackup2002.codfw.wmnet.34228: Flags [S.], seq 1706716108, ack 994059223, win 43440, options [mss 1460,sackOK,TS val 958699844 ecr 2245731376,nop,wscale 9], length 0
[..]
I checked the logs on the mons; there is no specific information about what's going on:
aborrero@cloudcephmon2002-dev:~ $ sudo tail /var/log/ceph/ceph.audit.log
2021-10-21T12:53:03.834986+0000 mon.cloudcephmon2003-dev (mon.1) 512964 : audit [DBG] from='client.? 208.80.153.75:0/3066437256' entity='client.codfw1dev-cinder' cmd=[{,",p,r,e,f,i,x,",:,",d,f,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
2021-10-21T12:53:03.836307+0000 mon.cloudcephmon2003-dev (mon.1) 512965 : audit [DBG] from='client.? 208.80.153.75:0/3066437256' entity='client.codfw1dev-cinder' cmd=[{,",p,r,e,f,i,x,",:,",o,s,d, ,p,o,o,l, ,g,e,t,-,q,u,o,t,a,",,, ,",p,o,o,l,",:, ,",c,o,d,f,w,1,d,e,v,-,c,i,n,d,e,r,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
2021-10-21T12:53:04.610579+0000 mon.cloudcephmon2004-dev (mon.2) 1222350 : audit [INF] from='mgr.23690785 10.192.20.7:0/763' entity='mgr.cloudcephmon2002-dev' cmd=[{"prefix":"config rm","who":"mgr","name":"mgr/rbd_support/cloudcephmon2002-dev/trash_purge_schedule"}]: dispatch
2021-10-21T12:53:04.611581+0000 mon.cloudcephmon2002-dev (mon.0) 708620 : audit [INF] from='mgr.23690785 ' entity='mgr.cloudcephmon2002-dev' cmd=[{"prefix":"config rm","who":"mgr","name":"mgr/rbd_support/cloudcephmon2002-dev/trash_purge_schedule"}]: dispatch
However, I do see this weird line, which looks like a bad serialization somewhere:
cmd=[{,",p,r,e,f,i,x,",:,",o,s,d, ,p,o,o,l, ,g,e,t,-,q,u,o,t,a,",,, ,",p,o,o,l,",:, ,",c,o,d,f,w,1,d,e,v,-,c,i,n,d,e,r,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]
Additional action items:
- set a sensible timeout for the rbd client connection; not sure where that is configured though (apparently not /etc/cinder/cinder.conf)
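For what it's worth, cinder's RBD volume driver documents a rados_connect_timeout option (default -1, i.e. no timeout, which matches the timeout=-1 in the os_brick log above). Whether the backup code path honors the same option is an assumption that needs verifying; a minimal sketch:

```ini
# cinder.conf sketch -- assumes rados_connect_timeout is also honored
# by the backup code path, which needs verification
[DEFAULT]
# seconds to wait when opening a connection to the ceph cluster;
# -1 (the default) means wait forever
rados_connect_timeout = 15
```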
> I see however this weird line cmd=[{,",p,r,e,f,i,x,",:,",o,s,d, ,p,o,o,l, ,g,e,t,-,q,u,o,t,a,",,, ,",p,o,o,l,",:, ,",c,o,d,f,w,1,d,e,v,-,c,i,n,d,e,r,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: which seems like a bad serialization somewhere?

This looks to me like a config option that was expected to be an array of strings but was set as a plain string.
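Those comma-separated characters are exactly what you get when a string lands where a list of strings was expected and each character gets treated as a separate element. A minimal Python sketch reproducing the pattern (the JSON payload is copied from the audit log; the join is illustrative, not ceph's actual code):

```python
# The command the client should have sent as a single list element:
cmd = '{"prefix":"df", "format":"json"}'

# Correct shape: a list containing one string
print([cmd])

# Buggy shape: the string itself iterated element-by-element,
# i.e. character-by-character, which is what the mon audit log shows
print("cmd=[" + ",".join(cmd) + "]")
# -> cmd=[{,",p,r,e,f,i,x,",:,",d,f,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]
```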
Change 734690 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] ceph::osd: add cinder backup hosts to ferm
Change 734690 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] ceph::osd: add cinder backup hosts to ferm
> This seems to me like a config option that was expected to be an array of strings, having set as a string xd

For the record, it was a missing firewall rule on the OSD side (really confusing error messages from the ceph CLI :S).
Update:
- we were able to fix a hiera issue that was preventing us from testing the right ceph keydata for cinder-backups ahead of the ceph refactor https://gerrit.wikimedia.org/r/c/operations/puppet/+/734937 thanks @jbond for the assistance
- with that change in place, the cinder-backup service now works fine. I was able to back up several volumes, and restore them
- I think now we've finally validated the basic functionality of the cinder-backups API and can safely proceed with the next steps, which are:
- the ceph refactor T293752: cloud ceph: refactor rbd client puppet profiles
- enable cinder-backup in the eqiad1 deployment
- figure out how to instrument / automate the backup logic
Extra notes:
- It was nice to discover that the chunking algorithm cinder-backup uses takes empty blocks into account, which means storage on the backup side is managed more efficiently (i.e., backing up a 20G volume of empty data takes very little storage space, not 20G)
- the cinder-backup service is also designed for horizontal scalability: we could just add more cloudbackup servers and cinder-backup will know how to split load/storage across them.
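The empty-block behaviour in the first note can be sketched like this; this is just an illustration of the idea, not cinder's actual chunking code (the chunk size and storage layout are made up):

```python
CHUNK_SIZE = 4  # illustrative only; the real chunk size is configurable and much larger

def backup_chunks(volume: bytes) -> dict:
    """Store only non-empty (non-zero) chunks, keyed by their offset."""
    stored = {}
    for offset in range(0, len(volume), CHUNK_SIZE):
        chunk = volume[offset:offset + CHUNK_SIZE]
        if chunk.strip(b"\x00"):  # all-zero chunks are skipped entirely
            stored[offset] = chunk
    return stored

# A "volume" that is mostly empty: only one chunk gets stored,
# so the backup takes far less space than the nominal volume size
volume = bytes(12) + b"data" + bytes(8)
print(backup_chunks(volume))  # -> {12: b'data'}
```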
Example session with the new CLI:
root@cloudcontrol2001-dev:~# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+--------------------------------------------------------------+
| ID                                   | Name                                       | Status    | Size | Attached to                                                  |
+--------------------------------------+--------------------------------------------+-----------+------+--------------------------------------------------------------+
| 74bf4553-c92e-4fd5-88ef-33fb789ab07a | tlsvol                                     | available |    3 |                                                              |
| bcde703e-1ad9-40c5-badf-5f5eeae18508 | trove-58ec6fd7-0822-440b-beb5-2581e0edf98f | in-use    |    2 | Attached to 3e2b42b3-7b92-4805-9be2-00b2ab5d349b on /dev/vdb |
| 468cf670-3f23-483b-9309-2f98d289c5dc | bleh                                       | available |    1 |                                                              |
| 4a4f04b1-7c27-4d30-9446-479390b29526 | ussurivol                                  | available |    3 |                                                              |
| 3c82177d-4272-4d63-bef0-edfa3f4a38a5 |                                            | available |   20 |                                                              |
| fbecb639-216c-4d92-a91f-ace4b87e2b0b | testvolume                                 | available |    8 |                                                              |
+--------------------------------------+--------------------------------------------+-----------+------+--------------------------------------------------------------+
root@cloudcontrol2001-dev:~# openstack volume backup create 468cf670-3f23-483b-9309-2f98d289c5dc --name "test backup"
+-------+--------------------------------------+
| Field | Value                                |
+-------+--------------------------------------+
| id    | dd42d300-e1c4-442c-9c6f-fae352e6df9c |
| name  | test backup                          |
+-------+--------------------------------------+
root@cloudcontrol2001-dev:~# openstack volume backup list
+--------------------------------------+-------------+-------------+----------+------+
| ID                                   | Name        | Description | Status   | Size |
+--------------------------------------+-------------+-------------+----------+------+
| dd42d300-e1c4-442c-9c6f-fae352e6df9c | test backup | None        | creating |    1 |
+--------------------------------------+-------------+-------------+----------+------+
root@cloudcontrol2001-dev:~# openstack volume backup list
+--------------------------------------+-------------+-------------+-----------+------+
| ID                                   | Name        | Description | Status    | Size |
+--------------------------------------+-------------+-------------+-----------+------+
| dd42d300-e1c4-442c-9c6f-fae352e6df9c | test backup | None        | available |    1 |
+--------------------------------------+-------------+-------------+-----------+------+
root@cloudcontrol2001-dev:~# openstack volume backup show dd42d300-e1c4-442c-9c6f-fae352e6df9c
+-----------------------+--------------------------------------------+
| Field                 | Value                                      |
+-----------------------+--------------------------------------------+
| availability_zone     | None                                       |
| container             | dd/42/dd42d300-e1c4-442c-9c6f-fae352e6df9c |
| created_at            | 2021-10-27T10:57:59.000000                 |
| data_timestamp        | 2021-10-27T10:57:59.000000                 |
| description           | None                                       |
| fail_reason           | None                                       |
| has_dependent_backups | False                                      |
| id                    | dd42d300-e1c4-442c-9c6f-fae352e6df9c       |
| is_incremental        | False                                      |
| name                  | test backup                                |
| object_count          | 1                                          |
| size                  | 1                                          |
| snapshot_id           | None                                       |
| status                | available                                  |
| updated_at            | 2021-10-27T10:58:21.000000                 |
| volume_id             | 468cf670-3f23-483b-9309-2f98d289c5dc       |
+-----------------------+--------------------------------------------+
Change 740551 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] cloud: cinder-backups: use main ceph cinder keyring
Change 740554 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] cinder: fix config template and don't reuse 'ceph_pool' that much
Change 740562 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[labs/private@master] ceph: codfw: refresh entry name for codfw1dev-cinder-backups
Change 740562 merged by Arturo Borrero Gonzalez:
[labs/private@master] ceph: codfw: refresh entry name for codfw1dev-cinder-backups
Change 740564 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[labs/private@master] codfw1dev: backups: refresh entry for ceph keyring
Change 740564 merged by Arturo Borrero Gonzalez:
[labs/private@master] codfw1dev: backups: refresh entry for ceph keyring
Change 740554 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cinder: fix config template and don't reuse 'ceph_pool' that much
Change 740579 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] openstack: codfw1dev: deploy general cinder keyring in cinder-backups nodes
Change 740579 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] openstack: codfw1dev: deploy general cinder keyring in cinder-backups nodes
Change 740827 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] cloud: codfw1dev: fix keyring owner/group for cinder-backups
Change 740829 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[labs/private@master] hiera: cloud: refresh keyname for codfw1dev cinder backups
Change 740829 merged by Arturo Borrero Gonzalez:
[labs/private@master] hiera: cloud: refresh keyname for codfw1dev cinder backups
Change 740827 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud: codfw1dev: fix keyring owner/group for cinder-backups
Change 742273 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] cinder.conf: Tune settings for the backup agent.
Change 742273 merged by Andrew Bogott:
[operations/puppet@production] cinder.conf: Tune settings for the backup agent.
Change 740551 abandoned by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud: cinder-backups: use main ceph cinder keyring
Reason:
a different patch was merged, see ed5658a51946148376ec19a6474d5e972bb34167
Logged upstream bug:
https://bugs.launchpad.net/cinder/+bug/1952804
I'm not sure if this is a deal-breaker or not; even with that bug fixed there will still be a race which causes a stuck job if a backup backend goes down unexpectedly.
Here is the other serious upstream bug I've been seeing:
https://bugs.launchpad.net/cinder/+bug/1952805
That means that we can use incremental backups or multiple backend nodes, but not both.
Thanks for identifying the problem and reporting it upstream.
My thought: I introduced puppet support for multiple cinder-backup nodes because that's the way to use all the storage we currently dedicate to backups (remember: 2 cloudbackup servers in codfw with 200TB of storage each).
This is to say: I don't see any problem with using just 1 cinder-backup node until this upstream bug is fixed. The bug shouldn't be a blocker.
All of our short term backup storage requirements for the NFS migration can be covered with a single 200TB cinder-backup node. Example: cloudbackup2002, using 10TB out of 214TB (204TB free).
Again, thanks for identifying the issue and reporting it upstream!
What you describe in the upstream ticket seems like exactly what I've been experiencing: I've seen backups fail right after the cinder-backup agent started (after a config change or whatever). So perhaps it's just a matter of not being anxious, and of not scheduling backups until the cinder-backup service has been up for, let's say, 5 minutes.
What concerns me more is that cinder seems to leave the backup in the 'creating' state forever. Ideally it would declare it a 'failed backup'. Did you ever see cinder declaring one 'failed'?
> What concerns me more is that cinder seems to leave the backup in 'creating' state forever. Ideally it would declare it 'failed backup' Did you ever see cinder declaring it 'failed'?
Yes, usually! When the unavailable service comes back up and gets oriented, the stuck backup usually changes to an error state. But not always :/
I think we should switch all of our testing to a single-backend model and see if it mostly stops breaking.
hey @Andrew, I just noticed that I haven't yet looked into the root cause of the problems I commented on here: https://phabricator.wikimedia.org/T292546#7447927
- some weird rbd command serialization problem
- connectivity issues between cinder-backup and ceph
I suspect there could be problems related to different ceph client library versions, or rbd protocol v1 vs protocol v2, stuff like that. But honestly, I haven't had the chance to dig deeper, in case you want something to investigate.
I looked at this more today. A lot of the suddenly-failing jobs seem to be poorly-surfaced OOM issues (I'm testing on cloudbackup1001-dev, which only has 4GB of RAM). When I change the buffer size to be much smaller I get many fewer failures, although, unfortunately, I'm still seeing occasional jobs stuck in 'creating' forever.
backup_file_size = 3276800
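Sizing that buffer relative to available RAM can be sketched as follows. This is a hypothetical illustration, not the logic from the actual puppet change: the worker count and safety factor are made-up numbers, and the rounding relies on my understanding that backup_file_size should be a multiple of backup_sha_block_size (32768 by default), which is worth double-checking against the cinder docs.

```python
def pick_backup_file_size(available_ram_bytes: int,
                          workers: int = 5,
                          safety_factor: float = 0.1) -> int:
    """Keep the aggregate of in-flight backup buffers well under available RAM."""
    raw = int(available_ram_bytes * safety_factor / workers)
    # round down to a multiple of 32768 (backup_sha_block_size's default)
    return max(32768, raw - raw % 32768)

# e.g. on a 4 GiB host like cloudbackup1001-dev
print(pick_backup_file_size(4 * 1024**3))  # -> 85884928 (~82 MiB)
```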
Change 744821 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] cinder-backup: generate backup_file_size relative to available RAM
Change 744821 merged by Andrew Bogott:
[operations/puppet@production] cinder-backup: generate backup_file_size relative to available RAM
Change 745765 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] ceph: auth: drop cinder-backup keyrings
Change 745765 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] ceph: auth: drop cinder-backup keyrings
Change 755057 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Add cinder-backup role/profile for eqiad1, use on cloudbackup2002
Change 755057 merged by Andrew Bogott:
[operations/puppet@production] Add cinder-backup role/profile for eqiad1, use on cloudbackup2002
Change 755753 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Define profile::openstack::eqiad1::cinder::backup::nodes
Change 755753 merged by Andrew Bogott:
[operations/puppet@production] Define profile::openstack::eqiad1::cinder::backup::nodes
Change 755759 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Provide cinder backup node list to rabbitmq in eqiad1
Change 755759 merged by Andrew Bogott:
[operations/puppet@production] Provide cinder backup node list to rabbitmq in eqiad1
Change 755788 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] ceph: list cloudbackup2002 as a cinder backup node
Change 755788 merged by Andrew Bogott:
[operations/puppet@production] ceph: list cloudbackup2002 as a cinder backup node
Change 763310 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] wmcs-cinder-backup-manager.py: increase total backup timeout
Change 763310 merged by Andrew Bogott:
[operations/puppet@production] wmcs-cinder-backup-manager.py: increase total backup timeout
This is mostly working now -- all modest-sized volumes are getting backed up fine.
I have one outlier: the 8TB 'maps' volume in the 'maps' project never seems to complete. I've increased the timeout to 18 hours with no success.
root@cloudbackup2002:/usr/lib/python3/dist-packages/cinder/backup# iperf -c cloudcephosd1024.eqiad.wmnet -p 7100
------------------------------------------------------------
Client connecting to cloudcephosd1024.eqiad.wmnet, TCP port 7100
TCP window size: 325 KByte (default)
------------------------------------------------------------
[  3] local 10.192.32.186 port 40620 connected with 10.64.20.20 port 7100
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.38 GBytes  2.04 Gbits/sec
That should be barely enough bandwidth: at 2 Gbit/s we should be able to transfer an 8TB file in 32,000 seconds, or around 9 hours. That sounds excessive, but if we're only doing occasional full backups it's all somewhat workable if we can optimize a bit more. On the other hand, I note that the rate is awfully close to a round number, which has me wondering whether there's a throttle someplace we could adjust.
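The back-of-the-envelope math above, as a quick sanity check (treating the 8TB volume as 8 * 10^12 bytes and using the measured iperf rate):

```python
volume_bytes = 8 * 10**12         # the 8TB 'maps' volume
rate_bits_per_sec = 2.04 * 10**9  # measured iperf throughput

seconds = volume_bytes * 8 / rate_bits_per_sec
print(f"{seconds:.0f} s, i.e. about {seconds / 3600:.1f} hours")
# -> 31373 s, i.e. about 8.7 hours
```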
The tools NFS mount is also about 8TB, so if we get maps working we should be able to get tools working too, as long as they don't both run their full backups on the same day.
This is actually working now. The maps volume is handled as an edge case (incremental backups don't really function for volumes that large) but we're getting periodic backups at least.
There's ongoing upstream work to tidy up this feature but our deployment is in OK shape now.