Openstack Glance: add ceph backend
Closed, Resolved (Public)

Description

It's not clear that there's a straightforward path for moving existing base images from the filesystem backend to ceph, but we should at least start using ceph for new images.

It may be possible to migrate existing images via direct DB hacking; I'll experiment with that once we have Ceph set up as an alternate backend.

Event Timeline

Change 628858 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[labs/private@master] Added snakeoil keydata for ceph glance client

https://gerrit.wikimedia.org/r/628858

Change 628861 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudcontrol eqiad1: add ceph access for Glance

https://gerrit.wikimedia.org/r/628861

Change 628858 merged by Andrew Bogott:
[labs/private@master] Added snakeoil keydata for ceph glance client

https://gerrit.wikimedia.org/r/628858

Change 628865 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[labs/private@master] Renamed profile::ceph::client::rbd::glance_client_keydata

https://gerrit.wikimedia.org/r/628865

Change 628865 merged by Andrew Bogott:
[labs/private@master] Renamed profile::ceph::client::rbd::glance_client_keydata

https://gerrit.wikimedia.org/r/628865

Change 628869 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[labs/private@master] Further attempts to get this key in the right place

https://gerrit.wikimedia.org/r/628869

Change 628869 merged by Andrew Bogott:
[labs/private@master] Further attempts to get this key in the right place

https://gerrit.wikimedia.org/r/628869

Change 628872 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] OpenStack glance: set the default backend to rbd

https://gerrit.wikimedia.org/r/628872

Change 628861 merged by Andrew Bogott:
[operations/puppet@production] cloudcontrol eqiad1: add ceph client for Glance

https://gerrit.wikimedia.org/r/628861

Change 628872 merged by Andrew Bogott:
[operations/puppet@production] OpenStack glance: set the default backend to rbd

https://gerrit.wikimedia.org/r/628872

Change 628946 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] ceph: add firewall rules for cloudcontroller nodes

https://gerrit.wikimedia.org/r/628946

Change 628946 merged by Andrew Bogott:
[operations/puppet@production] ceph: add firewall rules for cloudcontroller nodes

https://gerrit.wikimedia.org/r/628946

Change 628953 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] OpenStack Glance: fixes to glance-api.conf

https://gerrit.wikimedia.org/r/628953

Change 628953 merged by Andrew Bogott:
[operations/puppet@production] OpenStack Glance: fixes to glance-api.conf

https://gerrit.wikimedia.org/r/628953

New images uploaded to Glance will now be stored using Ceph. Our current version of Glance doesn't handle sparse files very well so the images uploaded wind up being pretty big; the VMs based on those images are properly sparse though.
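The size difference described above can be checked on any local copy of an image by comparing its apparent size with the space actually allocated on disk. The following is a minimal sketch, not part of the original workflow; the helper names are hypothetical, and it assumes a Unix filesystem where `st_blocks` is reported in 512-byte units.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent size, allocated size) of a file, both in bytes."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Unix, regardless of filesystem block size.
    return st.st_size, st.st_blocks * 512


def is_sparse(path):
    """Heuristic: a file is sparse if it occupies less disk space than its length."""
    apparent, allocated = apparent_and_allocated(path)
    return allocated < apparent


if __name__ == "__main__":
    # Demo on a throwaway file: a 1 MiB hole with no data written to it.
    with tempfile.NamedTemporaryFile() as demo:
        demo.truncate(1 << 20)
        demo.flush()
        apparent, allocated = apparent_and_allocated(demo.name)
        print(f"apparent={apparent} allocated={allocated} sparse={is_sparse(demo.name)}")
```

Running the same check against an image file before and after a sparsifying copy should show the allocated size shrinking while the apparent size stays the same.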

I'm pretty sure we should leave this here and phase out the locally-stored images over time. I have a system in mind for converting them to rbd (which I will document below) but it seems quite risky to do this, mostly because our existing images are qcow2 and all the glance/ceph docs say to only use raw images. I'm concerned that if we convert existing images from qcow2 to raw we'll break copy-on-write in some interesting way, and if we don't we'll break glance in an interesting way.

Should we want to convert existing images to rbd, the steps are:

  1. Download the existing image from glance with 'openstack image save <id> > <id>.qcow2'
  2. Convert the downloaded file to raw with 'qemu-img convert -f qcow2 -O raw ./<id>.qcow2 ./<id>.raw.notsparse'
  3. Convert to sparse with 'cp --sparse=always ./<id>.raw.notsparse ./<id>.raw'
  4. Upload the image to rbd with 'rbd import --name client.eqiad1-glance-images --pool eqiad1-glance-images ./<id>.raw <id>'
  5. Take a snapshot: 'rbd snap create eqiad1-glance-images/<id>@snap'
  6. Modify the glance database to change the location:
       [glance]> update image_locations set value="rbd://5917e6d9-06a0-4928-827a-f489384975b1/eqiad1-glance-images/<id>/snap" where image_id='<id>';
       [glance]> update image_locations set meta_data='{"backend": "rbd"}' where image_id='<id>';

Notes:

  • I don't know exactly what the 5917e6d9-06a0-4928-827a-f489384975b1 component is; I assume it identifies a particular rbd backend (in Glance's rbd location format, rbd://<fsid>/<pool>/<image>/<snapshot>, that position holds the Ceph cluster fsid).
  • Note that @snap doesn't work in the location URL; it must be /snap. This hint comes from https://ceph.io/planet/importing-an-existing-ceph-rbd-image-into-glance/
  • I have no idea whether glance will properly clean up these rbd files when the image is deleted; probably not.
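The steps above can be sketched as a dry-run plan generator. This is an illustrative sketch only, not the wmcs-image-migrate script mentioned later. The function names are hypothetical; the pool, client name, and cluster ID are copied from the steps above. It assumes the sparse raw file from step 3 is the file passed to 'rbd import', with the image ID as the rbd image name, and it only prints commands and SQL rather than running them.

```python
import shlex

# Values copied from the steps above; adjust for other deployments.
POOL = "eqiad1-glance-images"
CLIENT = "client.eqiad1-glance-images"
CLUSTER_ID = "5917e6d9-06a0-4928-827a-f489384975b1"


def rbd_location(image_id, cluster_id=CLUSTER_ID, pool=POOL, snap="snap"):
    """Build the Glance location URL; the snapshot is separated by '/', not '@'."""
    return f"rbd://{cluster_id}/{pool}/{image_id}/{snap}"


def migration_commands(image_id, pool=POOL, client=CLIENT):
    """Return steps 1-5 as argument lists suitable for subprocess.run()."""
    qcow = f"./{image_id}.qcow2"
    dense = f"./{image_id}.raw.notsparse"
    raw = f"./{image_id}.raw"
    return [
        ["openstack", "image", "save", "--file", qcow, image_id],
        ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", qcow, dense],
        ["cp", "--sparse=always", dense, raw],
        ["rbd", "import", "--name", client, "--pool", pool, raw, image_id],
        ["rbd", "snap", "create", f"{pool}/{image_id}@snap"],
    ]


def location_updates(image_id):
    """Return step 6 as (statement, parameters) pairs for a DB-API cursor."""
    return [
        ("UPDATE image_locations SET value = %s WHERE image_id = %s",
         (rbd_location(image_id), image_id)),
        ("UPDATE image_locations SET meta_data = %s WHERE image_id = %s",
         ('{"backend": "rbd"}', image_id)),
    ]


if __name__ == "__main__":
    example_id = "IMAGE-ID"  # placeholder, not a real image
    for cmd in migration_commands(example_id):
        print(shlex.join(cmd))
    for statement, params in location_updates(example_id):
        print(statement, params)
```

Keeping the SQL parameterized avoids quoting mistakes in the location URL, and printing the plan first makes it easy to review before touching the glance database.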

I now have a script to move glance images from local storage to Ceph which is in

cloudcontrol1003.wikimedia.org:~andrew/wmcs-image-migrate

I'm still nervous about this breaking copy-on-write. For VMs already on ceph this is largely moot since the c-o-w base comes from only one place. In the case of cold-migrating between cephless hypervisors, c-o-w has always been a bit dicey since our cold-migrate script doesn't ensure that the base image is present on the target.

I was able to break at least one VM during cold-migration (by moving to a hypervisor that had never seen the image before). This is probably unrelated to the image on ceph but since completing the VM migrations to ceph will limit the different scenarios it seems best to just wait on this.

This seems to break cold-migration of VMs between hypervisors, so it's best to wait and reinvestigate this option after everything has moved to Ceph.

Change 634595 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Glance: make rbd the default store in eqiad1

https://gerrit.wikimedia.org/r/634595

Change 634595 merged by Andrew Bogott:
[operations/puppet@production] Glance: make rbd the default store in eqiad1

https://gerrit.wikimedia.org/r/634595

Change 634754 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs admin scripts: added wmcs-imageusage

https://gerrit.wikimedia.org/r/634754

Change 634754 merged by Andrew Bogott:
[operations/puppet@production] wmcs admin scripts: added wmcs-imageusage

https://gerrit.wikimedia.org/r/634754

Change 634837 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] glance-api: replace 'default_store' with the more flexible 'glance_backends'

https://gerrit.wikimedia.org/r/634837

Change 634837 merged by Andrew Bogott:
[operations/puppet@production] glance-api: replace 'default_store' with the more flexible 'glance_backends'

https://gerrit.wikimedia.org/r/634837

Change 634839 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] glance: disable the file backend for glance in eqiad1

https://gerrit.wikimedia.org/r/634839

Change 634839 merged by Andrew Bogott:
[operations/puppet@production] glance: disable the file backend for glance in eqiad1

https://gerrit.wikimedia.org/r/634839

Change 634840 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] glance-api: make active/active in eqiad1

https://gerrit.wikimedia.org/r/634840

Glance on eqiad1 is now running exclusively with Ceph. All used images (and a few unused ones) have been migrated to rbd.

The remaining task here is to add some kind of off-Ceph backup for base images (T265843). We can't delete all the filesystem/imagesync code though because codfw1dev still uses local files for images.

Change 634840 merged by Andrew Bogott:
[operations/puppet@production] glance-api: make active/active in eqiad1

https://gerrit.wikimedia.org/r/634840

Change 645746 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Glance: add a hiera setting for the glance ceph pool

https://gerrit.wikimedia.org/r/645746

Change 645746 merged by Andrew Bogott:
[operations/puppet@production] Glance: add a hiera setting for the glance ceph pool

https://gerrit.wikimedia.org/r/645746