
Ceph Proof of Concept Build and Testing
Closed, Resolved · Public · 0 Estimated Story Points

Description

This task is to start and track work on building out the Ceph proof of concept once the hardware arrives.

Deliverables will include:

  • Determine feasibility of Rook installation method
  • Try straight puppet installation method if Rook not feasible
  • A cloudvirt block storage connection and stress test (a minimal RBD connection sketch follows this list)

  • CephFS connection and stress test from VMs (not using CephFS in phase one, only RBD images)
  • RGW test for docker images (needs some parameters around it; not using docker images)

  • A capacity plan with at least some rough watermarks
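
For the block storage deliverable, a minimal connectivity check along the following lines could be run from a test host once the python3-rados and python3-rbd bindings are installed. This is only a sketch: the pool name cloudvirt-poc and image name poc-smoke are placeholders, and the real stress test would layer fio or similar on top of a mapped device.

```python
#!/usr/bin/env python3
"""Minimal RBD create/write/read smoke test (sketch, placeholder names)."""
import rados
import rbd

POOL = "cloudvirt-poc"    # placeholder pool name
IMAGE = "poc-smoke"       # placeholder image name
SIZE = 1 * 1024 ** 3      # 1 GiB test image

# Connect with the standard client config and keyring on the host.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    try:
        rbd.RBD().create(ioctx, IMAGE, SIZE)
        image = rbd.Image(ioctx, IMAGE)
        try:
            data = b"x" * 4096
            image.write(data, 0)                       # write one 4 KiB block
            assert image.read(0, len(data)) == data    # read it back
        finally:
            image.close()
        rbd.RBD().remove(ioctx, IMAGE)
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

print("RBD create/write/read/remove OK against pool", POOL)
```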

Event Timeline

Bstorm triaged this task as Medium priority. Jun 7 2019, 6:33 PM
Bstorm created this task.

For puppet installation and config of Ceph, the following packages need to be copied from the Ceph apt repo in order to use Ceph Luminous (the latest supported release available for Debian Stretch), based on local testing with that OS (a quick availability-check sketch follows the package list):
ceph
ceph-base
ceph-base-dbg
ceph-common
ceph-common-dbg
ceph-deploy
ceph-fuse
ceph-fuse-dbg
ceph-mds
ceph-mds-dbg
ceph-mgr
ceph-mgr-dbg
ceph-mon
ceph-mon-dbg
ceph-osd
ceph-osd-dbg
ceph-resource-agents
ceph-test
ceph-test-dbg
libcephfs-dev
libcephfs-java
libcephfs-jni
libcephfs2
libcephfs2-dbg
librados-dev
librados2
librados2-dbg
libradosstriper-dev
libradosstriper1
libradosstriper1-dbg
librbd-dev
librbd1
librbd1-dbg
librgw-dev
librgw2
librgw2-dbg
python-ceph
python-cephfs
python-cephfs-dbg
python-rados
python-rados-dbg
python-rbd
python-rbd-dbg
python-rgw
python-rgw-dbg
python3-ceph-argparse
python3-cephfs
python3-cephfs-dbg
python3-rados
python3-rados-dbg
python3-rbd
python3-rbd-dbg
python3-rgw
python3-rgw-dbg
rados-objclass-dev
radosgw
radosgw-dbg
rbd-fuse
rbd-fuse-dbg
rbd-mirror
rbd-mirror-dbg
rbd-nbd
rbd-nbd-dbg
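
As a quick sanity check that the copied packages actually resolve from the apt repo on a Stretch host, something along these lines could be used. This is a hypothetical helper, not part of the puppet work itself; the PACKAGES list here is truncated and would need to carry the full list above.

```python
#!/usr/bin/env python3
"""Check that the Luminous packages above are resolvable via apt (sketch)."""
import subprocess

# Truncated for brevity; extend with the full list above.
PACKAGES = [
    "ceph", "ceph-base", "ceph-common", "ceph-mds", "ceph-mgr",
    "ceph-mon", "ceph-osd", "librados2", "librbd1", "radosgw",
]

missing = []
for pkg in PACKAGES:
    out = subprocess.run(
        ["apt-cache", "policy", pkg],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout
    # apt-cache prints "Candidate: (none)" when no configured repo provides it.
    if not out.strip() or "Candidate: (none)" in out:
        missing.append(pkg)

if missing:
    print("Not resolvable from the configured repos:")
    for pkg in missing:
        print("  " + pkg)
else:
    print("All listed packages are resolvable.")
```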

Note: I just confirmed locally that CephFS cannot set extended attributes (specifically the immutable attribute) in Luminous.
The feature has a tracker here: http://tracker.ceph.com/issues/10679
As this was last updated 3 years ago, I expect them to implement that as soon as one of us writes the patch basically 😛
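
For anyone who wants to reproduce this, a check along these lines should show the failure; the /mnt/cephfs mount point is a placeholder for wherever the test CephFS filesystem is mounted, and on Luminous the chattr call is expected to fail.

```python
#!/usr/bin/env python3
"""Try to set the immutable flag on a CephFS file (sketch, placeholder mount)."""
import os
import subprocess

CEPHFS_MOUNT = "/mnt/cephfs"   # placeholder: local test CephFS mount
path = os.path.join(CEPHFS_MOUNT, "immutable-check")

with open(path, "w") as f:
    f.write("test\n")

# Expected to fail on Luminous: the immutable attribute is not implemented
# (see http://tracker.ceph.com/issues/10679).
result = subprocess.run(
    ["chattr", "+i", path],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
print("chattr +i exit code:", result.returncode)
if result.stderr:
    print(result.stderr.strip())

# Clean up; drop the flag first in case it ever succeeds.
subprocess.run(["chattr", "-i", path], stderr=subprocess.DEVNULL)
os.remove(path)
```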

Sage and Greg (main core devs) were quite open to the idea, but it doesn't seem to be a huge priority.

All that said, instead of CephFS, RBD can be exported as iSCSI. If that is set up with appropriate multipathing and a cluster filesystem (or with Ganesha NFS so that locking happens in userland), we could build a genuinely HA Linux NFS server with Ceph backing it in exactly the same way as it would be for cloudvirts, with all the same quirks. That would mean NFS could still do immutable bits because, well, it can do that. While there may be useful cases where CephFS makes sense, we may get much more use out of Ceph by simply using it for block devices in nearly every case. That sort of consistent use might make managing a Ceph cluster "easier" as well, even if it complicates NFS a bit (but not really much more than it already is). The topic of shipping Ceph RBDs to iSCSI targets is not a small one, so I'm putting that aside for now (but it may be worth testing locally while kicking this around).

Change 556308 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Add some comments about cloudvirts that are set aside for ceph testing.

https://gerrit.wikimedia.org/r/556308

@JHedden, for hypervisor testing you can use cloudvirt1022 and/or cloudvirt2003-dev. I've added some comments to the pool hiera indicating that use.

Change 556308 merged by Andrew Bogott:
[operations/puppet@production] Add some comments about cloudvirts that are set aside for ceph testing.

https://gerrit.wikimedia.org/r/556308

Mentioned in SAL (#wikimedia-cloud) [2020-01-03T19:36:45Z] <jeh> create private flavor m1.small-ceph for testing IO limits T225320

Mentioned in SAL (#wikimedia-cloud) [2020-01-13T14:47:54Z] <jeh> finished testing ceph on cloudvirt1022, revert to local disk puppet role and recreate canary instance T225320

Change 564048 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] openstack: add cloudvirt1022 back into scheduler pool

https://gerrit.wikimedia.org/r/564048

Change 564048 merged by Jhedden:
[operations/puppet@production] openstack: add cloudvirt1022 back into scheduler pool

https://gerrit.wikimedia.org/r/564048

Change 570947 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: configure jumbo frames on OSD interfaces

https://gerrit.wikimedia.org/r/570947

Change 570947 merged by Jhedden:
[operations/puppet@production] ceph: configure jumbo frames on OSD interfaces

https://gerrit.wikimedia.org/r/570947

Change 570954 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: configure osd cluster network on boot

https://gerrit.wikimedia.org/r/570954

Change 570954 merged by Jhedden:
[operations/puppet@production] ceph: configure osd cluster network on boot

https://gerrit.wikimedia.org/r/570954

Mentioned in SAL (#wikimedia-cloud) [2020-03-04T20:02:00Z] <jeh> add new ceph host aggregate T225320

Change 596762 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] nova-compute: set ceph nodes to use CPU features available on all cloudvirts

https://gerrit.wikimedia.org/r/596762

Change 596762 merged by Andrew Bogott:
[operations/puppet@production] nova-compute: set ceph nodes to use CPU features available on all cloudvirts

https://gerrit.wikimedia.org/r/596762

bd808 assigned this task to JHedden.