
Research CephFS as a replacement for NFS
Open, Medium, Public

Description

Research CephFS as a replacement for NFS:

  • Feature comparison
  • Investigate necessary hardware
  • Compatibility with Jessie/Stretch clients
  • Integration with OpenStack storage management
  • Integration with Kubernetes storage management
  • Performance comparison (small vs large I/O requests, maximum throughput, latency, etc.); a rough timing sketch follows this list
  • Backup strategy
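
As a rough illustration of the kind of performance comparison meant above, here is a minimal Python sketch that times small vs large synchronous writes against a mount point. The /mnt/test path, total size, and block sizes are placeholders for illustration, not an agreed benchmark plan:

```
#!/usr/bin/env python3
"""Rough I/O comparison sketch: time small vs large sequential writes
against a mounted filesystem (e.g. an NFS or CephFS mount).
The mount path and request sizes are placeholders, not a plan of record."""
import os
import time

MOUNT = "/mnt/test"          # hypothetical mount point of the filesystem under test
TOTAL_BYTES = 64 * 1024**2   # write 64 MiB per run


def timed_write(block_size):
    """Write TOTAL_BYTES in block_size chunks, fsync at the end, return MiB/s."""
    path = os.path.join(MOUNT, f"bench_{block_size}.dat")
    buf = os.urandom(block_size)
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(TOTAL_BYTES // block_size):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # include the sync cost, matching NFS sync-mode behaviour
    elapsed = time.monotonic() - start
    os.unlink(path)
    return (TOTAL_BYTES / 1024**2) / elapsed


if __name__ == "__main__":
    for bs in (4 * 1024, 4 * 1024**2):  # 4 KiB (small) vs 4 MiB (large) requests
        print(f"block size {bs:>8} B: {timed_write(bs):6.1f} MiB/s")
```

In practice a tool like fio gives more controlled numbers (random vs sequential access, queue depth, latency percentiles); the sketch only shows the small-vs-large request comparison the bullet refers to.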

Task from WMCS 2018 offsite meetings.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Oct 21 2018, 12:34 PM
GTirloni triaged this task as Medium priority. · Oct 21 2018, 12:34 PM
GTirloni removed a subscriber: GTirloni. · Mar 21 2019, 9:11 PM
Bstorm added a subscriber: Bstorm. · Apr 8 2019, 5:09 PM

Some notes that will also apply to T90364:

  • Without a lot of grooming and tuning, Ceph is a very high-latency system when used as anything but a straight-up object store like Swift. With proper tuning it can be somewhat faster than NFSv4 in sync mode (which is what we use).
  • A 10G network on all nodes and clients should be viewed as a requirement, except where we basically don't care about speed.
  • If OSD nodes are large, with lots of disks, a failure and rebuild could collapse the system by overstraining the remaining OSDs and creating instability. Smaller, more numerous nodes allow for more resiliency, availability, and performance.
  • The faster the disks, the faster the processors need to be. Single-socket motherboards can outperform dual-socket ones for Ceph with the same processors; more cores help.
  • Upstream Ceph does not do comprehensive testing and development on Debian, though they do package for it with a basic install-and-check test, and they recommend upgrading stock Debian kernels. Full, comprehensive support is limited to CentOS and Ubuntu. No Mimic packages are available for Debian before Buster because Mimic needs gcc 8.
  • I/O throughput requirement testing needs to be done so that we can tune things. I'm digging around in Prometheus to find good metrics to watch and compare (a query sketch follows this list).
  • Reviewing the network architecture around Ceph is recommended, to avoid collapsing the system through network changes or unrecommended configuration.
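
To make the Prometheus point above concrete, here is a small sketch that pulls current aggregate read/write throughput via the Prometheus HTTP API. The Prometheus URL and the node_exporter metric names are assumptions about what is scraped in this environment, not confirmed metrics:

```
#!/usr/bin/env python3
"""Sketch: pull current read/write disk throughput from Prometheus to get
an I/O baseline before comparing NFS and CephFS. The Prometheus URL and
metric names are assumptions about what node_exporter exposes here."""
import requests

PROM = "http://prometheus.example.org:9090"  # hypothetical Prometheus endpoint

QUERIES = {
    "read MiB/s": "sum(rate(node_disk_read_bytes_total[5m])) / 2^20",
    "write MiB/s": "sum(rate(node_disk_written_bytes_total[5m])) / 2^20",
}


def instant_query(expr):
    """Run an instant query via the Prometheus HTTP API and return the first value."""
    r = requests.get(f"{PROM}/api/v1/query", params={"query": expr}, timeout=10)
    r.raise_for_status()
    result = r.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0


if __name__ == "__main__":
    for label, expr in QUERIES.items():
        print(f"{label:>12}: {instant_query(expr):8.2f}")
```

The same instant-query helper can be reused against whichever per-mount or per-pool metrics we settle on for the actual comparison.
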
Bstorm added a comment. · Apr 8 2019, 5:12 PM

Also: because of weaknesses in the old system, we will probably want to start with Luminous or Mimic, especially since no other releases are actually supported. BlueStore as the default object store is one of the biggest benefits.

bd808 moved this task from Backlog to Shared Storage on the Data-Services board. · May 30 2019, 7:03 PM