
[toolforge,storage,infra,k8s] Investigate persistent volume support
Open, Medium, Public

Description

Things to answer:

  • Can we hook Toolforge up to our ceph cluster?
  • Which controller to use (Rook, Longhorn, ...)?

We might be able to test both in parallel if we want, though the performance of any of these controllers will not be production-level until we sort out how to plug our ceph cluster in underneath it.

Event Timeline


Regarding network connectivity, assuming our current cloudceph cluster and the current Toolforge kubernetes deployed in VMs inside Cloud VPS:

  • ensure / review cloud-private support for cloudceph nodes -- for the most part, I think this may be ready? Examples: cloudcephosd1009, cloudcephmon1004
  • decide if we want to connect k8s to cinder, or k8s to ceph directly -- I think we can do either, but there are some nuances
    • for example, we may want to prevent VM <-> ceph traffic from flowing via neutron/cloudgw (edge network)
  • k8s-to-cinder: the CSI plugin uses the OpenStack API to manage the lifecycle of a cinder volume, including attaching the volume to the VM the k8s worker is running on. See example, and the first sketch after this list.
    • this means the ceph volume will reach k8s via the hypervisor and the VM -- thus bypassing the edge network
  • k8s-to-ceph: per the docs, k8s will access the mons directly -- if we don't short-circuit this traffic, it will flow via the edge network. We need to connect to the mons via cloud-private (see the second sketch after this list).
    • either create a neutron l2gw and a dedicated cloud-private IP on each k8s worker node
    • or add dedicated routing in the main neutron virtual router for cloud-private
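
For illustration, a minimal sketch of what the k8s-to-cinder option could look like on the Kubernetes side, using the upstream Cinder CSI driver (cinder.csi.openstack.org). The class name and the cinder volume type are hypothetical placeholders, not values from our deployment:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cinder-standard                  # hypothetical name
provisioner: cinder.csi.openstack.org    # upstream Cinder CSI driver
parameters:
  type: standard                         # hypothetical cinder volume type
# delay binding until a pod is scheduled, so the volume gets created and
# attached to the VM that the selected k8s worker runs on
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

With this option the CSI driver only talks to the OpenStack API; nova/cinder do the actual attachment at the hypervisor level, which is why no VM-level ceph connectivity is needed.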
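By contrast, the k8s-to-ceph option (for example with the ceph-csi RBD driver, rbd.csi.ceph.com) needs the mon addresses configured inside the cluster, and this is exactly the traffic that has to reach ceph via cloud-private. A sketch, with the cluster ID and monitor addresses as made-up placeholders:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-csi-config   # default ConfigMap name the ceph-csi driver reads
data:
  config.json: |-
    [
      {
        "clusterID": "00000000-1111-2222-3333-444444444444",
        "monitors": [
          "203.0.113.10:6789",
          "203.0.113.11:6789",
          "203.0.113.12:6789"
        ]
      }
    ]
```

Every k8s worker that mounts a volume will open connections to these monitor (and OSD) addresses directly, hence the l2gw/routing work listed above.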

As of this writing, k8s-to-cinder feels more straightforward network-wise: nothing special to do.
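
Whichever backend we pick, the tool-facing side should look the same: a PersistentVolumeClaim against the chosen StorageClass, mounted into a pod. A minimal sketch, with all names hypothetical (the storageClassName matches the cinder sketch above):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: some-tool-data               # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce                  # block volumes attach to one node at a time
  storageClassName: cinder-standard  # hypothetical class from the sketch above
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: some-tool                    # hypothetical pod
spec:
  containers:
    - name: tool
      image: registry.example/tool:latest   # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: some-tool-data
```

Note the ReadWriteOnce access mode: both cinder volumes and plain RBD images are block devices that attach to a single node at a time.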