
Enable the Container Storage Interface (CSI) and the Ceph CSI plugin on dse-k8s cluster
Open, High, Public

Description

Update April 2024

We now need this functionality to support T362788: Migrate Airflow to the dse-k8s cluster, so it seems best to use this ticket to track the remaining work required to get persistent volumes working on the dse-k8s cluster.

User Story

As a Wikimedia engineer, I want to be able to deploy a stateful application using the Persistent Volume Claim Kubernetes object so that I can ensure the application's data remains persistent even if the pod or container running the application is deleted or recreated.

Implementation Plan

We will be following the guidance outlined here: https://docs.ceph.com/en/reef/rbd/rbd-kubernetes

The required steps, following that guide, are broadly: create an RBD pool and initialize it for rbd; create a Cephx user for the CSI plugin; deploy the ceph-csi provisioner and node plugins (with their ConfigMap and Secret); define a StorageClass; and finally verify the setup with a test PersistentVolumeClaim.

Acceptance Criteria

  • The engineer should be able to deploy a stateful application using the PersistentVolumeClaim Kubernetes object.
  • The PersistentVolumeClaim should be serviced by the Ceph cluster.
  • The application's data should remain persistent even if the pod or container running the application is deleted or recreated (see the sketch below).
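
For illustration, a minimal PersistentVolumeClaim of the kind described above might look like the following. This is only a sketch: the claim name and the storage class name dse-k8s-csi-ssd are placeholders, not the final configuration.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rbd-claim                 # placeholder name
spec:
  accessModes:
    - ReadWriteOnce                    # RBD block volumes are single-writer
  resources:
    requests:
      storage: 1Gi
  storageClassName: dse-k8s-csi-ssd    # placeholder; must match the StorageClass we deploy
EOF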

Event Timeline

BTullis renamed this task from DSE Experiment - User Story 3 (Make Block Storage Available) to Support PersistentVolumeClaim objects on dse-k8s cluster. Jul 18 2023, 10:29 AM
BTullis triaged this task as Low priority.
BTullis raised the priority of this task from Low to High.
BTullis updated the task description.
BTullis removed a subscriber: EChetty.

I configured an initial set of pools that use erasure coding in T326945#9045272

btullis@cephosd1005:~$ sudo ceph osd pool ls
.mgr
rbd-metadata-ssd
rbd-metadata-hdd
rbd-data-ssd
rbd-data-hdd
btullis@cephosd1005:~$ sudo ceph osd pool ls detail
pool 2 '.mgr' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 23676 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 37.50
pool 3 'rbd-metadata-ssd' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 29336 lfor 0/29336/29334 flags hashpspool stripe_width 0 application rbd read_balance_score 5.00
pool 4 'rbd-metadata-hdd' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 29537 lfor 0/29537/29535 flags hashpspool stripe_width 0 application rbd read_balance_score 3.73
pool 5 'rbd-data-ssd' erasure profile ec32-ssd size 5 min_size 4 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 29262 lfor 0/29262/29260 flags hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 12288 application rbd
pool 6 'rbd-data-hdd' erasure profile ec32-hdd size 5 min_size 4 crush_rule 4 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 29782 lfor 0/29782/29780 flags hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 12288 application rbd

We have a test volume present on each of these pools.

btullis@cephosd1005:~$ sudo rbd info rbd-metadata-ssd/test-ssd-volume
rbd image 'test-ssd-volume':
	size 10 GiB in 2560 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: 25b9aaf6aa3bd
	data_pool: rbd-data-ssd
	block_name_prefix: rbd_data.3.25b9aaf6aa3bd
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, data-pool
	op_features: 
	flags: 
	create_timestamp: Wed Jul 26 16:30:55 2023
	access_timestamp: Wed Jul 26 16:30:55 2023
	modify_timestamp: Thu Jul 27 15:51:24 2023
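
As a side note, an image with its metadata in a replicated pool and its data in an erasure-coded pool is created by passing the --data-pool option to rbd, along these lines (sketch only, shown for reference rather than as a record of the exact command used):

sudo rbd create --size 10G --data-pool rbd-data-ssd rbd-metadata-ssd/test-ssd-volume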

However, for this initial testing of the Kubernetes CSI I think it would be better to use a replicated pool with a replication factor of 3, since it is the simpler option. We can look at storage-space efficiency later; a sketch of the StorageClass that would reference this pool follows the output below.

I have:

  • created a pool named dse-k8s-csi-ssd with the following command:
btullis@cephosd1005:~$ sudo ceph osd pool create dse-k8s-csi-ssd 800 800 replicated ssd --autoscale-mode=on
pool 'dse-k8s-csi-ssd' created
  • associated this pool with the rbd application:
btullis@cephosd1005:~$ sudo ceph osd pool application enable dse-k8s-csi-ssd rbd
enabled application 'rbd' on pool 'dse-k8s-csi-ssd'
  • initialized the pool with rbd:
btullis@cephosd1005:~$ sudo rbd pool init dse-k8s-csi-ssd
  • validated that it is visible:
btullis@cephosd1005:~$ sudo rbd pool stats dse-k8s-csi-ssd
Total Images: 0
Total Snapshots: 0
Provisioned Size: 0 B
btullis@cephosd1005:~$ sudo ceph df
--- RAW STORAGE ---
CLASS      SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    1010 TiB  982 TiB   28 TiB    28 TiB       2.77
ssd     140 TiB  139 TiB  286 GiB   286 GiB       0.20
TOTAL   1.1 PiB  1.1 PiB   28 TiB    28 TiB       2.46
 
--- POOLS ---
POOL              ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr               2    1  125 MiB       32  374 MiB      0     44 TiB
rbd-metadata-ssd   3   32   81 GiB  432.05k  248 GiB   0.18     44 TiB
rbd-metadata-hdd   4   32    692 B        5   24 KiB      0    311 TiB
rbd-data-ssd       5   32   10 GiB    2.56k   17 GiB   0.01     79 TiB
rbd-data-hdd       6   32   40 GiB   10.33k   67 GiB      0    559 TiB
dse-k8s-csi-ssd    7  476     19 B        1   12 KiB      0     44 TiB
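
Once the ceph-csi plugin is deployed, this pool would be consumed via a StorageClass along the lines of the example in the upstream documentation. The sketch below is illustrative only: the clusterID, secret names and namespace are placeholders and do not reflect our final configuration.

cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: dse-k8s-csi-ssd
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: dse-k8s-eqiad             # placeholder; must match the clusterID in the csi config
  pool: dse-k8s-csi-ssd
  imageFeatures: layering
  # placeholder secrets holding the Cephx credentials for the plugin
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi-rbd
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi-rbd
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi-rbd
reclaimPolicy: Delete
allowVolumeExpansion: true
EOF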

Change #1026819 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add a ceph client for the dse-k8s container storage interface

https://gerrit.wikimedia.org/r/1026819

Change #1026819 merged by Btullis:

[operations/puppet@production] Add a ceph client for the dse-k8s container storage interface

https://gerrit.wikimedia.org/r/1026819

I note that there is an upstream chart available: https://github.com/ceph/ceph-csi/tree/devel/charts/ceph-csi-rbd

I will take a brief look at it to see whether it might be suitable for us.

At first glance it looks good, so I have requested a review of the chart in line with our policy.

That request is here: https://wikitech.wikimedia.org/wiki/Helm/Upstream_Charts/ceph-csi-rbd

I have specified version 3.7.2 because that is the last version that officially supported Kubernetes 1.23.
This also matches the version of the ceph-csi container image that we built.
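
For anyone who wants to inspect the chart locally, it can be pulled at that version roughly as follows (assuming the upstream chart repository at ceph.github.io/csi-charts; production deployment will of course go via operations/deployment-charts instead):

helm repo add ceph-csi https://ceph.github.io/csi-charts
helm repo update
helm pull ceph-csi/ceph-csi-rbd --version 3.7.2 --untar
helm show values ceph-csi/ceph-csi-rbd --version 3.7.2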

Change #1028773 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Fix the cephosd dse-k8s-csi user caps

https://gerrit.wikimedia.org/r/1028773

Change #1028773 merged by Btullis:

[operations/puppet@production] Fix the cephosd dse-k8s-csi user caps

https://gerrit.wikimedia.org/r/1028773
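
For reference, the caps in question follow the pattern recommended by the ceph-csi documentation, i.e. the rbd profiles scoped to the pool. A sketch of the equivalent manual command is below; the user name is illustrative and the real definition lives in operations/puppet:

# illustrative only: the actual user and caps are managed via Puppet
sudo ceph auth get-or-create client.dse-k8s-csi \
    mon 'profile rbd' \
    osd 'profile rbd pool=dse-k8s-csi-ssd' \
    mgr 'profile rbd pool=dse-k8s-csi-ssd'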

Change #1028931 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Initial import of ceph-csi-rbd chart for inspection

https://gerrit.wikimedia.org/r/1028931

BTullis renamed this task from Support PersistentVolumeClaim objects on dse-k8s cluster to Enable the Container Storage Interface (CSI) and the Ceph CSI plugin on dse-k8s cluster. May 15 2024, 11:29 AM

Change #1031589 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] [WIP] Add a values file for the ceph-csi plugin on dse-k8s-eqiad

https://gerrit.wikimedia.org/r/1031589
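
The important part of that values file is the csiConfig section, which tells the plugin how to reach the Ceph cluster. A minimal sketch is below; the clusterID and monitor addresses are placeholders rather than the real values:

cat <<EOF > dse-k8s-eqiad-values.yaml
# placeholder values for the upstream ceph-csi-rbd chart
csiConfig:
  - clusterID: "dse-k8s-eqiad"            # placeholder cluster identifier
    monitors:
      - "cephosd1001.eqiad.wmnet:6789"    # placeholder monitor addresses
      - "cephosd1002.eqiad.wmnet:6789"
      - "cephosd1003.eqiad.wmnet:6789"
EOF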

Change #1046666 had a related patch set uploaded (by Btullis; author: Btullis):

[labs/private@master] Add a Cephx user key for the cephcsi plugin to use

https://gerrit.wikimedia.org/r/1046666

Change #1046666 merged by Btullis:

[labs/private@master] Add a dummy Cephx user key for the cephcsi plugin to use

https://gerrit.wikimedia.org/r/1046666
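
As an aside, if a realistic-looking dummy key is needed for labs/private, one can be generated with ceph-authtool, for example:

ceph-authtool --gen-print-key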

I believe that I have finished my work on T364472: Assess the suitability of the upstream ceph-csi-rbd helm chart for deployment, so it is now awaiting review from others.
I'll mark this ticket as blocked, pending the outcome of that review.