
Assess the suitability of the upstream ceph-csi-rbd helm chart for deployment
Open, Medium, Public

Description

We would like to deploy the Ceph container storage interface (CSI) plugins to the dse-k8s cluster, enabling the creation of PersistentVolumeClaim objects that are backed by the Data-Platform team's Ceph cluster.
We are following the guidelines here: https://docs.ceph.com/en/reef/rbd/rbd-kubernetes
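For context, the end state we are aiming for is a StorageClass backed by the RBD driver, against which users can create PVCs. A sketch, following the upstream docs; the clusterID and pool values are placeholders, not our real configuration:

```yaml
# Illustrative StorageClass + PVC; clusterID and pool are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <ceph-cluster-id>
  pool: <rbd-pool>
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: ceph-rbd
  resources:
    requests:
      storage: 1Gi
```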

The Ceph project maintains a set of two helm charts for this purpose. One of them closely matches the functionality that we are seeking.
This is the chart in question: https://github.com/ceph/ceph-csi/tree/devel/charts/ceph-csi-rbd
(The other chart uses the CephFS filesystem, which is a feature we have decided not to enable, for now.)

I have created a request for a review of this chart: https://wikitech.wikimedia.org/wiki/Helm/Upstream_Charts/ceph-csi-rbd as per policy guidelines. I would now value some assistance from someone outside of the Data-Platform-SRE team, to help assess the suitability of the chart.

Event Timeline

Change #1028931 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Initial import of ceph-csi-rbd chart for inspection

https://gerrit.wikimedia.org/r/1028931

Change #1028932 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Add WMF annotations to the imported ceph-csi-rbd plugin

https://gerrit.wikimedia.org/r/1028932

BTullis updated Other Assignee, added: BTullis.

I've not looked in detail (and I probably will not be able to before the end of next week), but what is immediately worrisome to me is the daemonset running with superpowers. We decided not to do this with Calico, for example; instead we distribute the CNI plugin via Debian packages. Would that potentially be an option here as well, or has it been considered?

Change #1028938 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] [WIP] Deploy the ceph-csi-rbd chart to dse-k8s with default values

https://gerrit.wikimedia.org/r/1028938

I have created a phaste with the default output from the helm chart, without overriding any of the values.

The following output was from this helm-lint run: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1028938
```
Diff for test case admin_ng/dse-k8s-eqiad
```
```
# Source: ceph-csi-rbd/templates/nodeplugin-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ceph-csi-rbd-nodeplugin
  namespace: ceph-csi-rbd
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: nodeplugin
    release: ceph-csi-rbd
    heritage: Helm
---
# Source: ceph-csi-rbd/templates/provisioner-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ceph-csi-rbd-provisioner
  namespace: ceph-csi-rbd
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: provisioner
    release: ceph-csi-rbd
    heritage: Helm
---
# Source: ceph-csi-rbd/templates/ceph-conf.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: "ceph-config"
  namespace: ceph-csi-rbd
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: nodeplugin
    release: ceph-csi-rbd
    heritage: Helm
data:
  ceph.conf: |
    [global]
    auth_cluster_required = cephx
    auth_service_required = cephx
    auth_client_required = cephx

  keyring: ""
---
# Source: ceph-csi-rbd/templates/csiplugin-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: "ceph-csi-config"
  namespace: ceph-csi-rbd
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: nodeplugin
    release: ceph-csi-rbd
    heritage: Helm
data:
  config.json: |-
    []
  cluster-mapping.json: |-
    []
---
# Source: ceph-csi-rbd/templates/encryptionkms-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: "ceph-csi-encryption-kms-config"
  namespace: ceph-csi-rbd
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: nodeplugin
    release: ceph-csi-rbd
    heritage: Helm
data:
  config.json: |-
    {}
---
# Source: ceph-csi-rbd/templates/nodeplugin-clusterrole.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ceph-csi-rbd-nodeplugin
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: nodeplugin
    release: ceph-csi-rbd
    heritage: Helm
rules:
  # allow to read Vault Token and connection options from the Tenants namespace
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["serviceaccounts"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments"]
    verbs: ["list", "get"]
  - apiGroups: [""]
    resources: ["serviceaccounts/token"]
    verbs: ["create"]
---
# Source: ceph-csi-rbd/templates/provisioner-clusterrole.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ceph-csi-rbd-provisioner
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: provisioner
    release: ceph-csi-rbd
    heritage: Helm
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "update", "delete", "patch"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "create", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments/status"]
    verbs: ["patch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshots"]
    verbs: ["get", "list", "patch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshots/status"]
    verbs: ["get", "list", "patch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotcontents"]
    verbs: ["create", "get", "list", "watch", "update", "delete", "patch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotcontents/status"]
    verbs: ["update", "patch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["serviceaccounts"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims/status"]
    verbs: ["update", "patch"]
  - apiGroups: [""]
    resources: ["serviceaccounts/token"]
    verbs: ["create"]
---
# Source: ceph-csi-rbd/templates/nodeplugin-clusterrolebinding.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ceph-csi-rbd-nodeplugin
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: nodeplugin
    release: ceph-csi-rbd
    heritage: Helm
subjects:
  - kind: ServiceAccount
    name: ceph-csi-rbd-nodeplugin
    namespace: ceph-csi-rbd
roleRef:
  kind: ClusterRole
  name: ceph-csi-rbd-nodeplugin
  apiGroup: rbac.authorization.k8s.io
---
# Source: ceph-csi-rbd/templates/provisioner-clusterrolebinding.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ceph-csi-rbd-provisioner
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: provisioner
    release: ceph-csi-rbd
    heritage: Helm
subjects:
  - kind: ServiceAccount
    name: ceph-csi-rbd-provisioner
    namespace: ceph-csi-rbd
roleRef:
  kind: ClusterRole
  name: ceph-csi-rbd-provisioner
  apiGroup: rbac.authorization.k8s.io
---
# Source: ceph-csi-rbd/templates/provisioner-role.yaml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ceph-csi-rbd-provisioner
  namespace: ceph-csi-rbd
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: provisioner
    release: ceph-csi-rbd
    heritage: Helm
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "watch", "list", "delete", "update", "create"]
---
# Source: ceph-csi-rbd/templates/provisioner-rolebinding.yaml
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ceph-csi-rbd-provisioner
  namespace: ceph-csi-rbd
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: provisioner
    release: ceph-csi-rbd
    heritage: Helm
subjects:
  - kind: ServiceAccount
    name: ceph-csi-rbd-provisioner
    namespace: ceph-csi-rbd
roleRef:
  kind: Role
  name: ceph-csi-rbd-provisioner
  apiGroup: rbac.authorization.k8s.io
---
# Source: ceph-csi-rbd/templates/nodeplugin-http-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ceph-csi-rbd-nodeplugin-http-metrics
  namespace: ceph-csi-rbd
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: nodeplugin
    release: ceph-csi-rbd
    heritage: Helm
spec:
  ports:
    - name: http-metrics
      port: 8080
      targetPort: 8080
  selector:
    app: ceph-csi-rbd
    component: nodeplugin
    release: ceph-csi-rbd
  type: "ClusterIP"
---
# Source: ceph-csi-rbd/templates/provisioner-http-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ceph-csi-rbd-provisioner-http-metrics
  namespace: ceph-csi-rbd
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: provisioner
    release: ceph-csi-rbd
    heritage: Helm
spec:
  ports:
    - name: http-metrics
      port: 8080
      targetPort: 8080
  selector:
    app: ceph-csi-rbd
    component: provisioner
    release: ceph-csi-rbd
  type: "ClusterIP"
---
# Source: ceph-csi-rbd/templates/nodeplugin-daemonset.yaml
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: ceph-csi-rbd-nodeplugin
  namespace: ceph-csi-rbd
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: nodeplugin
    release: ceph-csi-rbd
    heritage: Helm
spec:
  selector:
    matchLabels:
      app: ceph-csi-rbd
      component: nodeplugin
      release: ceph-csi-rbd
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: ceph-csi-rbd
        chart: ceph-csi-rbd-3-canary
        component: nodeplugin
        release: ceph-csi-rbd
        heritage: Helm
    spec:
      serviceAccountName: ceph-csi-rbd-nodeplugin
      hostNetwork: true
      hostPID: true
      priorityClassName: system-node-critical
      # to use e.g. Rook orchestrated cluster, and mons' FQDN is
      # resolved through k8s service, set dns policy to cluster first
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: driver-registrar
          # This is necessary only for systems with SELinux, where
          # non-privileged sidecar containers cannot access unix domain socket
          # created by privileged CSI driver container.
          securityContext:
            privileged: true
            allowPrivilegeEscalation: true
          image: "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.5.1"
          imagePullPolicy: IfNotPresent
          args:
            - "--v=5"
            - "--csi-address=/csi/csi.sock"
            - "--kubelet-registration-path=/var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock"
          env:
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - name: registration-dir
              mountPath: /registration
          resources:
            {}
        - name: csi-rbdplugin
          image: "quay.io/cephcsi/cephcsi:v3.7.2"
          imagePullPolicy: IfNotPresent
          args:
            - "--nodeid=$(NODE_ID)"
            - "--pluginpath=/var/lib/kubelet/plugins"
            - "--stagingpath=/var/lib/kubelet/plugins/kubernetes.io/csi/"
            - "--type=rbd"
            - "--nodeserver=true"
            - "--pidlimit=-1"
            - "--endpoint=$(CSI_ENDPOINT)"
            - "--csi-addons-endpoint=$(CSI_ADDONS_ENDPOINT)"
            - "--v=5"
            - "--drivername=$(DRIVER_NAME)"
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: DRIVER_NAME
              value: rbd.csi.ceph.com
            - name: NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CSI_ENDPOINT
              value: "unix:///csi/csi.sock"
            - name: CSI_ADDONS_ENDPOINT
              value: "unix:///csi/csi-addons.sock"
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - mountPath: /dev
              name: host-dev
            - mountPath: /run/mount
              name: host-mount
            - mountPath: /sys
              name: host-sys
            - mountPath: /etc/selinux
              name: etc-selinux
              readOnly: true
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
            - name: ceph-csi-config
              mountPath: /etc/ceph-csi-config/
            - name: ceph-config
              mountPath: /etc/ceph/
            - name: ceph-csi-encryption-kms-config
              mountPath: /etc/ceph-csi-encryption-kms-config/
            - name: plugin-dir
              mountPath: /var/lib/kubelet/plugins
              mountPropagation: "Bidirectional"
            - name: mountpoint-dir
              mountPath: /var/lib/kubelet/pods
              mountPropagation: "Bidirectional"
            - name: keys-tmp-dir
              mountPath: /tmp/csi/keys
            - name: ceph-logdir
              mountPath: /var/log/ceph
            - name: oidc-token
              mountPath: /run/secrets/tokens
              readOnly: true
          resources:
            {}
        - name: liveness-prometheus
          securityContext:
            privileged: true
            allowPrivilegeEscalation: true
          image: "quay.io/cephcsi/cephcsi:v3.7.2"
          imagePullPolicy: IfNotPresent
          args:
            - "--type=liveness"
            - "--endpoint=$(CSI_ENDPOINT)"
            - "--metricsport=8080"
            - "--metricspath=/metrics"
            - "--polltime=60s"
            - "--timeout=3s"
          env:
            - name: CSI_ENDPOINT
              value: "unix:///csi/csi.sock"
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          ports:
            - containerPort: 8080
              name: metrics
              protocol: TCP
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
          resources:
            {}
      volumes:
        - name: socket-dir
          hostPath:
            path: "/var/lib/kubelet/plugins/rbd.csi.ceph.com"
            type: DirectoryOrCreate
        - name: registration-dir
          hostPath:
            path: /var/lib/kubelet/plugins_registry
            type: Directory
        - name: plugin-dir
          hostPath:
            path: /var/lib/kubelet/plugins
            type: Directory
        - name: mountpoint-dir
          hostPath:
            path: /var/lib/kubelet/pods
            type: DirectoryOrCreate
        - name: ceph-logdir
          hostPath:
            path: /var/log/ceph
            type: DirectoryOrCreate
        - name: host-dev
          hostPath:
            path: /dev
        - name: host-mount
          hostPath:
            path: /run/mount
        - name: host-sys
          hostPath:
            path: /sys
        - name: etc-selinux
          hostPath:
            path: /etc/selinux
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: ceph-config
          configMap:
            name: "ceph-config"
        - name: ceph-csi-config
          configMap:
            name: "ceph-csi-config"
        - name: ceph-csi-encryption-kms-config
          configMap:
            name: "ceph-csi-encryption-kms-config"
        - name: keys-tmp-dir
          emptyDir: {
            medium: "Memory"
          }
        - name: oidc-token
          projected:
            sources:
              - serviceAccountToken:
                  path: oidc-token
                  expirationSeconds: 3600
                  audience: ceph-csi-kms
---
# Source: ceph-csi-rbd/templates/provisioner-deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: ceph-csi-rbd-provisioner
  namespace: ceph-csi-rbd
  labels:
    app: ceph-csi-rbd
    chart: ceph-csi-rbd-3-canary
    component: provisioner
    release: ceph-csi-rbd
    heritage: Helm
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 50%
  selector:
    matchLabels:
      app: ceph-csi-rbd
      component: provisioner
      release: ceph-csi-rbd
  template:
    metadata:
      labels:
        app: ceph-csi-rbd
        chart: ceph-csi-rbd-3-canary
        component: provisioner
        release: ceph-csi-rbd
        heritage: Helm
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - ceph-csi-rbd
                  - key: component
                    operator: In
                    values:
                      - provisioner
              topologyKey: "kubernetes.io/hostname"
      serviceAccountName: ceph-csi-rbd-provisioner
      hostNetwork: false
      priorityClassName: system-cluster-critical
      containers:
        - name: csi-provisioner
          image: "gcr.io/k8s-staging-sig-storage/csi-provisioner:v3.2.1"
          imagePullPolicy: IfNotPresent
          args:
            - "--csi-address=$(ADDRESS)"
            - "--v=1"
            - "--timeout=60s"
            - "--leader-election=true"
            - "--retry-interval-start=500ms"
            - "--default-fstype=ext4"
            - "--extra-create-metadata=true"
            - "--feature-gates=HonorPVReclaimPolicy=true"
            - "--prevent-volume-mode-conversion=true"
          env:
            - name: ADDRESS
              value: "unix:///csi/csi-provisioner.sock"
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
          resources:
            {}
        - name: csi-resizer
          image: "registry.k8s.io/sig-storage/csi-resizer:v1.5.0"
          imagePullPolicy: IfNotPresent
          args:
            - "--v=1"
            - "--csi-address=$(ADDRESS)"
            - "--timeout=60s"
            - "--leader-election"
            - "--retry-interval-start=500ms"
            - "--handle-volume-inuse-error=false"
            - "--feature-gates=RecoverVolumeExpansionFailure=true"
          env:
            - name: ADDRESS
              value: "unix:///csi/csi-provisioner.sock"
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
          resources:
            {}
        - name: csi-snapshotter
          image: registry.k8s.io/sig-storage/csi-snapshotter:v6.0.1
          imagePullPolicy: IfNotPresent
          args:
            - "--csi-address=$(ADDRESS)"
            - "--v=1"
            - "--timeout=60s"
            - "--leader-election=true"
            - "--extra-create-metadata=true"
          env:
            - name: ADDRESS
              value: "unix:///csi/csi-provisioner.sock"
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
          resources:
            {}
        - name: csi-attacher
          image: "registry.k8s.io/sig-storage/csi-attacher:v3.5.0"
          imagePullPolicy: IfNotPresent
          args:
            - "--v=1"
            - "--csi-address=$(ADDRESS)"
            - "--leader-election=true"
            - "--retry-interval-start=500ms"
          env:
            - name: ADDRESS
              value: "unix:///csi/csi-provisioner.sock"
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
          resources:
            {}
        - name: csi-rbdplugin
          image: "quay.io/cephcsi/cephcsi:v3.7.2"
          imagePullPolicy: IfNotPresent
          args:
            - "--nodeid=$(NODE_ID)"
            - "--type=rbd"
            - "--controllerserver=true"
            - "--pidlimit=-1"
            - "--endpoint=$(CSI_ENDPOINT)"
            - "--csi-addons-endpoint=$(CSI_ADDONS_ENDPOINT)"
            - "--v=5"
            - "--drivername=$(DRIVER_NAME)"
            - "--rbdhardmaxclonedepth=8"
            - "--rbdsoftmaxclonedepth=4"
            - "--maxsnapshotsonimage=450"
            - "--minsnapshotsonimage=250"
            - "--setmetadata=true"
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: DRIVER_NAME
              value: rbd.csi.ceph.com
            - name: NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CSI_ENDPOINT
              value: "unix:///csi/csi-provisioner.sock"
            - name: CSI_ADDONS_ENDPOINT
              value: "unix:///csi/csi-addons.sock"
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - mountPath: /dev
              name: host-dev
            - mountPath: /sys
              name: host-sys
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
            - name: ceph-csi-config
              mountPath: /etc/ceph-csi-config/
            - name: ceph-config
              mountPath: /etc/ceph/
            - name: ceph-csi-encryption-kms-config
              mountPath: /etc/ceph-csi-encryption-kms-config/
            - name: keys-tmp-dir
              mountPath: /tmp/csi/keys
            - name: oidc-token
              mountPath: /run/secrets/tokens
              readOnly: true
          resources:
            {}
        - name: csi-rbdplugin-controller
          image: "quay.io/cephcsi/cephcsi:v3.7.2"
          imagePullPolicy: IfNotPresent
          args:
            - "--type=controller"
            - "--v=5"
            - "--drivername=$(DRIVER_NAME)"
            - "--drivernamespace=$(DRIVER_NAMESPACE)"
            - "--setmetadata=true"
          env:
            - name: DRIVER_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: DRIVER_NAME
              value: rbd.csi.ceph.com
          volumeMounts:
            - name: ceph-csi-config
              mountPath: /etc/ceph-csi-config/
            - name: keys-tmp-dir
              mountPath: /tmp/csi/keys
            - name: ceph-config
              mountPath: /etc/ceph/
          resources:
            {}
        - name: liveness-prometheus
          image: "quay.io/cephcsi/cephcsi:v3.7.2"
          imagePullPolicy: IfNotPresent
          args:
            - "--type=liveness"
            - "--endpoint=$(CSI_ENDPOINT)"
            - "--metricsport=8080"
            - "--metricspath=/metrics"
            - "--polltime=60s"
            - "--timeout=3s"
          env:
            - name: CSI_ENDPOINT
              value: "unix:///csi/csi-provisioner.sock"
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          ports:
            - containerPort: 8080
              name: metrics
              protocol: TCP
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
          resources:
            {}
      volumes:
        - name: socket-dir
          emptyDir: {
            medium: "Memory"
          }
        - name: host-dev
          hostPath:
            path: /dev
        - name: host-sys
          hostPath:
            path: /sys
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: ceph-config
          configMap:
            name: "ceph-config"
        - name: ceph-csi-config
          configMap:
            name: "ceph-csi-config"
        - name: ceph-csi-encryption-kms-config
          configMap:
            name: "ceph-csi-encryption-kms-config"
        - name: keys-tmp-dir
          emptyDir: {
            medium: "Memory"
          }
        - name: oidc-token
          projected:
            sources:
              - serviceAccountToken:
                  path: oidc-token
                  expirationSeconds: 3600
                  audience: ceph-csi-kms
---
# Source: ceph-csi-rbd/templates/csidriver-crd.yaml
apiVersion: storage.k8s.io/v1

kind: CSIDriver
metadata:
  name: rbd.csi.ceph.com
spec:
  attachRequired: true
  podInfoOnMount: false
  fsGroupPolicy: File

---
```

One of the first things to strike me is that there are six container images mentioned.
I was only really expecting the quay.io/cephcsi/cephcsi:v3.7.2 image, because that is the only one that is mentioned on this page and the only one that I have so far built.

  • image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.5.1
  • image: quay.io/cephcsi/cephcsi:v3.7.2
  • image: gcr.io/k8s-staging-sig-storage/csi-provisioner:v3.2.1
  • image: registry.k8s.io/sig-storage/csi-resizer:v1.5.0
  • image: registry.k8s.io/sig-storage/csi-snapshotter:v6.0.1
  • image: registry.k8s.io/sig-storage/csi-attacher:v3.5.0

I will continue to investigate this and whether these additional containers are a strict requirement.
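As a quick sanity check, the set of images can be extracted mechanically from the rendered output rather than read by eye. A small sketch (the helper name is mine; the sample manifest is inlined here, but in practice the input would be the full `helm template` output):

```python
import re

def list_images(manifest_text):
    """Return the sorted, de-duplicated container images referenced
    in a rendered Kubernetes manifest."""
    return sorted(set(re.findall(r'image:\s*"?([^"\s]+)"?', manifest_text)))

# Inline excerpt of the rendered chart, for illustration.
sample = '''
      containers:
        - name: csi-rbdplugin
          image: "quay.io/cephcsi/cephcsi:v3.7.2"
        - name: driver-registrar
          image: "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.5.1"
'''
print(list_images(sample))
```

Running the same function over the full phaste output reproduces the six-image list above.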

As @JMeybohm pointed out, the DaemonSet is deployed by default. Once again, this isn't specifically mentioned on https://docs.ceph.com/en/reef/rbd/rbd-kubernetes so I will check whether it is a requirement or an option, and whether distributing the plugin with packages would be an option instead.

Gehel triaged this task as Medium priority.May 10 2024, 8:20 AM

I have been researching the CSI more and in particular this document and this helpful diagram

It seems that we do need the sidecar pods, although some of the functionality does seem optional.

  • csi-attacher
  • csi-provisioner
  • csi-resizer
  • csi-snapshotter
  • csi-node-registrar

These components are published by the Kubernetes team themselves, rather than by the third-party CSI driver vendor, which in this case is Ceph.

None of these components requires privileged mode (I don't think), and I was able to build them with blubber/kokkuri relatively easily in this MR.
They do not seem closely coupled to the Kubernetes version, which is good.

The CSI Volume Driver itself (in this case cephcsi) only communicates via unix domain sockets, whereas the helper sidecars are responsible for communicating with the Kubernetes API.
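This socket-only contract is worth spelling out, since it is what makes the sidecar split possible: the driver and its sidecars share a socket directory (an emptyDir or hostPath volume) and never need to talk over the network. A minimal sketch of that handshake with plain sockets, in-process; the real components speak gRPC (the CSI protocol) over the socket, and the "RPC" here is only a stand-in:

```python
import os
import socket
import tempfile
import threading

# The CSI driver listens on a unix domain socket inside a shared volume;
# sidecars connect to the same path. No network access is involved.
sock_path = os.path.join(tempfile.mkdtemp(), "csi.sock")

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)
server.listen(1)

def driver():
    # Stand-in for the cephcsi driver answering a GetPluginInfo RPC.
    conn, _ = server.accept()
    conn.recv(1024)
    conn.sendall(b"rbd.csi.ceph.com")
    conn.close()

t = threading.Thread(target=driver)
t.start()

# Stand-in for a sidecar (e.g. csi-provisioner) querying the driver.
sidecar = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sidecar.connect(sock_path)
sidecar.sendall(b"GetPluginInfo")
reply = sidecar.recv(1024)
sidecar.close()
t.join()
server.close()
print(reply.decode())  # rbd.csi.ceph.com
```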

image.png (540×960 px, 95 KB)

So, by my reckoning, it might be possible for us to extract the cephcsi binary out of the daemonset and run it directly on the host instead (as I believe we do with calico), but it might be non-trivial.
I doubt that we could remove it from the StatefulSet/Deployment pod so easily, but that deployment doesn't require elevated privileges.

There are two PodSecurityPolicies to consider in this version, although they may optionally be disabled.

They were removed in a later version of the helm chart.

I'll carry on investigating and developing a values/dse-k8s-eqiad.yaml file that might match our needs.

Change #1031589 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] [WIP] Add a values file for the ceph-csi plugin on dse-k8s-eqiad

https://gerrit.wikimedia.org/r/1031589

  • csi-attacher
  • csi-provisioner
  • csi-resizer
  • csi-snapshotter
  • csi-node-registrar

None of these components require privileged mode (I don't think) ...

My statement above isn't 100% correct, but not 100% wrong either.

The csi-node-registrar sidecar, which runs alongside the cephcsi plugin in the DaemonSet, has the following securityContext, which is currently hard-coded within the chart.

```
securityContext:
  privileged: true
  allowPrivilegeEscalation: true
```

However, there is a note alongside it, which says...

This is necessary only for systems with SELinux, where non-privileged sidecar containers cannot access unix domain socket created by privileged CSI driver container.

Therefore, I think that we could probably add a conditional around this and send that change upstream.
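The conditional I have in mind would look roughly like this in the daemonset template (the `nodeplugin.registrar.privileged` values key is my own suggestion, not an existing chart value):

```yaml
# Sketch of the proposed if-gate; the values key name is hypothetical.
{{- if .Values.nodeplugin.registrar.privileged }}
securityContext:
  privileged: true
  allowPrivilegeEscalation: true
{{- end }}
```

Clusters without SELinux could then set the key to false and run the sidecar unprivileged.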

The bigger question remains. The csi-rbdplugin container within the DaemonSet has the following securityContext and I can't see any way around allowing these privileges if we run it in a container.

```
securityContext:
  privileged: true
  capabilities:
    add: ["SYS_ADMIN"]
  allowPrivilegeEscalation: true
```

Is this simply not permissible to run like this in a pod?
I can look into running the plugin on the host and removing it from the daemonset if we agree that it's a necessary step.

The csi-snapshotter sidecar has some additional requirements that are not immediately apparent.
This issue has some detail: https://github.com/ceph/ceph-csi/issues/4100

It resulted in a change to the docs that mentions specifically:

If you intend to use the snapshot functionality in Kubernetes cluster, please refer to snap-clone.md

The prerequisites mentioned on that page include installing the external snapshot controller and its associated CRDs.

This seems like a lot of additional complexity and I don't believe that we need support for volume snapshots at the moment.

Interestingly, there is no provisioner.snapshotter.enabled value mentioned in the chart (even in later versions), although the csi-provisioner and csi-attacher sidecars do have this option.

I would be happy to exclude the csi-snapshotter from the Deployment with a suitable values check, but I'm intrigued as to why it's not there already. I'll keep looking into it.
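Concretely, the toggle I would add mirrors the existing attacher/resizer ones; the `provisioner.snapshotter.enabled` key below is the value I am proposing, not one that exists upstream today:

```yaml
# Proposed values fragment, mirroring provisioner.attacher.enabled
# and provisioner.resizer.enabled (the snapshotter key is hypothetical).
provisioner:
  snapshotter:
    enabled: false
```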

I have asked in the #ceph-csi channel of the ceph-storage Slack whether or not there is a good reason for not adding this if-gate.

image.png (250×1 px, 62 KB)

I have had a response to my question from a maintainer of the cephcsi project.

image.png (323×867 px, 59 KB)

There's no technical reason why the snapshotter isn't already an optional component, it was merely an omission.
I will patch our imported version (v3.7.2) of the chart and send an MR upstream containing a similar patch against the devel branch.

I had a quick chat with Ben over Meet, and we agreed to try to figure out if/why some containers need things like privileged: true and other capabilities. The most concerning one is csi-rbdplugin, which requires privileged: true and CAP_SYS_ADMIN (the motivation seems to be https://github.com/ceph/ceph-csi/issues/2519#issuecomment-931497940). We are going to progress in three ways:

  • Ben is following up with upstream to verify why the other containers need privileges (for example, in one case they are needed due to SELinux configuration that doesn't apply to us, so we can safely if-gate those permissions).
  • I am going to have a chat with the K8s-SIG next week about this use case, trying to get best practices and suggestions (I'll report them in this task).
  • Ben is going to explore the option of running csi-rbdplugin as daemon on the k8s node itself, reporting pros/cons and painful points for long term maintenance.

I have started by asking a question on the #ceph-csi Slack channel.

image.png (193×755 px, 40 KB)

Next, I'll add the if-gates around the csi-rbdplugin container in the daemonset and the external-snapshotter container within the deployment.
That should give me a way forward for testing of the non-privileged elements of the setup.

I had a chat with several folks in the K8s SIG and an update of the current best practices has been added:

https://wikitech.wikimedia.org/wiki/Kubernetes/Upstream_Helm_charts_policy#Best_practices_for_adoption_of_upstream_helm_charts

TL;DR: the pattern of creating a daemonset running a container with high privileges is acceptable, but a careful review is needed to make sure that we really need it. The privileged pods should run in the kube-system namespace.

@elukey - @JMeybohm - @akosiaris - I believe that the stack of patches to enable ceph-csi-rbd on the dse-k8s cluster is now ready for a review.

Here is the fourth and final patch in the stack, which is where we customise the admin_ng deployment for dse-k8s: Add a values file for the ceph-csi plugin on dse-k8s-eqiad
So this is the patch with the most relevant helm-lint output to examine.

Modification of the upstream chart was all done in this patch: Add WMF customisations to the upstream ceph-csi-rbd chart

These modifications are:

  • adding conditionals to permit disabling privilege escalation in all containers where this was practicable (liveness-prometheus and node-registrar)
  • adding a conditional to permit disabling the csi-snapshot functionality, since we do not currently need this extra complexity and associated CRDs etc.
  • adding a simple annotation to track the upstream chart version that was modified
  • changing the version string to include a wmf build component
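The last two bullet points amount to something like this in Chart.yaml (the annotation key and exact version suffix shown here are illustrative, not necessarily what landed in the patch):

```yaml
# Illustrative Chart.yaml fragment; annotation key name is an assumption.
apiVersion: v2
name: ceph-csi-rbd
version: 3.7.2-wmf1                  # upstream version plus a wmf build component
annotations:
  wmf.io/upstream-version: "3.7.2"   # tracks the upstream chart that was modified
```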

There is still one container that requires elevated privileges. This is the csi-rbdplugin nodeplugin container, which runs on every node as part of the daemonset. There is currently no way that we can avoid requiring SYS_ADMIN capabilities for this container and, as was discussed at the recent k8s-sig, extracting it from the daemonset is likely to cause more problems than it might solve.

Here is the upstream chart review record, which is still showing as undecided: https://wikitech.wikimedia.org/wiki/Helm/Upstream_Charts/ceph-csi-rbd

If you could have a look when convenient, I'd be very grateful.