During Docker stress testing we had multiple issues with Kubernetes nodes (the Ganeti ones) failing completely because calico-node was evicted from them. T289111 describes the aftermath of exactly that situation (which went undetected mainly because eqiad is depooled).
Enabling the admission controller is the easy part, but we will also want to restrict the Kubernetes default priority classes system-cluster-critical and system-node-critical so that they can only be used by Pods in namespaces we "trust" (like kube-system for the services clusters; the ML clusters may need additional namespaces for Istio, kf*).
This can be done by providing an AdmissionConfiguration via the kube-apiserver flag --admission-control-config-file, like:
  apiVersion: apiserver.k8s.io/v1alpha1
  kind: AdmissionConfiguration
  plugins:
  - name: "ResourceQuota"
    configuration:
      apiVersion: resourcequota.admission.k8s.io/v1beta1
      kind: Configuration
      limitedResources:
      - resource: pods
        matchScopes:
        - scopeName: PriorityClass
          operator: In
          values:
          - system-cluster-critical
          - system-node-critical
Trusted namespaces are then explicitly granted permission to use those classes by adding a ResourceQuota object to each of them:
  apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: priorityclass
    namespace: kube-system
  spec:
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values:
        - system-cluster-critical
        - system-node-critical
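To check that the restriction behaves as intended, one option is to try creating a Pod that requests one of the limited priority classes in a namespace without such a ResourceQuota. The manifest below is only a sketch: the Pod name, the namespace, and the image are placeholders made up for illustration, not existing workloads.

```yaml
# Hypothetical test Pod; name, namespace and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: priorityclass-test
  namespace: default  # assumed to have no ResourceQuota for the PriorityClass scope
spec:
  priorityClassName: system-node-critical
  containers:
  - name: pause
    image: registry.example/pause:latest  # placeholder image reference
```

With the AdmissionConfiguration in place, the API server should refuse to create this Pod with a Forbidden error, because no quota in that namespace matches the PriorityClass scope. The same manifest with namespace set to kube-system should be admitted.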
Relevant reads:
- https://people.wikimedia.org/~jayme/k8s-docs/v1.16/docs/reference/access-authn-authz/admission-controllers/#priority
- https://people.wikimedia.org/~jayme/k8s-docs/v1.16/docs/concepts/configuration/pod-priority-preemption/
- https://people.wikimedia.org/~jayme/k8s-docs/v1.16/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
- https://people.wikimedia.org/~jayme/k8s-docs/v1.16/docs/concepts/policy/resource-quotas/#limit-priority-class-consumption-by-default