
Enable the Priority admission plugin
Closed, Resolved (Public)

Description

During Docker stress testing we had multiple issues with Kubernetes nodes (the Ganeti ones) failing completely because calico-node was evicted from them. T289111 describes the aftermath of that exact situation (which went undetected mainly because eqiad is depooled).

Enabling the admission controller is the easy part, but we will also want to restrict the Kubernetes default priority classes system-cluster-critical and system-node-critical so that they can only be used by Pods in namespaces we "trust" (such as kube-system for the services clusters; the ml clusters may need additional namespaces for istio, kf*).

This can be done by providing an AdmissionConfiguration via the kube-apiserver flag --admission-control-config-file, like:

apiVersion: apiserver.k8s.io/v1alpha1
kind: AdmissionConfiguration
plugins:
- name: "ResourceQuota"
  configuration:
    apiVersion: resourcequota.admission.k8s.io/v1beta1
    kind: Configuration
    limitedResources:
    - resource: pods
      matchScopes:
      - scopeName: PriorityClass 
        operator: In
        values: 
        - system-cluster-critical
        - system-node-critical

Namespaces are then explicitly granted permission to use those classes by adding a ResourceQuota object:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: priorityclass
  namespace: kube-system
spec:
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: 
        - system-cluster-critical
        - system-node-critical
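
The description notes that the ml clusters may need further trusted namespaces (for istio, kf*). As a sketch only, granting another namespace access would presumably mean deploying the same object there. The namespace name istio-system below is an assumption for illustration and is not taken from this task.

```yaml
# Hypothetical example: the same quota as above, for an additional trusted namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: priorityclass
  namespace: istio-system  # assumed namespace name, adjust per cluster
spec:
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values:
        - system-cluster-critical
        - system-node-critical
```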

Event Timeline

JMeybohm created this task.

Change 713804 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kubernetes/staging: Reorder hiera keys to match production order

https://gerrit.wikimedia.org/r/713804

Change 713805 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kubernetes/staging: Enable Priority admission plugin in codfw

https://gerrit.wikimedia.org/r/713805

Change 713806 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kubernetes/staging: Enable Priority admission plugin in staging

https://gerrit.wikimedia.org/r/713806

Change 713807 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kubernetes: Enable Priority admission plugin

https://gerrit.wikimedia.org/r/713807

Change 713804 merged by JMeybohm:

[operations/puppet@production] kubernetes/staging: Reorder hiera keys to match production order

https://gerrit.wikimedia.org/r/713804

Change 713805 merged by JMeybohm:

[operations/puppet@production] kubernetes/staging: Enable Priority admission plugin in codfw

https://gerrit.wikimedia.org/r/713805

JMeybohm updated the task description.
JMeybohm added subscribers: Jelto, akosiaris.

After enabling the admission plugin in staging-codfw and deleting Pods that define a priorityClass, the priority is set correctly on the recreated Pods:

# kubectl -n kube-system describe po calico-node-7s266 |grep Priority
Priority Class Name:  system-node-critical
# kubectl -n kube-system describe po calico-node-ljj9b |grep Priority
Priority:             2000001000
Priority Class Name:  system-node-critical

Running pods are, of course, unaffected; pods (from services) that do not specify a priorityClass are assigned "Priority: 0", as expected.

Change 713806 merged by JMeybohm:

[operations/puppet@production] kubernetes/staging: Enable Priority admission plugin in staging

https://gerrit.wikimedia.org/r/713806

Change 714038 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] admin_ng: Deploy a ResourceQuota to allow priority pods in kube-system

https://gerrit.wikimedia.org/r/714038

Mentioned in SAL (#wikimedia-operations) [2021-08-20T12:00:48Z] <jayme> enabled priority admission plugin on k8s staging, rolling restart all pods in kube-system namespace - T289131

Mentioned in SAL (#wikimedia-operations) [2021-08-20T15:37:19Z] <jayme> deleting various pods from staging to have them recreated with priorities - T289131

Change 714071 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s::apiserver: Add admission controller config file

https://gerrit.wikimedia.org/r/714071

Change 714038 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng: Deploy a ResourceQuota to allow priority pods in kube-system

https://gerrit.wikimedia.org/r/714038

Change 714071 merged by JMeybohm:

[operations/puppet@production] k8s::apiserver: Add admission controller config file

https://gerrit.wikimedia.org/r/714071

Test in staging-codfw looks as expected:

root@deploy1002:~# kubectl -n miscweb apply -f /home/jayme/debug_pod.yaml                                                                                
Error from server (Forbidden): error when creating "/home/jayme/debug_pod.yaml": pods "jayme" is forbidden: pods with system-cluster-critical priorityClass is not permitted in miscweb namespace
root@deploy1002:~# kubectl -n kube-system apply -f /home/jayme/debug_pod.yaml 
pod/jayme created
root@deploy1002:~# kubectl -n kube-system delete po jayme
pod "jayme" deleted
root@deploy1002:~# sed 's/system-cluster-critical/system-node-critical/' -i /home/jayme/debug_pod.yaml
root@deploy1002:~# kubectl -n miscweb apply -f /home/jayme/debug_pod.yaml 
Error from server (Forbidden): error when creating "/home/jayme/debug_pod.yaml": pods "jayme" is forbidden: pods with system-node-critical priorityClass is not permitted in miscweb namespace
root@deploy1002:~# 
root@deploy1002:~# kubectl -n kube-system apply -f /home/jayme/debug_pod.yaml 
pod/jayme created
root@deploy1002:~# kubectl -n kube-system delete po jayme
pod "jayme" deleted
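
The contents of /home/jayme/debug_pod.yaml are not shown in this task. A minimal manifest that would exercise the same check might look like the following sketch. The container name, image, and command are placeholders; only the Pod name and the priorityClassName are taken from the output above.

```yaml
# Hypothetical reconstruction of a debug Pod manifest; not the actual file.
apiVersion: v1
kind: Pod
metadata:
  name: jayme              # Pod name as seen in the kubectl output
spec:
  priorityClassName: system-cluster-critical
  containers:
  - name: debug            # placeholder container name
    image: busybox         # placeholder image
    command: ["sleep", "3600"]
```

The manifest deliberately sets no namespace, so the target namespace comes from the -n flag of each kubectl invocation.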

Change 713807 merged by JMeybohm:

[operations/puppet@production] kubernetes: Enable Priority admission plugin

https://gerrit.wikimedia.org/r/713807

Change 714717 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kubernetes/staging: Limit use of PriorityClass

https://gerrit.wikimedia.org/r/714717

Change 714718 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kubernetes: Limit use of PriorityClass

https://gerrit.wikimedia.org/r/714718

Change 714717 merged by JMeybohm:

[operations/puppet@production] kubernetes/staging: Limit use of PriorityClass

https://gerrit.wikimedia.org/r/714717

Change 714718 merged by JMeybohm:

[operations/puppet@production] kubernetes: Limit use of PriorityClass

https://gerrit.wikimedia.org/r/714718

Mentioned in SAL (#wikimedia-operations) [2021-08-25T11:39:13Z] <jayme> slowly restarting all pods in kube-system namespace in eqiad k8s cluster - T289131

Mentioned in SAL (#wikimedia-operations) [2021-08-25T13:02:58Z] <jayme> restarted all pods in kube-system namespace in codfw k8s cluster - T289131