This is to activate the recent runc security update.
Procedure:
- cordon+drain x001 and wait for it to drain: kubectl drain ml-serveXXX1.YYYYY.wmnet --delete-emptydir-data --ignore-daemonsets
- kill the daemonset pods on x001; they will restart automatically, e.g.: kubectl -n kube-system delete pod calico-...
- uncordon x001: kubectl uncordon ml-serveXXX1.YYYYY.wmnet
- cordon x002-x008: kubectl cordon ml-serve100{2..8}.eqiad.wmnet
- for each of x002..x008, one node at a time:
  - drain node: kubectl drain ml-serveXXXX.YYYYY.wmnet --delete-emptydir-data --ignore-daemonsets
  - kill daemonsets (see above for example)
  - uncordon node: kubectl uncordon ml-serveXXXX.YYYYY.wmnet
Step 4 (cordoning x002-x008) prevents pods evicted from the currently draining node from being scheduled onto nodes that will themselves be drained soon, which could exhaust their disruption budget. Since nodes are uncordoned one by one, there should always be capacity to run evicted pods on the already-updated nodes.
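
The cordon-then-roll part of the procedure could be scripted roughly as follows. This is only a sketch, not a tested tool: the function names (run, roll_node, roll_nodes) and the DRY_RUN switch are made up for illustration, and it assumes that after a drain the only pods left on the node in kube-system are daemonset-managed, so that deleting them by node name is equivalent to the manual "delete pod calico-..." step. Daemonsets in other namespaces are not handled.

```shell
#!/bin/sh
# Sketch only: cordon a set of nodes, then drain/restart/uncordon them one by one.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Drain one node, delete its remaining kube-system pods, then uncordon it.
roll_node() {
    node="$1"
    run kubectl drain "$node" --delete-emptydir-data --ignore-daemonsets
    # Assumption: what is still running on the node in kube-system after the
    # drain is daemonset-managed, so the controllers recreate the deleted pods.
    run kubectl -n kube-system delete pod --field-selector "spec.nodeName=$node"
    run kubectl uncordon "$node"
}

# Cordon every given node first, then roll them strictly one at a time.
roll_nodes() {
    for n in "$@"; do
        run kubectl cordon "$n"
    done
    for n in "$@"; do
        roll_node "$n"
    done
}
```

After sourcing the file, calling roll_nodes with the node names (for example the ml-serve100{2..8}.eqiad.wmnet expansion from the cordon step, in a shell that supports brace expansion) prints the planned commands; setting DRY_RUN=0 first would actually run them. The sketch does not wait for or verify anything between nodes beyond what kubectl drain itself does, so checking that pods are healthy before moving on remains a manual step.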