
Do a rolling drain/undrain of LW k8s nodes in eqiad and codfw
Closed, Resolved | Public | 1 Estimated Story Points

Description

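This is needed to activate the recent runc security update: draining a node forces its pods to be recreated, so their containers are started with the updated runc.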

Procedure:

  1. cordon+drain x001 and wait for it to drain: kubectl drain ml-serveXXX1.YYYYY.wmnet --delete-emptydir-data --ignore-daemonsets
  2. delete the daemonset pods on the node (they will restart automatically), e.g.: kubectl -n kube-system delete pod calico-...
  3. uncordon x001: kubectl uncordon ml-serveXXX1.YYYYY.wmnet
  4. cordon x002-x008: kubectl cordon ml-serve100{2..8}.eqiad.wmnet
  5. for each node in x002..x008:
    1. drain node: kubectl drain ml-serveXXXX.YYYYY.wmnet --delete-emptydir-data --ignore-daemonsets
    2. delete the daemonset pods (see step 2 for an example)
    3. uncordon node: kubectl uncordon ml-serveXXXX.YYYYY.wmnet

Step 4 prevents pods evicted from the currently draining node from being scheduled onto nodes that will soon be drained themselves; repeated evictions of the same pods could exhaust their disruption budget. Since nodes are uncordoned one by one, there should always be capacity to run evicted pods on the already-updated nodes.
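
The ordering above can be sketched as a small bash helper. This is only an illustration, not a script that was used for this task: the function names (run, drain_and_uncordon, rolling_drain) and the DRY_RUN switch are hypothetical, and it prints the kubectl commands by default instead of running them. Deleting the daemonset pods is left as a manual step because the pod names differ per node.

```shell
#!/usr/bin/env bash
# Sketch of the rolling drain order described in the procedure.
# Assumption: DRY_RUN=1 (the default here) only prints the commands.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Drain one node, then make it schedulable again.
drain_and_uncordon() {
  local node="$1"
  run kubectl drain "$node" --delete-emptydir-data --ignore-daemonsets
  # Manual step (not automated here): delete the daemonset pods still
  # running on the node, e.g. the calico pod in kube-system.
  run kubectl uncordon "$node"
}

# First argument: the node to update first.
# Remaining arguments: all other nodes, in the order they should be drained.
rolling_drain() {
  local first="$1"
  shift
  local node

  # Steps 1-3: update the first node so it can receive evicted pods.
  drain_and_uncordon "$first"

  # Step 4: cordon every remaining node up front, so evicted pods
  # cannot land on a node that is about to be drained.
  for node in "$@"; do
    run kubectl cordon "$node"
  done

  # Step 5: drain and uncordon the remaining nodes one by one.
  for node in "$@"; do
    drain_and_uncordon "$node"
  done
}

# Example (dry run):
#   rolling_drain ml-serve1001.eqiad.wmnet ml-serve100{2..8}.eqiad.wmnet
```

With DRY_RUN=1 the output can be reviewed, or pasted command by command, before anything touches the cluster. Setting DRY_RUN=0 would execute the commands directly, without pausing for the manual daemonset step.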