
Script the process of upgrading a node with kubeadm to 1.16.9
Closed, Resolved, Public

Description

Upgrading nodes is fairly long and tedious. While the control plane should be upgraded by hand, one node at a time, the worker nodes should probably have a much more automated process:

All worker nodes should have kubeadm upgraded to 1.16.9 before beginning. Refer to https://v1-16.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

  1. Drain with kubectl drain --force --ignore-daemonsets --delete-local-data $node
  2. Run kubeadm upgrade node on the node
  3. Upgrade kubelet, docker, containerd.io
  4. Restart docker and kubelet
  5. kubectl uncordon $node
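
As a rough illustration, the five steps above could be expressed as a list of commands built per node. This is only a sketch under stated assumptions, not the wmcs-k8s-node-upgrade.py script from the patches below: the ssh transport, the use of sudo, the apt-get and systemctl invocations, the package list (docker-ce in place of docker), and the node name worker-1 are all hypothetical choices made for the example.

```python
import subprocess
from typing import List, Sequence

# Assumption: package names follow step 3 of the description, with docker-ce
# used as the Docker package name. Adjust to whatever the nodes really run.
DEFAULT_PACKAGES = ("kubelet", "docker-ce", "containerd.io")


def on_node(node: str, remote_command: str) -> List[str]:
    """Wrap a shell command so it runs on the node over ssh (assumed transport)."""
    return ["ssh", node, remote_command]


def upgrade_commands(
    node: str, packages: Sequence[str] = DEFAULT_PACKAGES
) -> List[List[str]]:
    """Return the five steps from the description as argv lists, in order."""
    return [
        # 1. Drain the node (run from wherever kubectl is configured).
        ["kubectl", "drain", "--force", "--ignore-daemonsets",
         "--delete-local-data", node],
        # 2. Upgrade the node's kubeadm-managed configuration.
        on_node(node, "sudo kubeadm upgrade node"),
        # 3. Upgrade the packages (hypothetical apt-get invocation).
        on_node(node, "sudo apt-get install -y " + " ".join(packages)),
        # 4. Restart the services.
        on_node(node, "sudo systemctl restart docker kubelet"),
        # 5. Put the node back into rotation.
        ["kubectl", "uncordon", node],
    ]


def upgrade_node(node: str, dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False."""
    for argv in upgrade_commands(node):
        print(" ".join(argv))
        if not dry_run:
            # check=True raises on the first failing step, so later steps
            # (including the uncordon) do not run.
            subprocess.run(argv, check=True)
```

Calling upgrade_node("worker-1") only prints the commands. Passing dry_run=False would execute them, and a failure after the drain would leave the node cordoned for inspection.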

The process should be done either one node at a time or in small batches, to avoid overloading the remaining nodes during depooling. It may be possible to encode all of this in a spicerack cookbook, or it may just be a small script run from a local machine.
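
To make the batching idea concrete, here is a minimal sketch of a driver that walks a node list in small groups. Everything in it is an assumption for illustration: the function names, the upgrade_one callback (which stands in for whatever performs the per-node steps), and the optional pause between batches. Nodes inside a batch are handled sequentially here; running them in parallel would be a separate design choice.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(nodes: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` nodes."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(nodes), size):
        yield list(nodes[start:start + size])


def rolling_upgrade(
    nodes: Sequence[str],
    upgrade_one: Callable[[str], None],
    batch_size: int = 1,
    pause_seconds: float = 0.0,
) -> None:
    """Upgrade nodes batch by batch, optionally pausing between batches."""
    for index, batch in enumerate(batches(nodes, batch_size)):
        if index > 0 and pause_seconds > 0:
            # Give evicted workloads time to settle before the next batch.
            time.sleep(pause_seconds)
        for node in batch:
            # Any exception from upgrade_one propagates and stops the run,
            # so the remaining nodes are left untouched.
            upgrade_one(node)
```

With batch_size=1 this reduces to the one-at-a-time case. A larger value trades speed against how much capacity is out of the pool at once.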

Event Timeline

JHedden assigned this task to Bstorm. May 5 2020, 4:15 PM
JHedden triaged this task as High priority.
JHedden moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.
Bstorm reassigned this task from Bstorm to aborrero. May 5 2020, 7:12 PM

Change 595964 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] kubeadm: add wmcs-k8s-node-upgrade.py script

https://gerrit.wikimedia.org/r/595964

Change 595964 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] kubeadm: add wmcs-k8s-node-upgrade.py script

https://gerrit.wikimedia.org/r/595964

Change 596483 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] apt-upgrade: give support to understand dist/component

https://gerrit.wikimedia.org/r/596483

Change 596483 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] apt-upgrade: give support to understand dist/component

https://gerrit.wikimedia.org/r/596483

aborrero closed this task as Resolved. May 26 2020, 9:22 AM

Closing task for now, this seems done.

Change 599006 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] kubeadm: wmcs-k8s-node-upgrade: improve a bit output reading

https://gerrit.wikimedia.org/r/599006

Change 599006 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] kubeadm: wmcs-k8s-node-upgrade: improve a bit output reading

https://gerrit.wikimedia.org/r/599006

Change 599367 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] kubeadm: wmcs-k8s-node-upgrade.py: upgrade docker-ce not docker

https://gerrit.wikimedia.org/r/599367

Change 599367 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] kubeadm: wmcs-k8s-node-upgrade.py: upgrade docker-ce not docker

https://gerrit.wikimedia.org/r/599367

Change 599371 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] kubeadm: wmcs-k8s-node-upgrade.py: add more force options to apt-get install

https://gerrit.wikimedia.org/r/599371

Change 599371 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] kubeadm: wmcs-k8s-node-upgrade.py: add more force options to apt-get install

https://gerrit.wikimedia.org/r/599371

Change 599374 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] kubeadm: wmcs-k8s-node-upgrade: don't skip the pause stage

https://gerrit.wikimedia.org/r/599374

Change 599374 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] kubeadm: wmcs-k8s-node-upgrade: don't skip the pause stage

https://gerrit.wikimedia.org/r/599374

Change 599380 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] kubeadm: wmcs-k8s-node-upgrade.py: force certificate renewal with kubeadm

https://gerrit.wikimedia.org/r/599380

Change 599380 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] kubeadm: wmcs-k8s-node-upgrade.py: force certificate renewal with kubeadm

https://gerrit.wikimedia.org/r/599380

Change 599472 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge-k8s: proposing removing hostkey checking for the upgrades

https://gerrit.wikimedia.org/r/599472

Change 599472 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge-k8s: proposing removing hostkey checking for the upgrades

https://gerrit.wikimedia.org/r/599472