In the sre.k8s.upgrade-cluster cookbook there is the possibility of reimaging the etcd cluster (of a given k8s cluster) to a target OS, which for 1.23 is bullseye. We can reimage one node at a time, so the first strategy that I tried was:
- Stop etcd on all nodes and disable puppet.
- Reimage the nodes one at a time.
The idea was basically to bootstrap a cluster from scratch, and I naively thought it would work. One thing I discovered is that the SRV-record environment variables in /etc/default/etcd play a role: when the first etcd node boots for the first time, it is very unhappy not to find any of its other members alive and reachable for a leader election. Setting the environment variable that tags the cluster as new and commenting out all the SRV/discovery-related ones seems to work, in the sense that the node bootstraps itself, but as a single-node cluster.
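For reference, a minimal sketch of the kind of /etc/default/etcd overrides I mean, using the standard etcd environment variables; the hostnames, domain and the exact set of variables that our puppetization writes are placeholders on my side, not the real values:

  # Hypothetical overrides to bootstrap the first reimaged node as a brand
  # new single-node cluster. Hostnames/domains below are placeholders.
  ETCD_NAME="etcd1001"
  # Tell etcd this is a new cluster rather than one it should join.
  ETCD_INITIAL_CLUSTER_STATE="new"
  # Static single-member cluster definition pointing only at itself.
  ETCD_INITIAL_CLUSTER="etcd1001=https://etcd1001.example.wmnet:2380"
  ETCD_INITIAL_ADVERTISE_PEER_URLS="https://etcd1001.example.wmnet:2380"
  # SRV/discovery bootstrap commented out, otherwise etcd tries to reach
  # the other (still down) members listed in DNS.
  #ETCD_DISCOVERY_SRV="example.wmnet"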
Bootstrapping the rest of the nodes is not very clean either: it requires some manual restarts and hacks as well, so I thought to try the following procedure instead:
- Stop etcd on one node (while the rest of the cluster is up).
- Reimage the node.
- Wait for the cluster to be healthy.
- Go back to step 1) with another node.
The procedure seemed sound, but I hadn't anticipated another problem: after reimaging, the first node didn't bootstrap correctly, since its Raft log's last commit/id didn't match the one provided by the rest of the nodes. After reading some guides I discovered that simply removing the member from the cluster and re-adding it is sufficient to let it bootstrap with a brand new Raft log (which then syncs with the one provided by the rest of the cluster). For the moment adding/removing members is not well supported by Spicerack IIRC, but maybe we could add some support.
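For reference, a rough sketch of the remove/re-add dance with etcdctl (v3 API), run from one of the healthy members; endpoints, member names, IDs and TLS paths below are placeholders, not our actual ones:

  # Explicitly select the v3 API (default on recent etcd versions anyway).
  export ETCDCTL_API=3
  ETCDCTL="etcdctl --endpoints=https://etcd1002.example.wmnet:2379 --cacert=/etc/ssl/certs/ca.pem --cert=/etc/etcd/ssl/client.pem --key=/etc/etcd/ssl/client.key"

  $ETCDCTL member list                      # note the hex ID of the member being reimaged
  $ETCDCTL member remove 8e9e05c52164694d   # drop its old Raft identity from the cluster
  # Re-add it as a fresh member: etcd prints the ETCD_INITIAL_CLUSTER* values
  # the reimaged node should start with (initial cluster state "existing"),
  # which is what makes it sync the Raft log from the others.
  $ETCDCTL member add etcd1001 --peer-urls=https://etcd1001.example.wmnet:2380
  $ETCDCTL endpoint health                  # verify once the node has rejoined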
Does anybody else have a different experience with etcd? Is there another procedure to do it safely? We should probably add something to https://wikitech.wikimedia.org/wiki/Etcd at the end of the task.