Background
conf200[7-9] are in the process of being racked and set up in codfw. Once they are ready, we need to integrate them so that they can replace conf200[4-6].
Unfortunately, integrating them incrementally into the existing cluster(s) gets a bit complicated. Specifically:
- conf200[4-6] (and configcluster nodes generally) are running debian bullseye, with etcd 3.3.25 and zookeeper 3.4.13.
- Given their respective support horizons, we would like to target trixie (etcd 3.5.16, zookeeper 3.9.3) rather than bookworm (etcd 3.4.23, zookeeper 3.8.0).
- For etcd, targeting trixie means running afoul of the version-compatibility rules, where only upgrades to successive minor versions are supported (link).
- For zookeeper, targeting either trixie or bookworm means running afoul of similar minor-version compatibility rules (link).
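To make the constraint concrete, here is a minimal sketch that checks whether a direct upgrade stays within the "one minor version at a time" rule and lists the minors that would be skipped. The helper names are hypothetical, and treating zookeeper's rule as having exactly the same shape as etcd's is a simplification of the linked rules, not a restatement of them.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_supported_step(current: str, target: str) -> bool:
    """True if target is the same or the next minor release, and not a downgrade.

    Assumption: upgrades are only supported within a major version and
    across at most one minor version.
    """
    cur, tgt = parse_version(current), parse_version(target)
    same_major = cur[0] == tgt[0]
    minor_delta = tgt[1] - cur[1]
    return same_major and minor_delta in (0, 1) and tgt >= cur


def skipped_minors(current: str, target: str) -> List[str]:
    """List the minor release lines a direct upgrade would jump over."""
    cur, tgt = parse_version(current), parse_version(target)
    if cur[0] != tgt[0]:
        raise ValueError("cross-major upgrades are out of scope for this sketch")
    return [f"{cur[0]}.{minor}" for minor in range(cur[1] + 1, tgt[1])]


if __name__ == "__main__":
    # Versions taken from the bullets above.
    for name, old, new in [
        ("etcd -> bookworm", "3.3.25", "3.4.23"),
        ("etcd -> trixie", "3.3.25", "3.5.16"),
        ("zookeeper -> bookworm", "3.4.13", "3.8.0"),
        ("zookeeper -> trixie", "3.4.13", "3.9.3"),
    ]:
        print(name, is_supported_step(old, new), skipped_minors(old, new))
```

Under these assumptions, only the etcd move to bookworm is a single supported step. Every other combination skips at least one minor release line.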
Finally, it's worth noting that with the ongoing work to upgrade Kafka (T416669), we should be moving toward a world where it becomes possible to use KRaft instead of zookeeper. Thus, any major investment in zookeeper - e.g., executing a multi-step upgrade all the way through 3.9 - is not desirable.
Proposal
I'd propose we do something like the following:
- We will create a (temporary) configcluster role variant that only installs zookeeper. On trixie, that role will install a forward port of zookeeper 3.4.13 from bullseye (T418915#11851872).
- We will incrementally assign the new role to conf200[7-9] and integrate each host into the existing zookeeper cluster.
- We will assign the normal configcluster role to conf200[7-9] with hieradata overrides to bootstrap a new etcd cluster (e.g., with a new SRV discovery domain).
- Once stable, we will enable etcd-mirror on one of the new-cluster nodes and begin incrementally migrating etcd clients (e.g., by site / use case).
- Once all etcd clients are migrated, we can decom the old cluster (procedure elided here for brevity).
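As a rough illustration of what "a new SRV discovery domain" involves, the sketch below renders the SRV records that etcd's DNS discovery looks up for peers and clients. The domain name, the host FQDNs, the TTL, and the use of etcd's default ports (2380 for peers, 2379 for clients) are all assumptions made for the example; the real values would come from the hieradata overrides and our DNS setup.

```python
from typing import List


def etcd_srv_records(
    domain: str,
    hosts: List[str],
    tls: bool = True,
    ttl: int = 300,
) -> List[str]:
    """Render zone-file style SRV records for etcd DNS discovery.

    etcd looks up _etcd-server[-ssl]._tcp.<domain> for peers and
    _etcd-client[-ssl]._tcp.<domain> for clients.
    Ports are etcd's defaults and may not match the deployed configuration.
    """
    suffix = "-ssl" if tls else ""
    records = []
    for service, port in (("server", 2380), ("client", 2379)):
        for host in hosts:
            records.append(
                f"_etcd-{service}{suffix}._tcp.{domain}. {ttl} IN SRV 0 0 {port} {host}."
            )
    return records


if __name__ == "__main__":
    # Hypothetical discovery domain and FQDNs, for illustration only.
    new_hosts = [f"conf200{n}.codfw.wmnet" for n in (7, 8, 9)]
    for record in etcd_srv_records("etcd-new.example.org", new_hosts):
        print(record)
```

Because the new cluster is discovered under its own domain, clients can be moved over by changing only the domain they resolve, which is what makes the incremental migration by site or use case possible.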
Although somewhat involved, I don't think this is significantly more complex than if we had chosen bookworm as an interim target and later reimaged in-place to trixie. While choosing bookworm would allow us to integrate into the existing etcd cluster, rather than standing up a new one in parallel, I'm not sure that actually makes our jobs easier / safer - e.g., I prefer having the ability to kick the tires on the new nodes first and incrementally move clients.
What does this imply for our plans in eqiad?
In eqiad, there's no hardware refresh until later in the next FY, so our vehicle for getting off bullseye must be an in-place reimage. Regardless of how we go about this, we'll start by switching the primary etcd cluster to codfw, as well as shunting all client traffic there (details).
One option would be to do something like the procedure above: once all client traffic is flowing to codfw, we would downtime / stop etcd on conf100[7-9], then incrementally reimage to the zookeeper-only configcluster role variant on trixie (IIUC, there's no "integration" step here - i.e., they should just join). Once we've ship-of-Theseus'd the zookeeper cluster, we would assign the normal configcluster role and bootstrap a new etcd cluster (i.e., the third and fourth steps of the proposal above). This has the benefit of doing something shaped like what we will have just done in codfw, while it's fresh in our heads.
Another option would be to reimage in two steps - i.e., first reimage all nodes to bookworm, then to trixie. That would require preparing a forward-port of zookeeper to bookworm as well. I don't have a strong opinion here, and this option does have the benefit of following a more "standard" procedure.
On net, I'm not sure whether moving first to bookworm makes things faster / easier or not. Speed should be a deciding factor, though, since a single etcd cluster (codfw) is our SPoF for the duration.
Other practicalities
- We'll need to start explicitly passing -enable-v2 if we're not already, since etcd 3.4 and later disable the v2 API by default.