
Develop a plan for integrating conf200[7-9]
Open, Medium, Public

Description

Background

conf200[7-9] are in the process of being racked and set up in codfw. Once ready, we need to integrate them in order to replace conf200[4-6].

Unfortunately, integrating them incrementally into the existing cluster(s) gets a bit complicated. Specifically:

  • conf200[4-6] (and configcluster nodes generally) are running debian bullseye, with etcd 3.3.25 and zookeeper 3.4.13.
  • Given their respective support horizons, we would like to target trixie (etcd 3.5.16, zookeeper 3.9.3) rather than bookworm (etcd 3.4.23, zookeeper 3.8.0).
  • For etcd, targeting trixie means running afoul of the version-compatibility rules, where only upgrades to successive minor versions are supported (link).
  • For zookeeper, targeting either trixie or bookworm means running afoul of similar minor-version compatibility rules (link).
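To make the version constraint concrete, here is a minimal Python sketch that lists the minor-release series an in-place rolling upgrade would have to pass through. It assumes, as described above, that only one-minor-version steps are supported for both etcd and zookeeper. The helper names (`upgrade_path`, `direct_upgrade_ok`) are hypothetical; the version numbers are the distro versions listed above.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string like '3.4.13'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def upgrade_path(current: str, target: str) -> list[str]:
    """List the major.minor series a rolling upgrade must visit, in order.

    Assumption: only upgrades to the next minor version are supported,
    and current/target share a major version.
    """
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major:
        raise ValueError("cross-major upgrades are out of scope for this sketch")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not modelled")
    return [f"{cur_major}.{m}" for m in range(cur_minor, tgt_minor + 1)]


def direct_upgrade_ok(current: str, target: str) -> bool:
    """True if target is at most one minor version ahead of current."""
    return len(upgrade_path(current, target)) <= 2


if __name__ == "__main__":
    # etcd: bullseye -> bookworm, then bullseye -> trixie
    print(direct_upgrade_ok("3.3.25", "3.4.23"))  # True
    print(direct_upgrade_ok("3.3.25", "3.5.16"))  # False
    # zookeeper: bullseye -> trixie
    print(upgrade_path("3.4.13", "3.9.3"))  # ['3.4', '3.5', '3.6', '3.7', '3.8', '3.9']
```

Under this assumption, bookworm's etcd is a single step from bullseye's, while zookeeper 3.4 to 3.9 is five successive hops.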

Finally, it's worth noting that with the ongoing work to upgrade Kafka (T416669), we should be moving toward a world where it becomes possible to use KRaft instead of zookeeper. Thus, any major investment in zookeeper - e.g., executing a multi-step upgrade all the way through 3.9 - is not desirable.

Proposal

I'd propose we do something like the following:

  1. We will create a (temporary) configcluster role variant that only installs zookeeper. On trixie, that role will install a forward port of zookeeper 3.4.13 from bullseye (T418915#11851872).
  2. We will incrementally assign the new role to conf200[7-9] and integrate each host into the existing zookeeper cluster.
  3. We will assign the normal configcluster role to conf200[7-9] with hieradata overrides to bootstrap a new etcd cluster (e.g., with a new SRV discovery domain).
  4. Once stable, we will enable etcd-mirror on one of the new-cluster nodes and begin incrementally migrating etcd clients (e.g., by site / use case).
  5. Once all etcd clients are migrated, we can decom the old cluster (procedure elided here for brevity).
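Step 2 amounts to replacing the zookeeper ensemble one member at a time. Zookeeper 3.4 has no dynamic reconfiguration (that arrived in 3.5), so each membership change means updating the static `server.N` list in `zoo.cfg` and doing a rolling restart. The sketch below enumerates one possible add-one, remove-one ordering and checks fault tolerance at each step. The ordering, the server IDs, the short hostnames, and the 2888/3888 peer/election ports are assumptions for illustration, not the actual puppet configuration.

```python
OLD = ["conf2004", "conf2005", "conf2006"]
NEW = ["conf2007", "conf2008", "conf2009"]

# Hypothetical, stable server IDs (the contents of each host's myid file).
SERVER_IDS = {host: i for i, host in enumerate(OLD + NEW, start=1)}


def quorum(n: int) -> int:
    """Smallest majority of an n-member ensemble."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Members that can be lost while a majority remains."""
    return n - quorum(n)


def replacement_steps(old: list[str], new: list[str]) -> list[list[str]]:
    """Ensemble memberships for an assumed add-then-retire ordering.

    Each new host is added (ensemble grows to 4), then one old host is
    retired (back to 3), until only new hosts remain.
    """
    members = list(old)
    steps = [list(members)]
    for added, removed in zip(new, old):
        members.append(added)
        steps.append(list(members))
        members.remove(removed)
        steps.append(list(members))
    return steps


def zoo_cfg_servers(members: list[str]) -> list[str]:
    """Render static zoo.cfg server lines (assumed ports 2888:3888)."""
    return [f"server.{SERVER_IDS[h]}={h}:2888:3888" for h in members]


if __name__ == "__main__":
    for step in replacement_steps(OLD, NEW):
        n = len(step)
        print(f"{n} members, quorum {quorum(n)}, "
              f"tolerates {tolerated_failures(n)}: {step}")
```

Every intermediate membership tolerates one failure, since a 4-member ensemble needs a quorum of 3. Each rolling restart briefly uses up that single tolerated failure, so restarts should be serialized.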

Although somewhat involved, I don't think this is significantly more complex than if we had chosen bookworm as an interim target and later reimaged in-place to trixie. While choosing bookworm would allow us to integrate into the existing etcd cluster, rather than standing up a new one in parallel, I'm not sure that actually makes our jobs easier / safer - e.g., I prefer having the ability to kick the tires on the new nodes first and incrementally move clients.

What does this imply for our plans in eqiad?

In eqiad, there's no hardware refresh until later in the next FY, so our vehicle for getting off bullseye must be an in-place reimage. Regardless of how we go about this, we'll start by switching the primary etcd cluster to codfw, as well as shunting all client traffic there (details).

One option would be to do something like the procedure above: once all client traffic is flowing to codfw, we would downtime / stop etcd on conf100[7-9], then incrementally reimage to the zookeeper-only configcluster role variant on trixie (IIUC, there's no "integration" step here - i.e., they should just join). Once we've ship-of-Theseus'd the zookeeper cluster, assign the normal configcluster role and bootstrap a new etcd cluster (i.e., steps 3 and 4). This has the benefit of doing something shaped like what we just recently did in codfw while it's fresh in our heads.

Another option would be to reimage in two steps - i.e., first reimage all nodes to bookworm, then to trixie. That would require preparing a forward-port of zookeeper to bookworm as well. I don't have a strong opinion here, and indeed this has the benefit of following a more "standard" procedure.

On net, I'm not sure whether moving first to bookworm makes things faster / easier or not. Speed should be a deciding factor, though, since for the duration we depend on a single etcd cluster (codfw), which is a single point of failure.

Other practicalities

  • We'll need to start explicitly passing -enable-v2 if we're not already.
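For context: etcd 3.4 changed the default of `--enable-v2` to false, so any v2 API clients depend on it being set explicitly once we are past 3.3. The minimal Python sketch below assembles an etcd command line for bootstrapping a new cluster via DNS SRV discovery (step 3 above) with v2 enabled. The helper name, SRV domain, FQDNs and cluster token are hypothetical placeholders, TLS certificate flags are omitted, and in practice these values would come from puppet / hiera rather than a script.

```python
import shlex


def etcd_bootstrap_args(name: str, fqdn: str, srv_domain: str, token: str) -> list[str]:
    """Build etcd flags for a member of a *new* cluster discovered via DNS SRV.

    etcd resolves its initial peers from _etcd-server-ssl._tcp.<srv_domain>
    (or _etcd-server._tcp.<srv_domain> without TLS), so pointing the new
    nodes at a new domain keeps them apart from the existing cluster.
    TLS certificate/key flags are deliberately omitted from this sketch.
    """
    return [
        "etcd",
        f"--name={name}",
        f"--discovery-srv={srv_domain}",
        f"--initial-advertise-peer-urls=https://{fqdn}:2380",
        f"--initial-cluster-token={token}",
        "--initial-cluster-state=new",
        "--listen-peer-urls=https://0.0.0.0:2380",
        "--listen-client-urls=https://0.0.0.0:2379",
        f"--advertise-client-urls=https://{fqdn}:2379",
        # Defaults to false from etcd 3.4 onward; needed for v2 API clients.
        "--enable-v2=true",
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    args = etcd_bootstrap_args(
        name="conf2007",
        fqdn="conf2007.example.internal",
        srv_domain="etcd-new.example.internal",
        token="etcd-new-codfw",
    )
    print(shlex.join(args))
```

Using a distinct `--initial-cluster-token` as well as a distinct SRV domain is a belt-and-braces guard against new members accidentally talking to the old cluster.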

Event Timeline

Change #1277085 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add component for forward port of Zookeeper 3.4

https://gerrit.wikimedia.org/r/1277085

Change #1277085 merged by Muehlenhoff:

[operations/puppet@production] Add component for forward port of Zookeeper 3.4

https://gerrit.wikimedia.org/r/1277085

Mentioned in SAL (#wikimedia-operations) [2026-04-24T14:00:06Z] <moritzm> imported zookeeper 3.4.13-6+deb11u1~wmf13u1 into component/zookeeper34 for trixie-wikimedia (forward port of Zookeeper 3.4 from Bullseye to Trixie) T424266

zookeeper 3.4.13-6+deb11u1~wmf13u1 can now be installed from component/zookeeper34 for trixie-wikimedia

Thank you very much, @MoritzMuehlenhoff. I'll test the new package early this week.

I was able to do some testing today with the new 3.4.13-6+deb11u1~wmf13u1 package (basic functionality, mixed bullseye / trixie cluster compatibility, etc.) and everything seems to work as expected.
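For spot-checking a mixed bullseye / trixie ensemble like the one tested above, zookeeper's `srvr` four-letter command reports each server's version and role over the client port. Below is a minimal Python sketch that queries members and checks that exactly one reports `Mode: leader`. The helper names, hostnames and port 2181 are assumptions; on recent 3.4.x releases the command may also need to be permitted via `4lw.commands.whitelist`.

```python
import socket


def query_srvr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'srvr' four-letter word and return the raw text response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def parse_srvr(text: str) -> dict[str, str]:
    """Turn 'Key: value' lines into a dict, splitting on the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_ensemble(responses: dict[str, str]) -> list[str]:
    """Return a list of problems; responses maps host -> raw srvr output."""
    modes = {host: parse_srvr(raw).get("Mode") for host, raw in responses.items()}
    problems = [f"{host}: no Mode line" for host, mode in modes.items() if mode is None]
    leaders = [host for host, mode in modes.items() if mode == "leader"]
    if len(leaders) != 1:
        problems.append(f"expected exactly one leader, found {leaders}")
    return problems


def check_hosts(hosts: list[str]) -> list[str]:
    """Query each (hypothetical) host live and check the combined result."""
    return check_ensemble({host: query_srvr(host) for host in hosts})
```

For example, `check_hosts(["conf2004", "conf2005", "conf2007"])` would return an empty list for a healthy mixed ensemble; the `Zookeeper version` field from `parse_srvr` shows which build each member is running.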

One item I noticed while preparing my tests (container entrypoint emulating production): it seems I overlooked the fact that we have a custom Debian package providing a JMX-based prometheus exporter, which is installed via profile::prometheus::jmx_exporter and powers the Zookeeper Grafana dashboard and alerts.

At present, we only have builds for bullseye published to apt.wikimedia.org, and will need to rebuild for trixie. I can take a look at that later this week, unless you have any concerns with going that route @MoritzMuehlenhoff.

Excellent!

You don't need to import anything :-) The jmx exporter is already available for trixie and in fact already used on various roles (e.g. the IDPs and kafka-text). It's the same package as also used on bullseye and bookworm (which works since it only contains four JAR archives). At some point we should update it (that work is tracked at https://phabricator.wikimedia.org/T341439), but it hasn't been a priority so far.

Ah, thank you! I completely missed that this was already in use on non-bullseye hosts. Alright, then I think we're all set in terms of package dependencies.

There may be some additional trixie-specific changes I need to make to the puppet resources (e.g., I see some classpath tweaks for bookworm, though I suspect those are an artifact of the bookworm-and-later packages themselves, which we are not using), but I'll be able to sort that out as we go.