Inventory of hosts to be upgraded to bullseye
Hadoop-test
Hadoop
- T332570: Upgrade hadoop workers to bullseye
- T332572: Refresh hadoop coordinators an-coord100[1-2] with an-coord[3-4]
- T332573: Refresh an-master100[1-2] with an-master100[3-4]
- T332578: Refresh an-master1002 with an-master1004
Stats clients
Launcher
Presto
Druid
- T332584: Upgrade an-test-druid1001 to bullseye
- T332604: Upgrade the druid-analytics cluster to bullseye
- T332589: Upgrade the druid-public cluster to bullseye (n.b. Refresh druid100[4-6] with druid10[09-11])
Kafka
- kafka-jumbo - 9 - cumin 'P{F:lsbdistcodename = buster} and A:kafka-jumbo' (n.b. Refresh kafka-jumbo100[1-6] with kafka-jumbo10[09-15])
- T348495: Upgrade kafka-jumbo100[7-9] to Debian Bullseye
Airflow
- airflow - 5 - cumin 'P{F:lsbdistcodename = buster} and A:analytics-airflow'
AQS
- aqs - 24 - cumin 'P{F:lsbdistcodename = buster} and A:aqs' (Data-Persistence, see: T347738)
Zookeeper
Event schemas
- schema - 4 - cumin 'P{F:lsbdistcodename = buster} and A:schema'
Misc
- eventlogging - 1 - eventlog1003.eqiad.wmnet
- archiva - 1 - archiva1001.wikimedia.org (Archiva is to be decommissioned)
- matomo - 1 - matomo1001.eqiad.wmnet (T349397: Migrate the matomo host to bookworm)
- web publishing - 1 - an-web1001.eqiad.wmnet
- hue - 1 - an-tool1009.eqiad.wmnet (decommissioned)
- yarn - 1 - an-tool1008.eqiad.wmnet
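For reading the inventory above, here is a minimal Python sketch that expands the bracketed host ranges (e.g. druid10[09-11]) into individual hostnames and rebuilds the cumin selection string for a given alias. The function names `expand_hosts` and `buster_query` are made up for illustration; this is not part of cumin and it does not run any query, it only formats strings.

```python
import re
from typing import List

# Matches one bracketed numeric range such as "[1-2]" or "[09-11]".
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(pattern: str) -> List[str]:
    """Expand the first bracketed range in a host pattern, keeping zero padding.

    Hypothetical helper: patterns without a range are returned unchanged.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    width = len(start)  # "09" -> pad every number to two digits
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(start), int(end) + 1)
    ]


def buster_query(alias: str) -> str:
    """Format the selection string used in the inventory for one alias."""
    return f"P{{F:lsbdistcodename = buster}} and A:{alias}"


print(expand_hosts("druid10[09-11]"))  # ['druid1009', 'druid1010', 'druid1011']
print(buster_query("aqs"))  # P{F:lsbdistcodename = buster} and A:aqs
```

Only the first range in a pattern is expanded, which is enough for the host names listed here.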
Original description below
Recent updates are written in bold text
During the migration to Buster we worked on two things that should greatly reduce the pain of upgrading:
- Partman partition re-use recipes for Debian installs of most of our hosts. This means that it will be much easier to reimage/reinstall every node of the cluster without worrying too much about backing up data first, etc.
- Fixed uid/gid of most of the system users. This will allow us to avoid weird permission errors/mismatches after reinstall/reimage.
It is nonetheless a sizeable amount of work :)
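As a rough illustration of the fixed uid/gid point above, a reinstalled host could be spot-checked with something like the sketch below. The user name and ids in `EXPECTED_IDS` are placeholders (assumptions, not the real values), and `check_ids` is a hypothetical helper; only the standard `pwd` module is used.

```python
import pwd

# Placeholder expectations: the user name and ids are NOT the real values,
# replace them with whatever is fixed in configuration management.
EXPECTED_IDS = {"example-user": (900, 900)}


def check_ids(expected, lookup=pwd.getpwnam):
    """Return a list of human-readable mismatches between expected and actual ids."""
    mismatches = []
    for user, (uid, gid) in expected.items():
        try:
            entry = lookup(user)
        except KeyError:
            # pwd.getpwnam raises KeyError when the user does not exist.
            mismatches.append(f"{user}: user not found")
            continue
        if (entry.pw_uid, entry.pw_gid) != (uid, gid):
            mismatches.append(
                f"{user}: expected {uid}:{gid}, found {entry.pw_uid}:{entry.pw_gid}"
            )
    return mismatches


if __name__ == "__main__":
    for problem in check_ids(EXPECTED_IDS):
        print(problem)
```

An empty result means every listed user has the expected uid and gid on that host.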
Some high level notes:
- Moving the Hadoop test cluster to Bullseye ahead of time may be a good way to see if anything weird comes up.
- A lot of VMs like matomo1002, archiva1002, eventlog1003, an-tool100*, etc. should be easy to migrate. The work is to create a new VM with Bullseye running the same packages, test that everything is fine, and flip the traffic over. **There is a sre.ganeti.reimage cookbook, making an in-place reimage an even easier option in many cases.**
- Most of our systems, like Hadoop, Druid, etc., are not ready for Java 11, so we'll need to use Java 8. **We now have full support for Java 8 in bullseye, so we are good to go.**
- Moving Hadoop to Bullseye poses some further questions, since on paper the current version of Bigtop that we run (1.5) doesn't support Bullseye. **We have now built bigtop 1.5 for bullseye and deployed it, so we are good to go.**
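After a reimage or a VM swap as described in the notes above, one simple sanity check is to confirm which Debian release a host actually runs. The sketch below reads the `VERSION_CODENAME` field from `/etc/os-release`; the helper names `os_codename` and `read_codename` are made up for illustration and are not part of any existing cookbook.

```python
from typing import Optional


def os_codename(os_release_text: str) -> Optional[str]:
    """Return the VERSION_CODENAME value from os-release content, or None if absent."""
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "VERSION_CODENAME":
            # Values may be quoted, e.g. VERSION_CODENAME="bullseye".
            return value.strip().strip('"')
    return None


def read_codename(path: str = "/etc/os-release") -> Optional[str]:
    """Read the codename from the local os-release file."""
    with open(path, encoding="utf-8") as handle:
        return os_codename(handle.read())


sample = 'PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"\nVERSION_CODENAME=bullseye\n'
print(os_codename(sample))  # bullseye
```

A host that still reports buster here would also still match the `lsbdistcodename = buster` selections in the inventory.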