= Inventory of hosts to be upgraded to bullseye
== Hadoop-test
[] hadoop-workers-test - 3 - `cumin 'P{F:lsbdistcodename = buster} and A:hadoop-worker-test'`
[] hadoop-coordinator-test - 1 - `an-test-coord1001.eqiad.wmnet`
[] hadoop-master-test - 1 - `an-test-master1001.eqiad.wmnet`
[] hadoop-client-test - 1 - `an-test-client1001.eqiad.wmnet`
== Hadoop
[] hadoop-workers - 91 - `cumin 'P{F:lsbdistcodename = buster} and A:hadoop-worker'`
[] hadoop-coordinators - 2 - `cumin 'P{F:lsbdistcodename = buster} and A:hadoop-coordinator'` **n.b. Refresh an-coord100[1-2] with an-coord100[3-4]**
[] hadoop-masters - 1 - `an-master1001`
[] hadoop-standby - 1 - `an-master1002`
== Stats clients
[] stat servers - 5 - `cumin 'P{F:lsbdistcodename = buster} and A:stat'` **n.b. stat1004 to be decommissioned. stat1009 and stat1010 to be put into service**
== Presto
[] presto-worker-test - 1 - `an-test-presto1001.eqiad.wmnet`
[] presto-worker - 5 - `cumin 'P{F:lsbdistcodename = buster} and A:presto-analytics'` **n.b. 10 new an-presto10[06-15] hosts already on bullseye**
== Druid
[] druid-test - 1 - `an-test-druid1001.eqiad.wmnet`
[] druid-analytics - 5 - `cumin 'P{F:lsbdistcodename = buster} and A:druid-analytics'`
[] druid-public - 5 - `cumin 'P{F:lsbdistcodename = buster} and A:druid-public'` **n.b. Refresh druid100[4-6] with druid10[09-11]**
== Kafka
[] kafka-jumbo - 9 - `cumin 'P{F:lsbdistcodename = buster} and A:kafka-jumbo'` **n.b. Refresh kafka-jumbo100[1-6] with kafka-jumbo10[09-15]**
== Airflow
[] airflow - 3 - `cumin 'P{F:lsbdistcodename = buster} and A:analytics-airflow'`
== AQS
[] aqs - 24 - `cumin 'P{F:lsbdistcodename = buster} and A:aqs'` **Should data persistence be taking on this work?**
== Zookeeper
[] zookeeper-analytics - 3 - `cumin 'P{F:lsbdistcodename = buster} and A:zookeeper-analytics'`
== Event schemas
[] schema - 4 - `cumin 'P{F:lsbdistcodename = buster} and A:schema'`
== Misc
[] eventlogging - 1 - `eventlog1003.eqiad.wmnet`
[] archiva - 1 - `archiva1001.wikimedia.org` **n.b. {T317182} at the same time**
[] matomo - 1 - `matomo1001.eqiad.wmnet` **should we deprecate and decommission?**
[] web publishing - 1 - `an-web1001.eqiad.wmnet`
[] hue - 1 - `an-tool1009.eqiad.wmnet` **should we deprecate and decommission?**
[] yarn - 1 - `an-tool1008.eqiad.wmnet`
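
Most of the queries in this inventory share one pattern: a Puppet fact filter on `lsbdistcodename` combined with a cumin alias. As a minimal sketch, the query strings could be generated rather than typed by hand. The helper name `buster_query` and the alias list are illustrative assumptions, not part of cumin; the code only builds strings and does not contact any host.

```python
# Hypothetical helper (not part of cumin): it only formats query strings
# in the same shape as the ones listed in this inventory.
OLD_DISTRO = "buster"


def buster_query(alias: str, distro: str = OLD_DISTRO) -> str:
    """Return a cumin query selecting hosts in `alias` still on `distro`."""
    # Doubled braces emit literal "{" and "}" inside the f-string.
    return f"P{{F:lsbdistcodename = {distro}}} and A:{alias}"


# A few aliases taken from the inventory above; extend as needed.
ALIASES = ["hadoop-worker", "druid-analytics", "kafka-jumbo"]

for alias in ALIASES:
    # Print the full command line as it would be typed on a cumin host.
    print(f"cumin '{buster_query(alias)}'")
```

Running a query like this again after each batch of reimages should return fewer hosts, so the same strings can double as a progress check.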
----
= Original description below
Recent updates are **written in bold text**
During the migration to Buster we worked on two things that should make upgrading much less painful:
1) Partman partition re-use recipes for Debian installs of most of our hosts. This means that it will be much easier to reimage/reinstall every node of the cluster without stressing too much about backing up data first, etc.
2) Fixed uid/gid values for most of the system users. This will allow us to avoid weird permission errors/mismatches after reinstall/reimage.
It is nonetheless a sizeable amount of work :)
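
To illustrate point 2, here is a minimal sketch of how a fixed uid/gid mapping could be compared against what a host reports after a reimage. The function name `find_id_mismatches` and all numeric ids are made up for illustration; the real values live in our configuration and are not reproduced here.

```python
from typing import Dict, List, Tuple

Ids = Tuple[int, int]  # (uid, gid)


def find_id_mismatches(expected: Dict[str, Ids], actual: Dict[str, Ids]) -> List[str]:
    """List users whose (uid, gid) on the host differ from the fixed mapping."""
    problems = []
    for user, ids in sorted(expected.items()):
        found = actual.get(user)
        if found is None:
            problems.append(f"{user}: missing on host")
        elif found != ids:
            problems.append(f"{user}: expected {ids}, found {found}")
    return problems


# Made-up example values: "yarn" got different ids after a reinstall.
expected = {"hdfs": (903, 903), "yarn": (904, 904)}
after_reimage = {"hdfs": (903, 903), "yarn": (117, 122)}

for problem in find_id_mismatches(expected, after_reimage):
    print(problem)
```

With ids pinned, files on a re-used data partition keep the same numeric owner after the reinstall, which is what avoids the permission mismatches.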
Some high level notes:
* Moving the Hadoop test cluster to Bullseye ahead of time may be a good way to see if anything weird comes up.
* A lot of VMs like matomo1002, archiva1002, eventlog1003, an-tool100*, etc. should be easy to migrate. The work is to create a new VM with Bullseye running the same packages, test that everything is fine, and flip the traffic over. **There is a `sre.ganeti.reimage` cookbook, making a reimage in place an even easier option in many cases.**
* Most of our systems like Hadoop, Druid, etc. are not ready for Java 11, so we'll need to use Java 8. **We now have full support for Java 8 in Bullseye, so we are good to go.**
* Moving Hadoop to Bullseye poses some further questions, since on paper the current version of Bigtop that we run (1.5) doesn't support Bullseye. **We have now built Bigtop 1.5 for Bullseye and deployed it, so we are good to go.**