= Inventory of hosts to be upgraded to bullseye
== Hadoop-test
[] {T329363}
== Hadoop
[] {T332570}
[] {T332572}
[] {T332573}
[] {T332578}
== Stats clients
[] {T329360}
== Launcher
[] {T332580}
== Presto
[] {T329361}
== Druid
[] {T332584}
[] druid-analytics - 5 - `cumin 'P{F:lsbdistcodename = buster} and A:druid-analytics'`
[] druid-public - 5 - `cumin 'P{F:lsbdistcodename = buster} and A:druid-public'` **n.b. Refresh druid100[4-6] with druid10[09-11]**
== Kafka
[] kafka-jumbo - 9 - `cumin 'P{F:lsbdistcodename = buster} and A:kafka-jumbo'` **n.b. Refresh kafka-jumbo100[1-6] with kafka-jumbo10[09-15]**
== Airflow
[] airflow - 3 - `cumin 'P{F:lsbdistcodename = buster} and A:analytics-airflow'`
== AQS
[] aqs - 24 - `cumin 'P{F:lsbdistcodename = buster} and A:aqs'` **Should Data Persistence be taking on this work?**
== Zookeeper
[] {T329362}
== Event schemas
[] schema - 4 - `cumin 'P{F:lsbdistcodename = buster} and A:schema'`
== Misc
[] eventlogging - 1 - `eventlog1003.eqiad.wmnet`
[] archiva - 1 - `archiva1001.wikimedia.org` **n.b. {T317182} at the same time**
[] matomo - 1 - `matomo1001.eqiad.wmnet` **should we deprecate and decommission?**
[] web publishing - 1 - `an-web1001.eqiad.wmnet`
[] hue - 1 - `an-tool1009.eqiad.wmnet` **should we deprecate and decommission?**
[] yarn - 1 - `an-tool1008.eqiad.wmnet`
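The per-cluster cumin queries above all follow the same pattern. As a sketch (the alias names are taken from the queries in this inventory; running with no command makes cumin only list matching hosts), the remaining buster hosts per cluster could be enumerated like this:

```shell
# Print the cumin query used to list each cluster's remaining buster
# hosts; run each on the cumin host to see what is left to upgrade.
for alias in druid-analytics druid-public kafka-jumbo analytics-airflow aqs schema; do
  echo "cumin 'P{F:lsbdistcodename = buster} and A:${alias}'"
done
```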
----
= Original description below
Recent updates are **written in bold text**
During the migration to Buster we worked on two things that should greatly reduce the pain of upgrading:
1) Partman partition re-use recipes for the Debian installs of most of our hosts. This makes it far easier to reimage/reinstall every node of the cluster without first having to worry about backing up data.
2) Fixed uids/gids for most of the system users. This lets us avoid permission errors/mismatches after a reinstall/reimage.
It is nonetheless a sizeable amount of work :)
Some high level notes:
* Moving the Hadoop test cluster to Bullseye ahead of time may be a good way to see if anything weird comes up.
* A lot of VMs like matomo1002, archiva1002, eventlog1003, an-tool100*, etc. should be easy to migrate. The work is to create a new VM with Bullseye running the same packages, test that everything works, and flip the traffic over. **There is a `sre.ganeti.reimage` cookbook, making an in-place reimage an even easier option in many cases**
* Most of our systems, like Hadoop and Druid, are not ready for Java 11, so we'll need to stay on Java 8. **We now have full support for Java 8 in Bullseye, so we are good to go**
* Moving Hadoop to Bullseye poses some further questions, since on paper the current version of Bigtop that we run (1.5) doesn't support Bullseye. **We have now built Bigtop 1.5 for Bullseye and deployed it, so we are good to go**
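For the easy VM cases, the in-place reimage route via the `sre.ganeti.reimage` cookbook could look roughly like the following. This is a sketch only: the `--os` flag and argument shape are assumptions modelled on other reimage cookbooks, and the host is a hypothetical example, so check `--help` on the cumin host before running anything.

```shell
# Sketch: reimage a Ganeti VM in place to bullseye.
# matomo1002 is a hypothetical example host; the cookbook flags are
# assumptions and must be verified against the cookbook's --help.
host="matomo1002.eqiad.wmnet"
# Print (rather than run) the candidate invocation; ${host%%.*}
# strips the domain to get the short hostname the cookbook expects.
echo "sudo cookbook sre.ganeti.reimage --os bullseye ${host%%.*}"
```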