I'd like to reduce the number of moving parts for the upcoming main Kafka cluster upgrade. It will be easier to manage this upgrade if we switch as many things as we can before the actual Kafka version upgrade.
This task is about upgrading to Debian Stretch and Java 8.
# Procedure
```
# On einsteinium
sudo icinga-downtime -d 3600 -r "prep for reimage" -h kafka2001
# On the host
sudo puppet agent --disable "$USER - reimage"
sudo depool
sudo service kafka stop
sudo service eventlogging-service-eventbus stop
# On neodymium
sudo -i wmf-auto-reimage -p T192832 kafka2001.codfw.wmnet
...
```
Log into the host's mgmt interface, attach to console com2, and wait for the installer prompt to do manual partitioning.
/ should be ext4, 50GB, RAID10 across sd[abcd]1. /srv should be left alone so the existing Kafka data survives the reimage.
The first puppet run will likely fail. We need to re-mount /srv and chown the /srv/kafka files.
```
# On the host
sudo puppet agent --disable "$USER - /srv fix step"
# Puppet will have created files and directories in the unmounted /srv directory; we can delete these.
sudo rm -rf /srv/*
# Put /srv in fstab
# The append must run as root, so pipe through 'sudo tee -a' instead of redirecting with >>
sudo blkid | grep md1 | awk '{print $2" "$1}' | sed -e 's/[:"]//g' | while read -r uuid partition; do
printf '%s\t/srv\text4\tdefaults,noatime,data=writeback,nobh,delalloc\t0\t2\n' "$uuid";
done | sudo tee -a /etc/fstab
# mount md1 as srv
sudo mount /srv
# remove possibly poorly chowned log files from /srv/log/eventlogging
sudo rm /srv/log/eventlogging/*.log
# Puppet usually ensures that kafka user is created, but puppet hasn't run successfully yet.
# Create the user manually so we can chown /srv/kafka to the new kafka uid.
sudo adduser --system --home /nonexistent --shell /bin/false --no-create-home --gecos 'Apache Kafka' --group kafka
# Make sure files are owned by kafka uid.
ls -ld /srv/kafka/data
# If this is owned by 'kafka:kafka', then the user added above was given the same uid
# it had before the reinstall. You can skip the next step.
# If /srv/kafka/data is owned by a numeric uid, then you need to run:
sudo chown -R kafka:kafka /srv/kafka/*
```
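The ownership check at the end of the block above can also be scripted. This is a minimal sketch, assuming GNU `stat` (as shipped in Debian's coreutils); `owned_by` is a hypothetical helper name, not part of any existing tooling. It only reports whether the chown is needed and changes nothing itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed if TARGET is owned by EXPECTED (user:group).
# GNU stat prints UNKNOWN for a uid/gid that has no passwd/group entry,
# which is how a stale numeric uid shows up after a reinstall.
owned_by() {
    target=$1
    expected=$2
    actual=$(stat -c '%U:%G' "$target") || return 2
    [ "$actual" = "$expected" ]
}

# Usage on the host (paths taken from the procedure above):
#   owned_by /srv/kafka/data kafka:kafka || sudo chown -R kafka:kafka /srv/kafka/*
```

Note that the usage line also runs the chown if `stat` itself fails, so it is worth confirming that /srv is mounted first.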
Run puppet, and make sure Kafka and eventbus come back online. Once everything has settled, repool the host:
```
sudo pool
```
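Before running `sudo pool`, the check that both services came back can be sketched as follows. This assumes the host runs systemd (the default on Debian Stretch) and that the units carry the same names used with `service` earlier in this task; `units_active` is a hypothetical helper, not existing tooling.

```shell
#!/bin/sh
# Hypothetical helper: print the state of each unit given as an argument
# and return non-zero if any of them is not active.
units_active() {
    rc=0
    for unit in "$@"; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit: active"
        else
            echo "$unit: NOT active"
            rc=1
        fi
    done
    return "$rc"
}

# Usage on the host:
#   units_active kafka eventlogging-service-eventbus && echo "ok to pool"
```

An active unit only shows that the process is running; it does not show that the broker has rejoined the cluster, so this complements rather than replaces a look at the Kafka logs.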
(Once all nodes have been reimaged, revert https://gerrit.wikimedia.org/r/#/c/429218/)