https://kafka.apache.org/documentation/#upgrade_1_1_0
This task is about upgrading the Kafka main clusters to 1.x. T193778 is about enabling SSL and inter-broker encryption after the upgrade is complete.
# Prep Work
[x] Convert Kafka main clusters to use `profile::kafka::broker`
[x] Upgrade Kafka main clusters to Debian Stretch and Java 8.
[x] Test upgrade plan in deployment-prep, ensure Kafka clients work there.
[x] On all brokers, set:
```
inter.broker.protocol.version=0.9.0.1
log.message.format.version=0.9.0.1
```
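Before starting the rolling upgrade it may be worth confirming that both settings really are pinned on each broker. A minimal sketch, assuming the rendered broker config is a plain `key=value` properties file with no spaces around `=`; the helper names `get_prop` and `check_pinned` are hypothetical, as is the config path in the usage line:

```shell
# Hypothetical helper: print the value of KEY from a Java-style properties file.
# Only exact "key=value" lines match; the last occurrence wins.
get_prop() {
  local key="$1" file="$2"
  awk -F= -v k="$key" '
    $1 == k { v = substr($0, length(k) + 2) }
    END     { if (v != "") print v }
  ' "$file"
}

# Hypothetical helper: succeed only if both protocol settings equal WANT.
check_pinned() {
  local file="$1" want="$2"
  [ "$(get_prop inter.broker.protocol.version "$file")" = "$want" ] &&
  [ "$(get_prop log.message.format.version "$file")" = "$want" ]
}
```

Usage, with an assumed path: `check_pinned /etc/kafka/server.properties 0.9.0.1 && echo pinned`.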
NOTE: The 1.x version of MirrorMaker will not work when consuming from a 0.9 cluster. DO NOT restart any main <-> main MirrorMaker instances until all brokers in both DCs have been upgraded.
# Production upgrade plan [WIP]
This upgrade requires 3 rolling restarts of each broker in a Kafka cluster, one for each of these changes:
1. Upgrade the package software.
2. Set `inter.broker.protocol.version=1.1.0`.
3. Set `log.message.format.version` to the default (1.1.0) and enable the SSL port.
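Each of the three passes visits the brokers one at a time. A rough sketch of that pattern, assuming it is acceptable to halt the whole pass as soon as a step fails on one broker (the plan does not state this); `rolling_run` and the step function passed to it are hypothetical names:

```shell
# Hypothetical helper: run STEP once per host, strictly in order,
# and stop at the first host where STEP returns non-zero.
rolling_run() {
  local step="$1"; shift
  local host
  for host in "$@"; do
    echo "== $step on $host" >&2
    "$step" "$host" || {
      echo "$step failed on $host; stopping before the next broker" >&2
      return 1
    }
  done
}
```

A step is any shell function that takes a hostname, for example one that restarts the broker on that host and waits for it to rejoin its ISRs. Usage with illustrative hostnames: `rolling_run my_step kafka2001 kafka2002 kafka2003`.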
### main-codfw
#### upgrade
Stop puppet on all main-codfw brokers.
```
# On neodymium:
sudo cumin 'kafka200*' "puppet agent --disable '$USER - Kafka upgrade'"
```
1. For each broker: upgrade and restart Kafka, still using `inter.broker.protocol.version=0.9.0.1`.
```
sudo service kafka stop
sudo apt-get remove confluent-kafka-2.11.7
sudo apt-get install confluent-kafka-2.11
# remove unwanted systemd units and directories:
sudo rm -rv /var/log/confluent /var/lib/kafka /var/lib/zookeeper /lib/systemd/system/confluent*.service && sudo systemctl daemon-reload && sudo systemctl reset-failed
sudo service kafka start
# wait until broker is back up and in ISRs, initiate election:
watch "kafka topics --describe --topic eqiad.mediawiki.revision-create | grep -E 'Isr:.*1001.*$'"
kafka preferred-replica-election
# Now proceed with next broker...
```
2. Merge https://gerrit.wikimedia.org/r/#/c/430449/. For each broker, run puppet to set `inter.broker.protocol.version=1.1.0` and restart Kafka.
```
sudo puppet agent --enable && sudo puppet agent -t
sudo service kafka restart
# wait until broker is back up and in ISRs, initiate election:
watch "kafka topics --describe --topic eqiad.mediawiki.revision-create | grep -E 'Isr:.*1001.*$'"
kafka preferred-replica-election
# Now proceed with next broker...
```
3. Merge https://gerrit.wikimedia.org/r/#/c/430450/. For each broker, run puppet to set default `log.message.format.version` and restart each broker:
```
sudo puppet agent -t
sudo service kafka restart
# wait until broker is back up and in ISRs, initiate election:
watch "kafka topics --describe --topic eqiad.mediawiki.revision-create | grep -E 'Isr:.*1001.*$'"
kafka preferred-replica-election
# Now proceed with next broker...
```
Broker upgrade is complete! Remove the client-specific `api.version` settings; they are no longer needed for eventbus and statsv.
4. Merge https://gerrit.wikimedia.org/r/#/c/430640/ and restart services:
```
# on eventbus (kafka main) hosts, rolling restart each eventbus service
sudo puppet agent -t
depool && sudo service eventlogging-service-eventbus restart && sleep 3 && pool
# on kafkamon2001
sudo puppet agent -t
sudo service burrow-main-codfw restart
```
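The "wait until broker is back up and in ISRs" step that recurs in the passes above could be scripted instead of watched by eye. A minimal sketch, assuming the usual `--describe` partition lines containing `Replicas: ...` and `Isr: ...` fields; the function names are hypothetical, and `kafka` is the same wrapper command used in the steps above:

```shell
# Hypothetical helper: read `kafka topics --describe` output on stdin and
# succeed only if broker ID $1 is in the Isr of every partition that
# lists it under Replicas.
broker_in_all_isrs() {
  awk -v id="$1" '
    /Replicas:/ {
      replicas = ""; isr = ""
      for (i = 1; i <= NF; i++) {
        if ($i == "Replicas:") replicas = $(i + 1)
        if ($i == "Isr:")      isr      = $(i + 1)
      }
      # Wrap the lists in commas so that 1001 cannot match inside 11001.
      if (("," replicas ",") ~ ("," id ",") && ("," isr ",") !~ ("," id ","))
        missing++
    }
    END { exit (missing > 0) }
  '
}

# Hypothetical helper: poll every 5 seconds until the broker has rejoined.
wait_for_isr() {
  local id="$1" topic="$2"
  until kafka topics --describe --topic "$topic" | broker_in_all_isrs "$id"; do
    sleep 5
  done
}
```

Usage: `wait_for_isr 1001 eqiad.mediawiki.revision-create && kafka preferred-replica-election`. This checks a single topic, like the `watch` command it replaces, so it is only a proxy for the broker having caught up everywhere.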
### main-eqiad
TODO: Same as above, but with different gerrit patches for main-eqiad
## Post upgrade:
After both clusters are fully upgraded, remove the `api.version` setting for statsv:
Revert https://gerrit.wikimedia.org/r/#/c/429432/1/statsv.py and restart statsv.
Restart main MirrorMaker instances on Kafka 1.1.0. On each cluster and broker:
```
# Stop on all brokers
sudo service kafka-mirror-main* stop
# Restart on all brokers
sudo service kafka-mirror-main* start
```