Build 0.8.2.1 Kafka package and upgrade Kafka brokers
Closed, Resolved (Public)

Description

This is now a blocker for our eventlogging on Kafka project.

Event Timeline

analytics1046-1049 are online as of today! I have started decommissioning analytics1013, 1014 and 1020. These nodes will become Kafka brokers soon. Hopefully by Monday they will have their blocks replicated elsewhere.

Change 228826 had a related patch set uploaded (by Ottomata):
Preparing to reinstall and expand Kafka cluster on Jessie at Kafka 0.8.2.1

https://gerrit.wikimedia.org/r/228826

Change 228826 merged by Ottomata:
Preparing to reinstall and expand Kafka cluster on Jessie at Kafka 0.8.2.1

https://gerrit.wikimedia.org/r/228826

Change 228832 had a related patch set uploaded (by Ottomata):
Removing analytics1013,1014 and 1018 from hadoop worker list in site.pp

https://gerrit.wikimedia.org/r/228832

Change 228832 merged by Ottomata:
Removing analytics1013,1014 and 1018 from hadoop worker list in site.pp

https://gerrit.wikimedia.org/r/228832

Change 228847 had a related patch set uploaded (by Ottomata):
Provisioning analytics1013 as Kafka broker in analytics cluster

https://gerrit.wikimedia.org/r/228847

Change 228847 merged by Ottomata:
Provisioning analytics1013 as Kafka broker in analytics cluster

https://gerrit.wikimedia.org/r/228847

Change 228851 had a related patch set uploaded (by Ottomata):
Provision analytics1014 and analytics1020 as kafka brokers

https://gerrit.wikimedia.org/r/228851

Change 228851 merged by Ottomata:
Provision analytics1014 and analytics1020 as kafka brokers

https://gerrit.wikimedia.org/r/228851

Change 229012 had a related patch set uploaded (by Ottomata):
Remove newly provisioned kafka nodes from cluster

https://gerrit.wikimedia.org/r/229012

Change 229012 merged by Ottomata:
Remove newly provisioned kafka nodes from cluster

https://gerrit.wikimedia.org/r/229012

Change 229035 had a related patch set uploaded (by Ottomata):
Remove analytics1012 1013 1020 from list of kafka brokers in site.pp

https://gerrit.wikimedia.org/r/229035

Change 229035 merged by Ottomata:
Remove analytics1012 1013 1020 from list of kafka brokers in site.pp

https://gerrit.wikimedia.org/r/229035

Oof, had some problems yesterday :(

Incident documentation here:
https://wikitech.wikimedia.org/wiki/Incident_documentation/20150803-Kafka

Change 229193 had a related patch set uploaded (by Ottomata):
Updates and fixes for 0.8.2.1-2 release

https://gerrit.wikimedia.org/r/229193

Phew, ok, Joseph and I tested this migration again in labs:
https://etherpad.wikimedia.org/p/kafka_0.8.2.1_migration_labs

This all went smoothly. I've made a migration plan based on this process here:
https://etherpad.wikimedia.org/p/kafka_0.8.2.1_migration2

Alex is reviewing my latest packaging patch:
https://gerrit.wikimedia.org/r/#/c/229193/

Once we get that settled, I can rebuild and publish the .debs in our apt. I'll then test this package in labs one more time, and then I'll feel comfortable trying the upgrade again.

Change 229193 merged by Ottomata:
Updates and fixes for 0.8.2.1-2 release

https://gerrit.wikimedia.org/r/229193

Change 229961 had a related patch set uploaded (by Ottomata):
Rename analytics1013,1014,1020 to kafka1013,1014,1020

https://gerrit.wikimedia.org/r/229961

Change 229961 merged by Ottomata:
Rename analytics1013,1014,1020 to kafka1013,1014,1020

https://gerrit.wikimedia.org/r/229961

Change 230576 had a related patch set uploaded (by Ottomata):
Override kafka jmxtrans metrics to test new config for version 0.8.2.1

https://gerrit.wikimedia.org/r/230576

Change 230576 merged by Ottomata:
Override kafka jmxtrans metrics to test new config for version 0.8.2.1

https://gerrit.wikimedia.org/r/230576

Change 230577 had a related patch set uploaded (by Ottomata):
Better alias name for All topic metrics from kafka

https://gerrit.wikimedia.org/r/230577

Change 230577 merged by Ottomata:
Better alias name for All topic metrics from kafka

https://gerrit.wikimedia.org/r/230577

Ottomata renamed this task from "Build new latest stable (0.8.2.1?) Kafka package and upgrade Kafka brokers" to "Build 0.8.2.1 Kafka package and upgrade Kafka brokers". Aug 12 2015, 2:13 PM
Ottomata added a comment. Edited Aug 12 2015, 2:18 PM

Phew, after much difficulty, the 4 original Precise brokers are now running 0.8.2.1. A bug in the version of snappy that 0.8.2.1 needs caused us much headache.

Immediate cleanup TODOs:

  • Make an awesome grafana dashboard: http://grafana.wikimedia.org/#/dashboard/db/kafka
  • Clean up jmxtrans metrics, adapt kafka jmxtrans class, etc.
  • Audit Kafka alerts and make sure they are based on the correct metric names.
  • Write incident report and document data loss
  • Make sure Oozie jobs are running and normal (@JAllemandou is doing this).
  • Repackage Kafka for Precise, Trusty and Jessie with snappy 1.1.1.7 fix.

I won't be doing the next steps of this migration until the above is done.
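
For the alert audit item above, a minimal sketch of one way to do the check, assuming the metric names used by alerts and the metric names currently emitted can each be exported as plain lists. The function name `audit_alert_metrics` and all metric names below are hypothetical placeholders, not the real ones.

```python
def audit_alert_metrics(alert_metrics, emitted_metrics):
    """Return alert metric names that are absent from the emitted metrics.

    Both arguments are iterables of metric name strings. How these lists
    are obtained (alert config, metrics backend) is left as an assumption.
    """
    emitted = set(emitted_metrics)
    return sorted(name for name in set(alert_metrics) if name not in emitted)


# Hypothetical names for illustration only; not the real metric names.
alerts = ["kafka.old.MessagesIn", "kafka.new.BytesOut"]
emitted = ["kafka.new.MessagesIn", "kafka.new.BytesOut"]

stale = audit_alert_metrics(alerts, emitted)
print(stale)  # alert metrics that need to be renamed or removed
```

Any name this reports is an alert still pointing at a metric that is no longer emitted, so it would need updating to the new name.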

Change 231021 had a related patch set uploaded (by Ottomata):
Update JMX metrics names for Kafka 0.8.2

https://gerrit.wikimedia.org/r/231021

Change 231021 merged by Ottomata:
Update JMX metrics names for Kafka 0.8.2

https://gerrit.wikimedia.org/r/231021

Change 231028 had a related patch set uploaded (by Ottomata):
Update alerts and jmx for Kafka 0.8.2

https://gerrit.wikimedia.org/r/231028

Change 231028 merged by Ottomata:
Update alerts and jmx for Kafka 0.8.2

https://gerrit.wikimedia.org/r/231028

My bit seems ok :)

Change 232097 had a related patch set uploaded (by Ottomata):
Don't use partman for analytics kafka jessie reinstall, do this part manually

https://gerrit.wikimedia.org/r/232097

Change 232098 had a related patch set uploaded (by Ottomata):
Rename analytics1012 to kafka1012

https://gerrit.wikimedia.org/r/232098

Change 232097 merged by Ottomata:
Don't use partman for analytics kafka jessie reinstall, do this part manually

https://gerrit.wikimedia.org/r/232097

Change 232136 had a related patch set uploaded (by Ottomata):
Rename analytics1012 to kafka1012, site.pp puppetization coming in separate commit

https://gerrit.wikimedia.org/r/232136

Change 232136 merged by Ottomata:
Rename analytics1012 to kafka1012, site.pp puppetization coming in separate commit

https://gerrit.wikimedia.org/r/232136

Change 232098 merged by Ottomata:
Rename analytics1012 to kafka1012

https://gerrit.wikimedia.org/r/232098

Change 232202 had a related patch set uploaded (by Ottomata):
Puppetize kafka1012 as kafka broker in analytics Kafka cluster

https://gerrit.wikimedia.org/r/232202

Change 232202 merged by Ottomata:
Puppetize kafka1012 as kafka broker in analytics Kafka cluster

https://gerrit.wikimedia.org/r/232202

Change 232203 had a related patch set uploaded (by Ottomata):
Use kafka1012 as hostname in Kafka cluster config

https://gerrit.wikimedia.org/r/232203

Change 232203 merged by Ottomata:
Use kafka1012 as hostname in Kafka cluster config

https://gerrit.wikimedia.org/r/232203

Change 232319 had a related patch set uploaded (by Ottomata):
Puppetize systemd override for Kafka LimitNOFILE

https://gerrit.wikimedia.org/r/232319

Change 232319 merged by Ottomata:
Puppetize systemd override for Kafka LimitNOFILE

https://gerrit.wikimedia.org/r/232319

Change 232534 had a related patch set uploaded (by Ottomata):
Rename analytics1022 to kafka1022

https://gerrit.wikimedia.org/r/232534

Change 232535 had a related patch set uploaded (by Ottomata):
Update camus property files with names of new brokers

https://gerrit.wikimedia.org/r/232535

Change 232535 merged by Ottomata:
Update camus property files with names of new brokers

https://gerrit.wikimedia.org/r/232535

Change 232542 had a related patch set uploaded (by Ottomata):
Rename analytics1022 -> kafka1022

https://gerrit.wikimedia.org/r/232542

Change 232542 merged by Ottomata:
Rename analytics1022 -> kafka1022

https://gerrit.wikimedia.org/r/232542

Change 232534 merged by Ottomata:
Rename analytics1022 to kafka1022

https://gerrit.wikimedia.org/r/232534

Change 232557 had a related patch set uploaded (by Ottomata):
Rename analytics1022 -> kafka1022

https://gerrit.wikimedia.org/r/232557

Change 232559 had a related patch set uploaded (by Ottomata):
Temporarily set expire of PTR for kafka1022 to 5 min so I can reinstall asap

https://gerrit.wikimedia.org/r/232559

Change 232559 merged by Ottomata:
Temporarily set expire of PTR for kafka1022 to 5 min so I can reinstall asap

https://gerrit.wikimedia.org/r/232559

Change 232560 had a related patch set uploaded (by Ottomata):
Return expire of kafka1022 PTR to 1H

https://gerrit.wikimedia.org/r/232560

Change 232560 merged by Ottomata:
Return expire of kafka1022 PTR to 1H

https://gerrit.wikimedia.org/r/232560

Change 232557 merged by Ottomata:
Rename analytics1022 -> kafka1022

https://gerrit.wikimedia.org/r/232557

Change 232769 had a related patch set uploaded (by Ottomata):
Rename analytics1018 -> kafka1018 in linux-host-entries

https://gerrit.wikimedia.org/r/232769

Change 232769 merged by Ottomata:
Rename analytics1018 -> kafka1018 in linux-host-entries

https://gerrit.wikimedia.org/r/232769

Change 232774 had a related patch set uploaded (by Ottomata):
Rename A record for analytics1018 -> kafka1018

https://gerrit.wikimedia.org/r/232774

Change 232774 merged by Ottomata:
Rename A record for analytics1018 -> kafka1018

https://gerrit.wikimedia.org/r/232774

Change 232776 had a related patch set uploaded (by Ottomata):
Repuppetize kafka1018 as a broker

https://gerrit.wikimedia.org/r/232776

Change 232776 merged by Ottomata:
Repuppetize kafka1018 as a broker

https://gerrit.wikimedia.org/r/232776

Change 234265 had a related patch set uploaded (by Ottomata):
Decom analytics1021 as a Kafka broker

https://gerrit.wikimedia.org/r/234265

Change 234265 merged by Ottomata:
Decom analytics1021 as a Kafka broker

https://gerrit.wikimedia.org/r/234265

Ottomata moved this task from In Progress to Done on the Analytics-Kanban board. Aug 28 2015, 3:34 PM
kevinator closed this task as Resolved. Aug 29 2015, 12:19 AM
kevinator added a subscriber: kevinator.