Upgrade kafka main clusters to 0.9
Closed, Resolved | Public | 5 Estimated Story Points

Description

Just the work necessary to do the actual upgrade (plan & execute). codfw is already done. (Blocked on event bus work.)

Related Objects

Mentioned In
rOPUPea2329c6c0d9: Remove now unused kafka module
rOPUP5d412d2e337e: Finalize upgrade of Kafka main-eqiad to Confluent Kafka 0.9
rOPUP236f30a76529: Prepare for upgrading Kafka main-eqiad to Confluent Kafka 0.9
rOPUPf3e97ee237db: Remove now unused kafka module
rOPUP9a2dc83eb621: Finalize upgrade of Kafka main-eqiad to Confluent Kafka 0.9
rOPUP9ae819e4cdac: Prepare for upgrading Kafka main-eqiad to Confluent Kafka 0.9
rOPUP408243f7b410: Finalize upgrade of Kafka main-eqiad to Confluent Kafka 0.9
rOPUPbd9d086c1e84: Prepare for upgrading Kafka main-eqiad to Confluent Kafka 0.9
rOPUP7f5396ec0c7b: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUPa4c99a022d92: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUP3c57558e6ea8: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUPfb95e441fab6: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUP0139eb3f4b53: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUP3430bde1f207: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUP9d18edc1e08d: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUP7a23f4475ac9: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUPc51dd9af7b65: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUP80cd11d55513: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUPd7cfa88b9631: Finalize main-codfw Kafka upgrade
rOPUPdc4b2b6971a9: Finalize main-codfw Kafka upgrade
rOPUPe329aecc77e8: Upgrade Kafka main-codfw to 0.9
rOPUP5ee562102732: Finalize main-codfw Kafka upgrade
rOPUP430f3fb1fba9: Upgrade Kafka main-codfw to 0.9
rOPUP74131f1b71fd: Stop kafka mirror maker on kafka100[12], it is not doing anything anyway
rOPUPfb2802045a34: Stop kafka mirror maker on kafka100[12], it is not doing anything anyway

Event Timeline

Milimetric triaged this task as Medium priority. (Jul 7 2016, 5:32 PM)
Milimetric moved this task from Incoming to Dashiki on the Analytics board.

Change 299149 had a related patch set uploaded (by Ottomata):
Stop kafka mirror maker on kafka100[12], it is not doing anything anyway

https://gerrit.wikimedia.org/r/299149

Change 299149 merged by Ottomata:
Stop kafka mirror maker on kafka100[12], it is not doing anything anyway

https://gerrit.wikimedia.org/r/299149

Ottomata renamed this task from Upgrade Kafka (non-analytics cluster) to Upgrade kafka main clusters. (Jul 22 2016, 1:04 PM)
Ottomata renamed this task from Upgrade kafka main clusters to Upgrade kafka main clusters to 0.9. (Jul 25 2016, 1:46 PM)
Ottomata claimed this task.
Ottomata edited projects, added Analytics-Kanban, Event-Platform; removed Analytics.
Ottomata updated the task description.
Ottomata updated the task description.

I'd like to do main-codfw this week. Will coordinate with services on this.

@Eevans let's sync up on IRC today about this.

Change 300896 had a related patch set uploaded (by Ottomata):
Finalize main-codfw Kafka upgrade

https://gerrit.wikimedia.org/r/300896

Change 300867 merged by Ottomata:
Upgrade Kafka main-codfw to 0.9

https://gerrit.wikimedia.org/r/300867

Change 300896 merged by Ottomata:
Finalize main-codfw Kafka upgrade

https://gerrit.wikimedia.org/r/300896

codfw has been upgraded to 0.9.

We found a bug in the version of kafka-python we are using for eventbus. To work around it for this deploy, I'd like to merge and deploy https://gerrit.wikimedia.org/r/#/c/300944/ before we upgrade eqiad. This will let us log events that fail to produce, and re-produce them after the upgrade.
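For readers unfamiliar with the pattern, here is a minimal sketch of "log events that fail to produce". It is not the code from the linked change: `produce_or_log`, the `produce` callable, and the JSON-lines record layout (`error` and `event` keys) are all hypothetical names chosen for illustration, and no kafka-python calls are used.

```python
import json
from typing import Callable, Iterable


def produce_or_log(events: Iterable[dict],
                   produce: Callable[[dict], None],
                   error_path: str) -> int:
    """Try to produce each event; append failures to error_path as JSON lines.

    `produce` is a hypothetical stand-in for whatever sends one event to Kafka.
    Returns the number of events that failed and were written to the file.
    """
    failed = 0
    with open(error_path, "a", encoding="utf-8") as error_file:
        for event in events:
            try:
                produce(event)
            except Exception as exc:  # broad on purpose: any produce failure is kept
                # Assumed record layout: the error message plus the original event.
                record = {"error": str(exc), "event": event}
                error_file.write(json.dumps(record) + "\n")
                failed += 1
    return failed
```

Writing one JSON object per line keeps the file easy to inspect by hand and easy to read back line by line once the brokers are healthy again.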

Milimetric set the point value for this task to 5.

Change 304029 had a related patch set uploaded (by Ottomata):
Add error_output to eventlogging service and make eventbus write EventErrors to log file

https://gerrit.wikimedia.org/r/304029

Change 304029 merged by Ottomata:
Add error_output to eventlogging service and make eventbus write EventErrors to log file

https://gerrit.wikimedia.org/r/304029

Change 304821 had a related patch set uploaded (by Ottomata):
Prepare for upgrading Kafka main-eqiad to Confluent Kafka 0.9

https://gerrit.wikimedia.org/r/304821

Change 304822 had a related patch set uploaded (by Ottomata):
Finalize upgrade of Kafka main-eqiad to Confluent Kafka 0.9

https://gerrit.wikimedia.org/r/304822

Change 304827 had a related patch set uploaded (by Ottomata):
Remove now unused kafka module

https://gerrit.wikimedia.org/r/304827

@Pchelolo we are ready to go for main-eqiad! Ping me when you are online so you can babysit change-prop.

Change 304821 merged by Ottomata:
Prepare for upgrading Kafka main-eqiad to Confluent Kafka 0.9

https://gerrit.wikimedia.org/r/304821

Change 304822 merged by Ottomata:
Finalize upgrade of Kafka main-eqiad to Confluent Kafka 0.9

https://gerrit.wikimedia.org/r/304822

Change 304827 merged by Ottomata:
Remove now unused kafka module

https://gerrit.wikimedia.org/r/304827

Done.

eventlogging-service-eventbus did drop a few events during broker restarts. GAHHHHH. We captured these events in a file and then replayed them manually after the upgrade was done.

I will try again to replicate this in labs.
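The manual replay mentioned above could look roughly like the sketch below. It is an illustration only, not what was actually run: it assumes the captured file holds one JSON object per line with the original event under an `event` key, and `replay_logged_events` and the `produce` callable are hypothetical names.

```python
import json
from typing import Callable, List


def replay_logged_events(error_path: str,
                         produce: Callable[[dict], None]) -> List[dict]:
    """Re-send every event stored in error_path.

    `produce` is a hypothetical stand-in for whatever sends one event to Kafka.
    Returns the events that failed again, so they can be retried later.
    """
    still_failing = []
    with open(error_path, encoding="utf-8") as error_file:
        for line in error_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)["event"]  # assumed record layout
            try:
                produce(event)
            except Exception:
                still_failing.append(event)
    return still_failing
```

Returning the events that fail a second time, rather than raising on the first error, means one bad record does not block the rest of the replay.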