
Upgrade kafka main clusters to 0.9
Closed, Resolved · Public · 5 Story Points

Description

Just the work necessary to plan and execute the actual upgrade. codfw is already done; eqiad is blocked on EventBus work.

Related Objects

Mentioned In
rOPUPea2329c6c0d9: Remove now unused kafka module
rOPUP5d412d2e337e: Finalize upgrade of Kafka main-eqiad to Confluent Kafka 0.9
rOPUP236f30a76529: Prepare for upgrading Kafka main-eqiad to Confluent Kafka 0.9
rOPUPf3e97ee237db: Remove now unused kafka module
rOPUP9a2dc83eb621: Finalize upgrade of Kafka main-eqiad to Confluent Kafka 0.9
rOPUP9ae819e4cdac: Prepare for upgrading Kafka main-eqiad to Confluent Kafka 0.9
rOPUP408243f7b410: Finalize upgrade of Kafka main-eqiad to Confluent Kafka 0.9
rOPUPbd9d086c1e84: Prepare for upgrading Kafka main-eqiad to Confluent Kafka 0.9
rOPUP7f5396ec0c7b: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUPa4c99a022d92: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUP3c57558e6ea8: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUPfb95e441fab6: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUP0139eb3f4b53: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUP3430bde1f207: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUP9d18edc1e08d: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUP7a23f4475ac9: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUPc51dd9af7b65: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUP80cd11d55513: Add error_output to eventlogging service and make eventbus write EventErrors…
rOPUPd7cfa88b9631: Finalize main-codfw Kafka upgrade
rOPUPdc4b2b6971a9: Finalize main-codfw Kafka upgrade
rOPUPe329aecc77e8: Upgrade Kafka main-codfw to 0.9
rOPUP5ee562102732: Finalize main-codfw Kafka upgrade
rOPUP430f3fb1fba9: Upgrade Kafka main-codfw to 0.9
rOPUP74131f1b71fd: Stop kafka mirror maker on kafka100[12], it is not doing anything anyway
rOPUPfb2802045a34: Stop kafka mirror maker on kafka100[12], it is not doing anything anyway

Event Timeline

Nuria created this task. Jun 20 2016, 8:55 PM
Restricted Application added subscribers: Zppix, Aklapper. Jun 20 2016, 8:55 PM
Milimetric triaged this task as Normal priority. Jul 7 2016, 5:32 PM
Milimetric moved this task from Incoming to Dashiki on the Analytics board.

Change 299149 had a related patch set uploaded (by Ottomata):
Stop kafka mirror maker on kafka100[12], it is not doing anything anyway

https://gerrit.wikimedia.org/r/299149

Change 299149 merged by Ottomata:
Stop kafka mirror maker on kafka100[12], it is not doing anything anyway

https://gerrit.wikimedia.org/r/299149

Ottomata renamed this task from Upgrade Kafka (non-analytics cluster) to Upgrade kafka main clusters. Jul 22 2016, 1:04 PM
Ottomata renamed this task from Upgrade kafka main clusters to Upgrade kafka main clusters to 0.9. Jul 25 2016, 1:46 PM
Ottomata claimed this task.
Ottomata edited projects, added Analytics-Kanban, Event-Platform; removed Analytics.
Ottomata updated the task description.
Ottomata updated the task description.

I'd like to do main-codfw this week. Will coordinate with services on this.

@Eevans let's sync up on IRC today about this.

Change 300896 had a related patch set uploaded (by Ottomata):
Finalize main-codfw Kafka upgrade

https://gerrit.wikimedia.org/r/300896

Change 300867 merged by Ottomata:
Upgrade Kafka main-codfw to 0.9

https://gerrit.wikimedia.org/r/300867

Change 300896 merged by Ottomata:
Finalize main-codfw Kafka upgrade

https://gerrit.wikimedia.org/r/300896

codfw has been upgraded to 0.9.

We found a bug in the version of kafka-python we are using for EventBus. To work around it for this deploy, I'd like to merge and deploy https://gerrit.wikimedia.org/r/#/c/300944/ before we upgrade in eqiad. This will let us log events that failed to be produced and re-produce them to Kafka after the upgrade.
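A minimal sketch of the "log events that failed to produce" idea, in Python since kafka-python is what EventBus uses. It is not the code in change 300944: `produce(topic, event)` is a hypothetical callable standing in for the real Kafka producer (assumed to raise on failure), and the JSON-lines record layout (`topic`, `event`, `error`) is an assumption made for illustration.

```python
import json
from typing import Callable, Dict, Iterable, Tuple


def produce_with_error_log(
    events: Iterable[Tuple[str, Dict]],
    produce: Callable[[str, Dict], None],
    error_log_path: str,
) -> int:
    """Produce each (topic, event) pair; append failures to a JSON-lines file.

    `produce` is a hypothetical stand-in for a Kafka producer call that raises
    when the event cannot be delivered. Returns how many events were logged.
    """
    failed = 0
    with open(error_log_path, "a", encoding="utf-8") as log:
        for topic, event in events:
            try:
                produce(topic, event)
            except Exception as exc:
                # Keep going instead of dropping the event: record enough to
                # re-produce it later (hypothetical record layout).
                record = {"topic": topic, "event": event, "error": str(exc)}
                log.write(json.dumps(record) + "\n")
                # Push the line to the file now rather than leaving it in
                # Python's write buffer.
                log.flush()
                failed += 1
    return failed
```

Writing one JSON object per line keeps the log append-only and easy to replay line by line once the brokers are healthy again.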

Ottomata moved this task from In Progress to Paused on the Analytics-Kanban board. Jul 28 2016, 4:18 PM
Milimetric updated the task description. Jul 28 2016, 4:57 PM
Milimetric set the point value for this task to 5.

Change 304029 had a related patch set uploaded (by Ottomata):
Add error_output to eventlogging service and make eventbus write EventErrors to log file

https://gerrit.wikimedia.org/r/304029

Change 304029 merged by Ottomata:
Add error_output to eventlogging service and make eventbus write EventErrors to log file

https://gerrit.wikimedia.org/r/304029

Ottomata moved this task from Paused to In Progress on the Analytics-Kanban board. Aug 15 2016, 2:06 PM

Change 304821 had a related patch set uploaded (by Ottomata):
Prepare for upgrading Kafka main-eqiad to Confluent Kafka 0.9

https://gerrit.wikimedia.org/r/304821

Change 304822 had a related patch set uploaded (by Ottomata):
Finalize upgrade of Kafka main-eqiad to Confluent Kafka 0.9

https://gerrit.wikimedia.org/r/304822
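The Prepare/Finalize split resembles the two-phase rolling upgrade described in the upstream Kafka 0.9.0 upgrade notes: pin `inter.broker.protocol.version` to the old protocol, upgrade brokers one at a time, then bump the protocol and restart one at a time again. Whether these two patches do exactly this is an assumption, not something the ticket confirms. The sketch below only models that plan; the broker names are illustrative, and the helper functions are hypothetical.

```python
from typing import Dict, List, Tuple

# Protocol pins for a 0.8.2 -> 0.9 rolling upgrade (per the upstream Kafka
# upgrade notes; assumed, not confirmed, to match the Prepare/Finalize patches).
PHASE_OVERRIDES: Dict[str, Dict[str, str]] = {
    # 0.9 binaries keep speaking the 0.8.2 inter-broker protocol so that
    # upgraded and not-yet-upgraded brokers can still replicate.
    "prepare": {"inter.broker.protocol.version": "0.8.2"},
    # Once every broker runs 0.9, switch to the 0.9 protocol.
    "finalize": {"inter.broker.protocol.version": "0.9.0.0"},
}


def broker_overrides(phase: str) -> Dict[str, str]:
    """Return the server.properties overrides for one phase."""
    if phase not in PHASE_OVERRIDES:
        raise ValueError(f"unknown phase: {phase!r}")
    return dict(PHASE_OVERRIDES[phase])


def rolling_plan(brokers: List[str]) -> List[Tuple[str, str, Dict[str, str]]]:
    """List (phase, broker, overrides) steps, restarting one broker at a time.

    Every broker finishes the prepare pass before any broker is finalized.
    """
    plan = []
    for phase in ("prepare", "finalize"):
        for broker in brokers:
            plan.append((phase, broker, broker_overrides(phase)))
    return plan


def to_properties(overrides: Dict[str, str]) -> str:
    """Render overrides as server.properties lines."""
    return "".join(f"{key}={value}\n" for key, value in sorted(overrides.items()))
```

For example, `rolling_plan(["kafka1001", "kafka1002"])` yields both prepare steps before either finalize step, which is the ordering that keeps mixed-version brokers compatible.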

Change 304827 had a related patch set uploaded (by Ottomata):
Remove now unused kafka module

https://gerrit.wikimedia.org/r/304827

@Pchelolo we are ready to go for main-eqiad! Ping me when you are online so you can babysit change-prop.

mobrovac moved this task from Backlog to In Progress on the Event-Platform board. Aug 15 2016, 3:47 PM

Change 304821 merged by Ottomata:
Prepare for upgrading Kafka main-eqiad to Confluent Kafka 0.9

https://gerrit.wikimedia.org/r/304821

Change 304822 merged by Ottomata:
Finalize upgrade of Kafka main-eqiad to Confluent Kafka 0.9

https://gerrit.wikimedia.org/r/304822

Change 304827 merged by Ottomata:
Remove now unused kafka module

https://gerrit.wikimedia.org/r/304827

Ottomata moved this task from In Progress to Done on the Analytics-Kanban board. Aug 15 2016, 8:17 PM

Done.

eventlogging-service-eventbus did drop a few events during broker restarts. GAHHHHH. We captured these events in a file and then replayed them manually after the upgrade was done.
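A minimal sketch of a manual replay step, assuming the captured events sit in a file with one JSON object per line holding `topic` and `event` keys (a hypothetical format; the ticket does not describe the real one). `produce(topic, event)` is again a hypothetical stand-in for a Kafka producer call that raises on failure.

```python
import json
from typing import Callable, Dict, List, Tuple


def replay_events(
    path: str, produce: Callable[[str, Dict], None]
) -> Tuple[int, List[int]]:
    """Re-produce every captured event in a JSON-lines file.

    Returns (number replayed, line numbers that still failed) so the
    operator can see exactly which events need another attempt.
    """
    replayed = 0
    failed_lines: List[int] = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in a hand-edited file
            record = json.loads(line)
            try:
                produce(record["topic"], record["event"])
                replayed += 1
            except Exception:
                # Do not abort the whole replay on one bad send.
                failed_lines.append(lineno)
    return replayed, failed_lines
```

Replaying this way is at-least-once: if a captured event had in fact reached a broker before the error was raised, consumers will see it twice.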

I will try again to replicate this in labs.

Nuria closed this task as Resolved. Aug 16 2016, 8:28 PM