Upgrade Kafka Burrow to 1.0
Closed, Resolved · Public · 8 Story Points

Description

In T180442 we added Kafka Burrow to kafkamon1001/2001, together with some prometheus burrow exporters to get lag metrics about Kafka consumer groups:

https://grafana.wikimedia.org/dashboard/db/kafka-consumer-lag

We are currently running an old 0.1 version, so it would be great to upgrade to 1.0, but we are blocked until the prometheus burrow exporter fully supports the new Burrow /v3 API (https://github.com/jirwin/burrow_exporter/issues/8).
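For reference, a minimal sketch of what an exporter has to do against the /v3 API: build the per-group lag URL and read the total lag from the JSON body. The base URL and the group name are hypothetical, and the endpoint layout (/v3/kafka/<cluster>/consumer/<group>/lag) and the status.totallag field are assumptions based on the upstream Burrow HTTP documentation, not something verified on kafkamon.

```python
import json
from urllib.parse import quote


def lag_url(base_url, cluster, group):
    """Build the (assumed) Burrow /v3 URL for a consumer group's lag status."""
    base = base_url.rstrip("/")
    return f"{base}/v3/kafka/{quote(cluster, safe='')}/consumer/{quote(group, safe='')}/lag"


def total_lag(payload):
    """Return status.totallag from a /v3 lag response body (JSON text).

    Raises ValueError when Burrow reports an error in the response envelope.
    """
    doc = json.loads(payload)
    if doc.get("error"):
        raise ValueError(doc.get("message", "burrow returned an error"))
    return doc["status"]["totallag"]


if __name__ == "__main__":
    # Hypothetical base URL and group name, for illustration only.
    print(lag_url("http://localhost:8000", "jumbo-eqiad", "some-consumer-group"))
    # Hand-written sample body, not a captured response.
    print(total_lag('{"error": false, "status": {"totallag": 42}}'))
```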

I've done some work on packaging 1.0 on boron, these are my notes to avoid forgetting them:

  • I managed to build 1.0 by updating the golang deps in the Burrow debian directory: I pulled all the src/ dirs via go get github.com/linkedin/Burrow, copied them under debian/godeps/etc../src, and ran a build. Everything should be on boron in my home directory.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Mar 2 2018, 11:36 AM

Configuration that works for Jumbo:

[general]
pidfile="burrow-eqiad.pid"
client-id="burrow-client"
#stdout-logfile="/var/log/burrow/burrow.out"

[logging]
filename="/var/log/burrow/burrow.log"
level="info"

[zookeeper]
servers=["conf1001.eqiad.wmnet:2181","conf1002.eqiad.wmnet:2181", "conf1003.eqiad.wmnet:2181"]
timeout=6
root-path="/burrow"

[client-profile.kafka09]
kafka-version="0.9.0.1"

[client-profile.kafka0102]
kafka-version="0.10.2"

[cluster.jumbo-eqiad]
class-name="kafka"
client-profile="kafka0102"
servers=[ "kafka-jumbo1001.eqiad.wmnet:9092", "kafka-jumbo1002.eqiad.wmnet:9092", "kafka-jumbo1003.eqiad.wmnet:9092", "kafka-jumbo1004.eqiad.wmnet:9092", "kafka-jumbo1005.eqiad.wmnet:9092", "kafka-jumbo1006.eqiad.wmnet:9092" ]

[consumer.jumbo-eqiad]
class-name="kafka"
cluster="jumbo-eqiad"
servers=[ "kafka-jumbo1001.eqiad.wmnet:9092", "kafka-jumbo1002.eqiad.wmnet:9092", "kafka-jumbo1003.eqiad.wmnet:9092", "kafka-jumbo1004.eqiad.wmnet:9092", "kafka-jumbo1005.eqiad.wmnet:9092", "kafka-jumbo1006.eqiad.wmnet:9092" ]
group-blacklist="^(console-consumer-|python-kafka-consumer-|test_).*$"

[httpserver.mylistener]
address=":6667"
timeout=300
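A quick way to sanity-check the group-blacklist pattern from the [consumer.jumbo-eqiad] section is to run it against a few group names. This is only a sketch: the sample group names are made up, and it uses Python's re module rather than Go's regexp package, on the assumption that the two agree for a pattern this simple.

```python
import re

# Pattern copied verbatim from the group-blacklist setting above.
GROUP_BLACKLIST = re.compile(r"^(console-consumer-|python-kafka-consumer-|test_).*$")


def is_blacklisted(group):
    """Return True if the consumer group name matches the blacklist pattern."""
    return GROUP_BLACKLIST.match(group) is not None


if __name__ == "__main__":
    # Hypothetical group names, for illustration only.
    for name in ("console-consumer-12345", "test_foo", "some-real-consumer"):
        print(name, "->", "ignored" if is_blacklisted(name) else "monitored")
```

Because the pattern is anchored with ^, only the start of the group name matters: a group such as my-test_group would still be monitored.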

It seems that Burrow 1.0 can manage multiple clusters at the same time, and that it wants only a -config-dir parameter (in which it looks for the burrow.toml config file) rather than exposing a -config-file like 0.1 does.

Tried to build Burrow 1.0 using only Debian dependencies (rather than godeps bundled into the package), but this is what I get:

# github.com/linkedin/Burrow/core/internal/httpserver
src/github.com/linkedin/Burrow/core/internal/httpserver/coordinator.go:121:21: cannot use defaultHandler literal (type *defaultHandler) as type http.HandlerFunc in assignment
github.com/pborman/uuid
github.com/linkedin/Burrow/core/internal/notifier
github.com/linkedin/Burrow/core/internal/zookeeper
gopkg.in/natefinch/lumberjack.v2
github.com/linkedin/Burrow/core/internal
dh_auto_build: cd obj-x86_64-linux-gnu && go install -gcflags=\"-trimpath=/build/burrow-1.0.0/obj-x86_64-linux-gnu/src\" -asmflags=\"-trimpath=/build/burrow-1.0.0/obj-x86_64-linux-gnu/src\" -v -p 1 github.com/linkedin/Burrow github.com/linkedin/Burrow/core github.com/linkedin/Burrow/core/internal github.com/linkedin/Burrow/core/internal/cluster github.com/linkedin/Burrow/core/internal/consumer github.com/linkedin/Burrow/core/internal/evaluator github.com/linkedin/Burrow/core/internal/helpers github.com/linkedin/Burrow/core/internal/httpserver github.com/linkedin/Burrow/core/internal/notifier github.com/linkedin/Burrow/core/internal/storage github.com/linkedin/Burrow/core/internal/zookeeper github.com/linkedin/Burrow/core/protocol returned exit code 2
make: *** [debian/rules:7: build] Error 2
dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2

OK, mystery solved after checking with Andrew. The version of https://github.com/julienschmidt/httprouter in Debian is stuck at the 1.1 tag (from 2015), and a ton of things have changed since then. https://github.com/julienschmidt/httprouter/issues/207 has been open since last year to ask for a 1.2 release (which would probably kick off a new Debian pkg release), but there has been no traction, so I think we'd probably need to go back to packaging all the Burrow dependencies in our package rather than relying on Debian upstream :(

Mentioned in SAL (#wikimedia-operations) [2018-04-06T08:07:28Z] <elukey> upload prometheus-burrow-exporter 0.0.5 to jessie/stretch-wikimedia - T188719

Mentioned in SAL (#wikimedia-operations) [2018-04-06T08:07:50Z] <elukey> upgrade prometheus-burrow-exporter on kafkamon1001/2001 - T188719

Change 424557 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] burrow: configuration upgrade to support 1.0

https://gerrit.wikimedia.org/r/424557

Change 424615 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/debs/burrow@debian] Release Burrow 1.0

https://gerrit.wikimedia.org/r/424615

Change 424615 merged by Elukey:
[operations/debs/burrow@debian] Release Burrow 1.0

https://gerrit.wikimedia.org/r/424615

Change 424557 merged by Elukey:
[operations/puppet@production] burrow: configuration upgrade to support 1.0

https://gerrit.wikimedia.org/r/424557

Change 424998 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] burrow: fix creation of pid file under /var/run

https://gerrit.wikimedia.org/r/424998

Change 424998 merged by Elukey:
[operations/puppet@production] burrow: fix creation of pid file under /var/run

https://gerrit.wikimedia.org/r/424998

Change 424999 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] burrow: fix erb template generation

https://gerrit.wikimedia.org/r/424999

Change 424999 merged by Elukey:
[operations/puppet@production] burrow: fix erb template generation

https://gerrit.wikimedia.org/r/424999

Mentioned in SAL (#wikimedia-operations) [2018-04-09T07:09:33Z] <elukey> upgrade burrow to 1.0 on kafkamon[12]* - T188719

elukey set the point value for this task to 13.
elukey moved this task from Next Up to Done on the Analytics-Kanban board.
elukey changed the point value for this task from 13 to 8.
fdans closed this task as Resolved. · Apr 12 2018, 5:26 PM