Use --new.consumer for main codfw <-> eqiad Kafka MirrorMaker
Closed, Resolved · Public · 8 Story Points

Description

As part of the Jumbo cluster upgrade, we've started using the --new.consumer option of the 0.9 MirrorMaker process. The new consumer was experimental when 0.9 was released, so it was not the default. Newer MirrorMaker versions have completely removed the old consumer client.

As part of the goal to upgrade the Kafka main clusters to 1.x in Q4, we will need to switch to a MirrorMaker based on the new consumer client at some point. The new consumer has been *much* more stable for us when replicating from main-eqiad -> jumbo, especially around consumer rebalances. I've also added a new and better Prometheus-based dashboard and alerting for the new MirrorMaker instance.

However, --new.consumer stores offsets in Kafka instead of ZooKeeper, so I'm pretty sure this will reset all offsets for MirrorMaker. We need to figure out how best to deal with this for change-prop and the job queue. I believe that change-prop does deduplication, so if possible the best thing to do would be to spin up new-consumer MirrorMaker instances in a new consumer group, with a very short overlap before shutting down the old-consumer ones. While both MirrorMaker instances are running we'd get duplicate messages, but hopefully the number would be limited.
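
The overlap approach above depends on the downstream consumer discarding events it has already seen. A minimal sketch of that idea in Python, assuming each event carries a unique identifier (the `id` field and the `dedupe` helper are hypothetical and are not change-prop's actual implementation):

```python
from collections import OrderedDict


def dedupe(events, max_seen=10000):
    """Yield each event once, dropping repeats of a recently seen id.

    Assumption: every event is a dict with a unique "id" key. This is an
    illustrative field name, not change-prop's real schema.
    """
    seen = OrderedDict()  # insertion-ordered record of recently seen ids
    for event in events:
        event_id = event["id"]
        if event_id in seen:
            continue  # duplicate delivered by the second MirrorMaker
        seen[event_id] = True
        if len(seen) > max_seen:
            seen.popitem(last=False)  # forget the oldest id to bound memory
        yield event


# Two MirrorMaker instances mirroring the same topic during the overlap
# would deliver some events twice.
mirrored = [{"id": "a"}, {"id": "b"}, {"id": "a"}, {"id": "c"}, {"id": "b"}]
unique_ids = [e["id"] for e in dedupe(mirrored)]
```

Because the window of remembered ids is bounded, a duplicate is only caught while its id is still in the window, which is one reason to keep the overlap between the two MirrorMaker instances short.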

Event Timeline

Ottomata created this task. Mar 28 2018, 3:11 PM
Restricted Application added a subscriber: Aklapper. Mar 28 2018, 3:11 PM
Ottomata triaged this task as Normal priority. Mar 28 2018, 3:11 PM

The changeprop/jobqueue only listens to local events in a DC, so having duplicate eqiad events in codfw and duplicate codfw events in eqiad would not be a problem at all.
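
To illustrate why mirrored duplicates are harmless here, the following Python sketch filters a cluster's topics down to the ones a local-only consumer would read. It assumes topics are prefixed with the name of the datacenter that produced them; the topic names and the `local_topics` helper are hypothetical, not taken from the change-prop configuration.

```python
def local_topics(topics, datacenter):
    """Return only the topics produced in the given datacenter.

    Assumption: topic names start with "<datacenter>." (illustrative
    convention, not confirmed by this task).
    """
    prefix = datacenter + "."
    return [t for t in topics if t.startswith(prefix)]


# Topics visible in the codfw cluster: its own plus those mirrored from eqiad.
topics_in_codfw = ["codfw.example.event", "eqiad.example.event"]
consumed = local_topics(topics_in_codfw, "codfw")
```

Under that assumption, a consumer in codfw never reads the mirrored `eqiad.` topics, so duplicates in them go unnoticed.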

Ah, right! Great.

fdans moved this task from Incoming to Kafka Work on the Analytics board. Mar 29 2018, 4:56 PM
Ottomata moved this task from Next Up to In Progress on the Analytics-Kanban board.

Change 424344 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use profile::kafka::mirror with --new.consumer for main-codfw -> main-eqiad mirror

https://gerrit.wikimedia.org/r/424344

I'd like to try this early next week.

We had planned to pause this until after the main Kafka upgrade in T167039, but from https://kafka.apache.org/documentation/#upgrade_1_1_0:

Note that the older Scala consumer does not support the new message format introduced in 0.11, so to avoid the performance cost of down-conversion (or to take advantage of exactly once semantics), the newer Java consumer must be used.

The old MirrorMaker consumer is the only use of the old Scala consumer, so we actually do need to switch to --new.consumer before we upgrade main.

Mentioned in SAL (#wikimedia-operations) [2018-04-16T18:03:24Z] <ottomata> restarting main <-> main DC kafka mirror maker instances to blacklist job and cp topics T190940 T167039

Mentioned in SAL (#wikimedia-operations) [2018-04-16T18:28:04Z] <ottomata> temporarily stopping puppet on kafka200[123] to apply MirrorMaker --new.consumer https://gerrit.wikimedia.org/r/#/c/424344/ T190940

Change 424344 merged by Ottomata:
[operations/puppet@production] Use profile::kafka::mirror with --new.consumer for main-codfw -> main-eqiad

https://gerrit.wikimedia.org/r/424344

Change 426965 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use --new.consumer for main codfw -> eqiad MirrorMaker

https://gerrit.wikimedia.org/r/426965

Change 426965 merged by Ottomata:
[operations/puppet@production] Use --new.consumer for main codfw -> eqiad MirrorMaker

https://gerrit.wikimedia.org/r/426965

Change 426971 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Configure jmx_exporter prometheus config for kafka main (mirror)

https://gerrit.wikimedia.org/r/426971

Change 426971 merged by Ottomata:
[operations/puppet@production] Configure jmx_exporter prometheus config for kafka main (mirror)

https://gerrit.wikimedia.org/r/426971

Change 426973 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Remove unused role::kafka::main::mirror and set up main MM alerts

https://gerrit.wikimedia.org/r/426973

Change 426973 merged by Ottomata:
[operations/puppet@production] Remove unused role::kafka::main::mirror and set up main MM alerts

https://gerrit.wikimedia.org/r/426973

Ottomata moved this task from Paused to Done on the Analytics-Kanban board. Apr 16 2018, 7:26 PM
Ottomata set the point value for this task to 8.

Change 426976 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix main-eqiad_to_main-codfw Mirror alert

https://gerrit.wikimedia.org/r/426976

Change 426976 merged by Ottomata:
[operations/puppet@production] Fix main-eqiad_to_main-codfw Mirror alert

https://gerrit.wikimedia.org/r/426976

Change 427120 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Rename mirror::alerts source prometheus url paramater

https://gerrit.wikimedia.org/r/427120

Change 427120 merged by Ottomata:
[operations/puppet@production] Rename mirror::alerts source prometheus url paramater

https://gerrit.wikimedia.org/r/427120

Change 427134 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Reduce warning_threshold for main eqiad -> codfw MirrorMaker

https://gerrit.wikimedia.org/r/427134

Change 427134 merged by Ottomata:
[operations/puppet@production] Reduce warning_threshold for main eqiad -> codfw MirrorMaker

https://gerrit.wikimedia.org/r/427134

Change 427144 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix mirror alert parameter

https://gerrit.wikimedia.org/r/427144

Change 427144 merged by Ottomata:
[operations/puppet@production] Fix mirror alert parameter

https://gerrit.wikimedia.org/r/427144

Nuria closed this task as Resolved. Jun 25 2018, 11:14 PM