
Move EventLogging analytics processes to Kafka jumbo-eqiad cluster
Closed, Resolved · Public · 13 Story Points

Description

Porting the analytics EventLogging processes running on eventlog1001 from analytics Kafka to jumbo-eqiad Kafka should be relatively simple, so let's do it soon.

Steps

Preparation:

  • Create the eventlogging-client-side and eventlogging-valid-mixed topics in jumbo-eqiad Kafka.
  • Add ACLs allowing User:CN=varnishkafka to write to eventlogging-client-side via TLS, and deny ANONYMOUS.
  • Configure a second EventLogging Camus job to consume from jumbo-eqiad Kafka: https://gerrit.wikimedia.org/r/#/c/417321/
  • Refactor role::cache::kafka::eventlogging into a profile.

Do it:

Clean up:

Event Timeline

Ottomata triaged this task as Normal priority. · Dec 19 2017, 9:47 PM
Ottomata created this task.
Restricted Application added a subscriber: Aklapper. · Dec 19 2017, 9:47 PM

Change 403067 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] [WIP] Refactor cache::kafka::eventlogging into profile and enable TLS

https://gerrit.wikimedia.org/r/403067

Change 404773 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] [WIP] point eventlogging processes at Kafka jumbo

https://gerrit.wikimedia.org/r/404773

Ottomata moved this task from Wikistats Production to Kafka Work on the Analytics board.
Ottomata updated the task description. · Mar 8 2018, 4:18 PM
Ottomata updated the task description.
Ottomata updated the task description. · Mar 8 2018, 4:29 PM

Change 403067 merged by Ottomata:
[operations/puppet@production] Refactor cache::kafka::eventlogging into profile

https://gerrit.wikimedia.org/r/403067

Change 417319 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Point eventlogging varnishkafka at Kafka jumbo-eqiad with TLS

https://gerrit.wikimedia.org/r/417319

Change 417321 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Set up temporary secondary EventLogging camus-analytics job

https://gerrit.wikimedia.org/r/417321

Change 417321 merged by Ottomata:
[operations/puppet@production] Set up temporary secondary EventLogging camus-analytics job

https://gerrit.wikimedia.org/r/417321

Ottomata updated the task description. · Mar 8 2018, 6:45 PM

@Krinkle @Imarlier, we were going to do this today, but thought we should wait until your kafka coal change is merged. When do you think it'll be ready and merged?

A second question: how do you prefer to coordinate the switch to Jumbo? Does Kafka coal just need to flip to the new brokers, or does it need a different procedure?

@Ottomata - it should be today. In testing over the weekend I found an issue; I think it's fixed, but I need to verify. Since the new code knows how to catch up, I think you can go ahead and turn off the crossloader, and when the new code starts running we'll start processing from where we left off.

@elukey - coal will be getting its Kafka config from Puppet, and it knows how to catch up. So, once EventLogging data is being written to the new cluster, all that needs to happen is to change the Kafka config stanza in Puppet to point to the new location. (We'll need to do that with at least one other utility as well.)

Actually, here's a question: are kafka configs (broker addresses, anyway) in etcd? I'm not going to worry about it for this change, but going forward it would be really nice to not have to deal with puppet changes in order to alter application config.

> it knows how to catch up

Sooo, this is a funky moment. We are doing something we will probably never do again: 100% replacing an old cluster.

Kafka jumbo is a brand new cluster, so offsets and consumer groups will be totally new. If/when we switch your process from Kafka analytics to jumbo, it will not have any committed offsets in jumbo, and so will have to start from the end (or beginning). In this case, it'd probably be best to reset from the beginning, because we will be switching the EventLogging producers atomically. I'm not sure what your (default?) value of auto.offset.reset is currently, but for this moment it should == "beginning".
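To illustrate the point about missing committed offsets, here is a minimal Python sketch of how a consumer typically chooses its starting position on a partition. It is a simplified model, not the actual client code: the function name `starting_offset` and its parameters are hypothetical, and the accepted spellings of auto.offset.reset differ between client libraries, so the aliases below are an assumption to be checked against your client's documentation.

```python
from typing import Optional

# Assumed aliases; real clients accept only a subset of these spellings.
RESET_TO_START = {"earliest", "smallest", "beginning"}
RESET_TO_END = {"latest", "largest", "end"}


def starting_offset(
    committed: Optional[int],
    log_start: int,
    log_end: int,
    auto_offset_reset: str = "latest",
) -> int:
    """Return the offset at which a consumer group begins reading a partition.

    committed          -- offset stored for this group, or None on a new cluster
    log_start, log_end -- first available offset and next offset to be written
    auto_offset_reset  -- policy applied when there is no usable committed offset
    """
    # A committed offset that is still inside the log always wins.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # No committed offset (or it is out of range): fall back to the policy.
    if auto_offset_reset in RESET_TO_START:
        return log_start
    if auto_offset_reset in RESET_TO_END:
        return log_end
    raise ValueError(f"unsupported auto.offset.reset value: {auto_offset_reset!r}")
```

On a brand new cluster `committed` is `None`, so the policy alone decides whether the consumer sees messages produced before it first connected.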

> are kafka configs (broker addresses, anyway) in etcd?

No, and they don't really need to be. The broker addresses exist only for bootstrapping the Kafka client on startup; technically, brokers are discovered via the Kafka metadata protocol anyway. (You could get away with giving only one of the broker hostnames to your client, and it would still find all of them.) I don't think we will ever need to do a restart due to Kafka broker hostname changes again. Any future Kafka broker nodes will be added to existing clusters, and as such will be auto-discovered by your Kafka client as it is running. We'd update the list in Puppet, and Puppet would then update your configs, but you wouldn't need to restart anything to find the new broker.
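The bootstrap behaviour described above can be sketched with a toy model. This is an illustration only and does not speak the Kafka protocol: `discover_brokers`, the `CLUSTER` mapping, and the `broker-*.example` hostnames are all hypothetical stand-ins for a metadata request to whichever bootstrap broker answers first.

```python
from typing import Dict, List, Set


def discover_brokers(
    bootstrap: List[str], metadata_by_broker: Dict[str, List[str]]
) -> Set[str]:
    """Return the full broker set learned from the first reachable bootstrap host.

    metadata_by_broker maps each reachable broker to the broker list it would
    report in a metadata response; hosts missing from it count as unreachable.
    """
    for host in bootstrap:
        brokers = metadata_by_broker.get(host)
        if brokers is not None:
            # One successful metadata response is enough to learn every broker.
            return set(brokers)
    raise ConnectionError("no bootstrap broker was reachable")


# Hypothetical three-broker cluster; every broker reports the same member list.
ALL_BROKERS = [
    "broker-a.example:9092",
    "broker-b.example:9092",
    "broker-c.example:9092",
]
CLUSTER = {host: ALL_BROKERS for host in ALL_BROKERS}

# A client configured with a single hostname still learns about all three.
known = discover_brokers(["broker-b.example:9092"], CLUSTER)
```

Listing more than one bootstrap host only protects against that one host being down at startup; it does not change what the client learns once connected.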

Mentioned in SAL (#wikimedia-operations) [2018-03-14T14:44:56Z] <ottomata> beginning migration of eventlogging analtyics from Kafka analytics to Kafka jumbo: T183297

Mentioned in SAL (#wikimedia-analytics) [2018-03-14T14:45:01Z] <ottomata> beginning migration of eventlogging analtyics from Kafka analytics to Kafka jumbo: T183297

Change 417319 merged by Ottomata:
[operations/puppet@production] Point eventlogging varnishkafka at Kafka jumbo-eqiad with TLS

https://gerrit.wikimedia.org/r/417319

Change 404773 merged by Ottomata:
[operations/puppet@production] Point eventlogging analytics and webperf processes at Kafka jumbo

https://gerrit.wikimedia.org/r/404773

Ottomata updated the task description. · Mar 14 2018, 3:07 PM

@Imarlier disregard my earlier comment about auto.offset.reset. Since we just switched over to jumbo-eqiad, you can just start consuming from there, and won't have to worry about offsets changing out from under you, since your new coal consumer will have never used the older analytics cluster.

Ottomata updated the task description. · Mar 14 2018, 4:05 PM
Ottomata changed the point value for this task from 8 to 13.

Change 419482 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Burrow should monitor eventlogging groups from jumbo

https://gerrit.wikimedia.org/r/419482

Ottomata updated the task description. · Mar 14 2018, 5:10 PM

Change 419482 merged by Elukey:
[operations/puppet@production] Burrow should monitor eventlogging groups from jumbo

https://gerrit.wikimedia.org/r/419482

Change 419498 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Remove eventlogging-analytics camus job

https://gerrit.wikimedia.org/r/419498

Change 419498 merged by Ottomata:
[operations/puppet@production] Remove eventlogging-analytics camus job

https://gerrit.wikimedia.org/r/419498

Ottomata updated the task description. · Mar 14 2018, 6:32 PM

Change 420036 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Revert back to Kafka analytics cluster for eventlogging eventbus mysql consumer

https://gerrit.wikimedia.org/r/420036

Change 420036 merged by Ottomata:
[operations/puppet@production] Revert back to Kafka analytics for eventlogging eventbus mysql consumer

https://gerrit.wikimedia.org/r/420036

Ottomata updated the task description. · Mar 28 2018, 3:03 PM
Nuria closed this task as Resolved. · Mar 28 2018, 3:57 PM