adjust fundraising firewalls and kafkatee configuration to accommodate new kafka brokers kafka-jumbo100[789]
Closed, ResolvedPublic

Description

Fundraising kafkatee collection broke when Analytics deployed kafka-jumbo100[789]. We'll need to adjust our configurations to accommodate the change, and investigate how much data was lost and whether it was important enough to backfill. Luckily this coincided with work we were doing in FR with campaigns down, so maybe it's a non-issue.
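(Not part of the original task, but as a rough illustration of the check involved: a minimal Python sketch, using confluent-kafka, that pulls cluster metadata and prints every broker the consumer side would need to reach, so firewall rules and the kafkatee broker list can be compared against it. The hostnames, port, and group id below are assumptions, not taken from the frack configuration.)

```
from confluent_kafka import Consumer

# Illustrative broker list; the real bootstrap servers live in the frack
# kafkatee configuration, which is not shown in this task.
BOOTSTRAP = ",".join(f"kafka-jumbo{n}.eqiad.wmnet:9092" for n in range(1001, 1010))

consumer = Consumer({
    "bootstrap.servers": BOOTSTRAP,
    "group.id": "frack-broker-audit",  # hypothetical, read-only group id
})

# Cluster metadata lists every broker in the cluster, so each host:port
# printed here must also be allowed through the fundraising firewall.
metadata = consumer.list_topics(timeout=10)
for broker_id, broker in sorted(metadata.brokers.items()):
    print(f"broker {broker_id}: {broker.host}:{broker.port}")

consumer.close()
```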

Event Timeline

Jgreen triaged this task as High priority.
Jgreen created this task.
Jgreen edited projects, added fundraising-tech-ops; removed Analytics-Kanban, Analytics.
Jgreen added a subtask: Restricted Task.

Kafkatee configuration change is deployed. The rest is in progress.

ayounsi closed subtask Restricted Task as Resolved. Jun 2 2020, 3:50 PM
Jgreen claimed this task.
Jgreen moved this task from In Progress to Done on the fundraising-tech-ops board.

done except for data recovery

@Jgreen should we also add monitoring to prevent this from happening again?

For example, from grafana I see some spikes in kafka consumer lag for frack, but nothing really sustained for hours. What errors did you see on your side? If timeouts, I guess it may have been kafkatee on your side trying to contact 1007-1009 (acting as consumer group leaders), failing, and possibly getting re-assigned to another broker?
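(As a sketch of what such a lag monitor could look at, assuming the frack consumer commits its offsets to Kafka; the topic and group names below are illustrative, not the real frack ones. Per-partition lag is the gap between the group's committed offset and the log's high watermark:)

```
from confluent_kafka import Consumer, TopicPartition

TOPIC = "webrequest_text"          # illustrative topic name
GROUP = "fundraising-kafkatee"     # illustrative consumer group

# Read-only probe: auto-commit is disabled and we never subscribe,
# so querying this group's offsets does not disturb it.
consumer = Consumer({
    "bootstrap.servers": "kafka-jumbo1001.eqiad.wmnet:9092",
    "group.id": GROUP,
    "enable.auto.commit": False,
})

metadata = consumer.list_topics(TOPIC, timeout=10)
partitions = [TopicPartition(TOPIC, p) for p in metadata.topics[TOPIC].partitions]

# committed() returns the group's last committed offset per partition;
# get_watermark_offsets() returns (low, high) for the log itself.
for tp in consumer.committed(partitions, timeout=10):
    low, high = consumer.get_watermark_offsets(tp, timeout=10)
    lag = high - tp.offset if tp.offset >= 0 else None
    print(f"partition {tp.partition}: committed={tp.offset} high={high} lag={lag}")

consumer.close()
```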

> @Jgreen should we also add monitoring to prevent this from happening again?

We didn't have any major campaign activity when this happened, so what we noticed was dropped connection attempts to the three new hosts in the firewall logs. I agree monitoring would be good, but for a predictable change like this, advance notice would be better.
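(For a connectivity-style monitor, as opposed to lag, a simple TCP probe per broker would catch the dropped-connection case directly. This is a generic sketch: the hostnames and the port are assumptions and should come from the deployed kafkatee configuration, not this snippet.)

```
import socket
import sys

# Hostnames and port are assumptions; the broker port in use (9092 plaintext
# vs. 9093 TLS) should be taken from the deployed kafkatee configuration.
BROKERS = [f"kafka-jumbo{n}.eqiad.wmnet" for n in range(1001, 1010)]
PORT = 9092

def reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

failures = [host for host in BROKERS if not reachable(host, PORT)]
if failures:
    print("CRITICAL: unreachable brokers: " + ", ".join(failures))
    sys.exit(2)  # Icinga/Nagios-style CRITICAL exit code
print("OK: all brokers reachable")
```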