Description

Fundraising kafkatee collection broke when Analytics deployed kafka-jumbo100[789]. We'll need to adjust our configuration to accommodate the change, and investigate how much data was lost and whether it was important enough to backfill. Luckily this coincided with work we were doing in FR while campaigns were down, so it may be a non-issue.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | None | | T244211 Analytics Hardware for Fiscal Year 2019/2020
Resolved | | Ottomata | T252675 Add new kafka brokers kafka-jumbo100[789] to the jumbo-eqiad Kafka cluster
Resolved | | Jgreen | T254257 adjust fundraising firewalls and kafkatee configuration to accommodate new kafka brokers kafka-jumbo100[789]
Restricted Task | | |
Event Timeline
@Jgreen should we also add monitoring to prevent this from happening again?
For example, in Grafana I see some spikes in Kafka consumer lag for frack, but nothing really sustained for hours. What errors did you see on your side? If timeouts, I'd guess kafkatee on your side was trying to contact 1007-1009 acting as consumer group leaders, failing, and possibly getting re-assigned to another broker?
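The distinction above, transient lag spikes versus lag sustained for hours, is exactly what an alert for this would need to encode. A minimal sketch in Python of such a check; the threshold and window values are illustrative assumptions, not actual frack alerting config:

```python
from typing import List, Tuple


def sustained_lag(samples: List[Tuple[float, int]],
                  lag_threshold: int,
                  min_duration_s: float) -> bool:
    """Return True if consumer lag stayed above lag_threshold for at
    least min_duration_s. samples are (unix_timestamp, lag) pairs,
    assumed sorted by timestamp (e.g. scraped from a lag exporter)."""
    run_start = None
    for ts, lag in samples:
        if lag > lag_threshold:
            if run_start is None:
                run_start = ts  # start of a continuous over-threshold run
            if ts - run_start >= min_duration_s:
                return True
        else:
            run_start = None  # lag recovered; reset the run
    return False


# A brief spike should not fire; a long plateau should.
spike = [(0, 10), (60, 50000), (120, 10)]
plateau = [(0, 50000), (1800, 50000), (3600, 50000), (7200, 50000)]
print(sustained_lag(spike, 1000, 3600))    # False
print(sustained_lag(plateau, 1000, 3600))  # True
```

An alert shaped like this would have stayed quiet during the transient spikes mentioned above but fired on a multi-hour outage like this one.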
We didn't have any major campaign activity when this happened, so what we noticed was dropped connection attempts to the three new hosts in the firewall logs. I agree monitoring would be good, but for a predictable change like this, advance notice would be better.
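The fix tracked in T254257 amounts to opening the fundraising firewall to the three new brokers. A rough sketch of what that looks like in plain iptables terms; the `.eqiad.wmnet` hostnames and TLS port 9093 are assumptions, and in practice the frack firewall is managed through configuration management rather than ad-hoc rules like these:

```shell
# Illustrative only: allow outbound Kafka (TLS port 9093 assumed) from a
# kafkatee host to the three new jumbo brokers. The real change belongs
# in the managed firewall config, not in a shell loop.
for broker in kafka-jumbo1007 kafka-jumbo1008 kafka-jumbo1009; do
    iptables -A OUTPUT -p tcp -d "${broker}.eqiad.wmnet" --dport 9093 \
        -j ACCEPT
done
```

The broader point stands regardless of mechanism: broker additions are predictable, so the firewall change can land before the brokers start taking traffic.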