Page MenuHomePhabricator

Investigate transport errors in eqsin
Closed, ResolvedPublic

Description

After deploying haproxykafka in eqsin (same configuration and code as uslfo) we have a

Nov 12 11:01:39 cp5017 haproxykafka[3525916]: {"level":"error","error":"Cannot get cluster information: Local: Broker transport failure","time":"2024-11-12T11:01:39Z","message":"Cannot spawn Kakfa publisher"}
Nov 12 11:01:39 cp5017 haproxykafka[3525916]: {"level":"error","error":"Cannot get cluster information: Local: Broker transport failure","time":"2024-11-12T11:01:39Z","message":"Error running cmd"}

We're investigating about this, temporary deploying haproxykafka only on cp5017.eqsin.wmnet

Event Timeline

Issue was related to a too strict timeout settings on haproxykafka while checking for cluster metadata.
This has been fixed with https://gitlab.wikimedia.org/repos/sre/haproxykafka/-/commit/7aa51e85135c4e2d4b17bfaf23df51ded04ed539