Hi folks!
I am moving the kafka-main eqiad brokers to PKI, and today I restarted the brokers on kafka-main1002 and kafka-main1003. I noticed purged alerts for some cp nodes, all with similar output (this is from cp5032):
Apr 05 08:06:16 cp5032 purged[2085976]: %6|1680681976.892|FAIL|purged#consumer-1| [thrd:ssl://kafka-main1002.eqiad.wmnet:9093/bootstrap]: ssl://kafka-main1002.eqiad.wmnet:9093/1002: Disconnected (after 422825216ms in state UP)
Apr 05 08:06:17 cp5032 purged[2085976]: %3|1680681977.129|FAIL|purged#consumer-1| [thrd:ssl://kafka-main1002.eqiad.wmnet:9093/bootstrap]: ssl://kafka-main1002.eqiad.wmnet:9093/1002: Connect to ipv6#[2620:0:861:102:10:64:16:37]:9093 failed: Connection refused (after 236ms in state CONNECT)
Apr 05 08:06:17 cp5032 purged[2085976]: %3|1680681977.372|FAIL|purged#consumer-1| [thrd:ssl://kafka-main1002.eqiad.wmnet:9093/bootstrap]: ssl://kafka-main1002.eqiad.wmnet:9093/1002: Connect to ipv4#10.64.16.37:9093 failed: Connection refused (after 242ms in state CONNECT)
Apr 05 08:06:17 cp5032 purged[2085976]: %3|1680681977.618|FAIL|purged#consumer-1| [thrd:ssl://kafka-main1002.eqiad.wmnet:9093/bootstrap]: ssl://kafka-main1002.eqiad.wmnet:9093/1002: Connect to ipv6#[2620:0:861:102:10:64:16:37]:9093 failed: Connection refused (after 234ms in state CONNECT)
Apr 05 08:06:17 cp5032 purged[2085976]: %3|1680681977.995|FAIL|purged#consumer-1| [thrd:ssl://kafka-main1002.eqiad.wmnet:9093/bootstrap]: ssl://kafka-main1002.eqiad.wmnet:9093/1002: Connect to ipv4#10.64.16.37:9093 failed: Connection refused (after 242ms in state CONNECT)
Apr 05 08:06:19 cp5032 purged[2085976]: %3|1680681979.048|FAIL|purged#consumer-1| [thrd:ssl://kafka-main1002.eqiad.wmnet:9093/bootstrap]: ssl://kafka-main1002.eqiad.wmnet:9093/1002: Connect to ipv6#[2620:0:861:102:10:64:16:37]:9093 failed: Connection refused (after 229ms in state CONNECT)
Apr 05 09:35:11 cp5032 purged[2085976]: %6|1680687311.705|FAIL|purged#consumer-1| [thrd:ssl://kafka-main1003.eqiad.wmnet:9093/bootstrap]: ssl://kafka-main1003.eqiad.wmnet:9093/1003: Disconnected (after 72679977ms in state UP)
Apr 05 09:35:11 cp5032 purged[2085976]: %6|1680687311.768|FAIL|purged#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: kafka-main1003.eqiad.wmnet:9093: Disconnected (after 5327389ms in state UP)
Apr 05 09:35:11 cp5032 purged[2085976]: %3|1680687311.934|FAIL|purged#consumer-1| [thrd:ssl://kafka-main1003.eqiad.wmnet:9093/bootstrap]: ssl://kafka-main1003.eqiad.wmnet:9093/1003: Connect to ipv4#10.64.32.90:9093 failed: Connection refused (after 229ms in state CONNECT)
Apr 05 09:35:11 cp5032 purged[2085976]: %3|1680687311.997|FAIL|purged#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: kafka-main1003.eqiad.wmnet:9093: Connect to ipv4#10.64.32.90:9093 failed: Connection refused (after 229ms in state CONNECT)
Apr 05 09:35:12 cp5032 purged[2085976]: %3|1680687312.193|FAIL|purged#consumer-1| [thrd:ssl://kafka-main1003.eqiad.wmnet:9093/bootstrap]: ssl://kafka-main1003.eqiad.wmnet:9093/1003: Connect to ipv6#[2620:0:861:103:10:64:32:90]:9093 failed: Connection refused (after 258ms in state CONNECT)
Apr 05 09:35:12 cp5032 purged[2085976]: %3|1680687312.225|FAIL|purged#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: kafka-main1003.eqiad.wmnet:9093: Connect to ipv6#[2620:0:861:103:10:64:32:90]:9093 failed: Connection refused (after 228ms in state CONNECT)
Apr 05 09:35:12 cp5032 purged[2085976]: %3|1680687312.471|FAIL|purged#consumer-1| [thrd:ssl://kafka-main1003.eqiad.wmnet:9093/bootstrap]: ssl://kafka-main1003.eqiad.wmnet:9093/1003: Connect to ipv4#10.64.32.90:9093 failed: Connection refused (after 242ms in state CONNECT)
Apr 05 09:35:12 cp5032 purged[2085976]: %3|1680687312.475|FAIL|purged#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: kafka-main1003.eqiad.wmnet:9093: Connect to ipv4#10.64.32.90:9093 failed: Connection refused (after 250ms in state CONNECT)
Apr 05 09:35:12 cp5032 purged[2085976]: %3|1680687312.913|FAIL|purged#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: kafka-main1001.eqiad.wmnet:9093: Connect to ipv6#[2620:0:861:103:10:64:32:90]:9093 failed: Connection refused (after 250ms in state CONNECT)
Apr 05 09:35:12 cp5032 purged[2085976]: %3|1680687312.939|FAIL|purged#consumer-1| [thrd:ssl://kafka-main1003.eqiad.wmnet:9093/bootstrap]: ssl://kafka-main1003.eqiad.wmnet:9093/1003: Connect to ipv6#[2620:0:861:103:10:64:32:90]:9093 failed: Connection refused (after 237ms in state CONNECT)
Apr 05 09:35:19 cp5032 purged[2085976]: %4|1680687319.299|SESSTMOUT|purged#consumer-1| [thrd:main]: Consumer group session timed out (in join-state steady) after 10070 ms without a successful response from the group coordinator (broker 1001, last error was Broker: Not coordinator): revoking assignment and rejoining group
Apr 05 09:40:31 cp5032 purged[2085976]: %4|1680687631.774|SESSTMOUT|purged#consumer-1| [thrd:main]: Consumer group session timed out (in join-state steady) after 10205 ms without a successful response from the group coordinator (broker 1003, last error was Local: Broker transport failure): revoking assignment and rejoining group
After a restart everything worked fine, but I am wondering whether the purged code should be more resilient to consumer group changes (or whether there is another problem).
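
In case it helps compare the other cp nodes, here is a rough sketch of how journal lines like the ones above could be tallied by event type and broker. It is not part of purged: the names `LOG_RE`, `BROKER_RE` and `summarize` are made up for illustration, and the regex only assumes the `%level|timestamp|FACILITY|client| [thrd:...]: message` layout visible in this excerpt.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   %<level>|<unix ts>|<FACILITY>|<client>| [thrd:<thread>]: <message>
LOG_RE = re.compile(
    r"%(?P<level>\d)\|(?P<ts>\d+\.\d+)\|(?P<fac>[A-Z]+)\|(?P<client>[^|]+)\|"
    r"\s*\[thrd:(?P<thread>[^\]]+)\]:\s*(?P<msg>.*)"
)
# Broker hostname as it appears in the message part (hypothetical helper).
BROKER_RE = re.compile(r"(kafka-main\d+)\.eqiad\.wmnet")


def summarize(lines):
    """Count events per (facility, broker); '-' when no hostname is in the message."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue  # not a client log line in the assumed layout
        b = BROKER_RE.search(m["msg"])
        broker = b.group(1) if b else "-"
        counts[(m["fac"], broker)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: journalctl -u purged | python3 summarize.py
    for (fac, broker), n in sorted(summarize(sys.stdin).items()):
        print(f"{fac:10} {broker:16} {n}")
```

The SESSTMOUT lines only carry a broker id ("broker 1001"), not a hostname, so they end up under `-` in this sketch.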