Apr 10 22:52:19 cp2033 haproxy[1401852]: [ALERT] (1401852) : A bogus STREAM [0x7fc41c262cb0] is spinning at 191831 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x7fc41c262cb0,14284a src=REDACTED (IPv4) fe=tls be=tls dst=backend_server_1 txn=0x7fc41c5d6d40,40000 txn.req=MSG_ERROR,d txn.rsp=MSG_DONE,4d rqf=84a064 rqa=0 rpf=c0048000 rpa=0 scf=0x7fc41c0bbcb0,EST,0 scb=0x7fc41c4d98c0,EST,1 af=(nil),0 sab=(nil),0 cof=0x7fc41816ec60,80040300:H2(0x7fc41c1a93e0)/SSL(0x7fc41c47a0d0)/tcpv4(3753) cob=0x7fc41493d2e0,300:H1(0x7fc4184eee20)/RAW((nil))/unix_stream(2310) filters={}]
Apr 10 22:52:19 cp2033 haproxy[1401852]: call trace(10):
Apr 10 22:52:19 cp2033 haproxy[1401852]: | 0x560047dd0d32 [c6 04 25 01 00 00 00 00]: main-0x28ce
Apr 10 22:52:19 cp2033 haproxy[1401852]: | 0x560047e77b99 [e9 46 d8 ff ff 41 89 dc]: process_stream+0x2839/0x3424
Apr 10 22:52:19 cp2033 haproxy[1401852]: | 0x560047f539cd [48 89 c3 64 49 8b 06 48]: run_tasks_from_lists+0x39d/0x7ec
Apr 10 22:52:19 cp2033 haproxy[1401852]: | 0x560047f54231 [29 44 24 14 8b 4c 24 14]: process_runnable_tasks+0x411/0x8eb
Apr 10 22:52:19 cp2033 haproxy[1401852]: | 0x560047f245b9 [83 3d 40 b7 20 00 01 0f]: run_poll_loop+0x129/0x412
Apr 10 22:52:19 cp2033 haproxy[1401852]: | 0x560047f24a69 [48 8b 1d f0 36 15 00 4c]: main+0x151469
Apr 10 22:52:19 cp2033 haproxy[1401852]: | 0x7fc48aa5eea7 [64 48 89 04 25 30 06 00]: libpthread:+0x7ea7
Apr 10 22:52:19 cp2033 haproxy[1401852]: | 0x7fc48a46da2f [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a
Apr 10 22:52:19 cp2033 haproxy[3941698]: [ALERT] (3941698) : Current worker (1401852) exited with code 139 (Segmentation fault)
Description
Related Objects
- Mentioned Here
- T332796: HAProxy 2.6.10 crashing in the text cluster
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2023-04-11T07:54:43Z] <vgutierrez> restart haproxy on cp2033 - T334448
Now also observed on cp2035:
Apr 11 22:00:08 cp2035 haproxy[2532735]: [ALERT] (2532735) : A bogus STREAM [0x7f1a2834d450] is spinning at 193580 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x7f1a2834d450,14284a src=<REDACTED> fe=tls be=tls dst=backend_server_0 txn=0x7f1a28647de0,40000 txn.req=MSG_ERROR,d txn.rsp=MSG_DONE,4d rqf=84a064 rqa=0 rpf=c0048000 rpa=0 scf=0x7f1a280c14f0,EST,0 scb=0x7f1a2861bce0,EST,1 af=(nil),0 sab=(nil),0 cof=0x7f1a104>
Apr 11 22:00:08 cp2035 haproxy[2532735]: call trace(10):
Apr 11 22:00:08 cp2035 haproxy[2532735]: | 0x55662d03ed32 [c6 04 25 01 00 00 00 00]: main-0x28ce
Apr 11 22:00:08 cp2035 haproxy[2532735]: | 0x55662d0e5b99 [e9 46 d8 ff ff 41 89 dc]: process_stream+0x2839/0x3424
Apr 11 22:00:08 cp2035 haproxy[2532735]: | 0x55662d1c19cd [48 89 c3 64 49 8b 06 48]: run_tasks_from_lists+0x39d/0x7ec
Apr 11 22:00:08 cp2035 haproxy[2532735]: | 0x55662d1c2231 [29 44 24 14 8b 4c 24 14]: process_runnable_tasks+0x411/0x8eb
Apr 11 22:00:08 cp2035 haproxy[2532735]: | 0x55662d1925b9 [83 3d 40 b7 20 00 01 0f]: run_poll_loop+0x129/0x412
Apr 11 22:00:08 cp2035 haproxy[2532735]: | 0x55662d192a69 [48 8b 1d f0 36 15 00 4c]: main+0x151469
Apr 11 22:00:08 cp2035 haproxy[2532735]: | 0x7f1a86f0cea7 [64 48 89 04 25 30 06 00]: libpthread:+0x7ea7
Apr 11 22:00:08 cp2035 haproxy[2532735]: | 0x7f1a8691ba2f [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a
Apr 11 22:00:09 cp2035 haproxy[230806]: [ALERT] (230806) : Current worker (2532735) exited with code 139 (Segmentation fault)
Apr 11 22:00:09 cp2035 haproxy[230806]: [ALERT] (230806) : exit-on-failure: killing every processes with SIGTERM
Apr 11 22:00:09 cp2035 haproxy[230806]: [WARNING] (230806) : All workers exited. Exiting... (139)
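To check whether two crashes share a signature, the key=value fields of the stream dumps can be compared mechanically. The following is only a rough sketch: the helper names `parse_dump` and `differing_keys` are made up for illustration, and the two input strings are abbreviated copies of the cp2033 and cp2035 dumps (the redacted `src` field and most pointer-valued fields are left out).

```python
import re

# Abbreviated copies of the two stream dumps from the logs
# (illustrative subset; redacted and most pointer-valued fields omitted).
DUMP_CP2033 = (
    "strm=0x7fc41c262cb0,14284a fe=tls be=tls dst=backend_server_1 "
    "txn.req=MSG_ERROR,d txn.rsp=MSG_DONE,4d rqf=84a064 rqa=0 rpf=c0048000 rpa=0"
)
DUMP_CP2035 = (
    "strm=0x7f1a2834d450,14284a fe=tls be=tls dst=backend_server_0 "
    "txn.req=MSG_ERROR,d txn.rsp=MSG_DONE,4d rqf=84a064 rqa=0 rpf=c0048000 rpa=0"
)


def parse_dump(dump):
    """Split a 'key=value key=value ...' dump into a dict of strings."""
    return dict(re.findall(r"([\w.]+)=(\S+)", dump))


def differing_keys(a, b):
    """Return the sorted keys whose values differ between two parsed dumps."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


if __name__ == "__main__":
    first = parse_dump(DUMP_CP2033)
    second = parse_dump(DUMP_CP2035)
    print(differing_keys(first, second))
```

On these abbreviated inputs only `strm` (a pointer) and `dst` (the chosen backend server) differ; the transaction states and the `rqf`/`rpf` flag values are identical on both hosts.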
Mentioned in SAL (#wikimedia-operations) [2023-04-12T06:38:04Z] <vgutierrez> restart haproxy on cp2035 - T334448
Mentioned in SAL (#wikimedia-sre) [2023-04-13T13:23:51Z] <vgutierrez> restarting haproxy in cp5022 - T334448
Change 908546 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):
[operations/puppet@production] cache::haproxy: Enable coredump configuration
Change 908547 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):
[operations/puppet@production] hiera: Enable coredumps for haproxy at text cache cluster
Change 908546 merged by Vgutierrez:
[operations/puppet@production] cache::haproxy: Enable coredump configuration
Change 908547 merged by Vgutierrez:
[operations/puppet@production] hiera: Enable coredumps for haproxy at text cache cluster
Mentioned in SAL (#wikimedia-operations) [2023-04-13T14:04:53Z] <vgutierrez> rolling restart of HAProxy on A:cp-text - T334448
Change 908557 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):
[operations/puppet@production] cache::haproxy: Drop LimitCORESoft
Change 908557 merged by Vgutierrez:
[operations/puppet@production] cache::haproxy: Drop LimitCORESoft
Upstream has decided to roll back the commit that triggers the underlying issue: http://git.haproxy.org/?p=haproxy-2.6.git;a=commit;h=d66823ece6e40cf27dca767591097f13d9aac57b
Change 908934 had a related patch set uploaded (by Ssingh; author: Ssingh):
[operations/puppet@production] cache::haproxy: enable systemd-coredump
Mentioned in SAL (#wikimedia-operations) [2023-04-16T07:54:21Z] <vgutierrez> restart haproxy on cp2033 to clear unexpected service restart alerts - T334448
Mentioned in SAL (#wikimedia-operations) [2023-04-17T07:49:33Z] <vgutierrez> restart haproxy on cp3054 - T334448
Change 909209 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):
[operations/puppet@production] cache::haproxy: Relax hardening when coredumps are enabled
Change 908934 abandoned by Ssingh:
[operations/puppet@production] cache::haproxy: enable systemd-coredump
Reason:
not required, core_pattern is already present in base
Change 909209 merged by Vgutierrez:
[operations/puppet@production] cache::haproxy: Relax hardening when coredumps are enabled
Change 909287 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):
[operations/puppet@production] cache::haproxy: use set-dumpable if coredumps are enabled
Change 909287 merged by Vgutierrez:
[operations/puppet@production] cache::haproxy: Add set-dumpable to haproxy global options
Mentioned in SAL (#wikimedia-operations) [2023-04-17T15:07:19Z] <vgutierrez> rolling restart of HAProxy in the text cluster - T334448
We had another occurrence, this time on cp1087. It has been reported upstream with a full backtrace at https://github.com/haproxy/haproxy/issues/2111#issuecomment-1518506256
Mentioned in SAL (#wikimedia-operations) [2023-04-22T04:33:45Z] <vgutierrez> restart haproxy on cp1087 - T334448
Mentioned in SAL (#wikimedia-operations) [2023-04-24T15:08:41Z] <vgutierrez> restarting haproxy on cp3064 - T334448
Mentioned in SAL (#wikimedia-operations) [2023-05-01T14:58:20Z] <sukhe> restart haproxy on cp1077: T334448
Mentioned in SAL (#wikimedia-operations) [2023-05-02T08:44:17Z] <vgutierrez> testing haproxy 2.6.12-1~bpo10+1+wmf1 in cp1077 and cp1085 - T334448
Decreasing the priority: we are already testing a fixed version (the fix was proposed by upstream and should be released as part of HAProxy 2.6.13 at some point), and we aren't seeing many segfaults (about 2 per week cluster-wide lately).
Mentioned in SAL (#wikimedia-operations) [2023-05-03T00:47:08Z] <sukhe> restart haproxy on cp2031: T334448
Mentioned in SAL (#wikimedia-operations) [2023-05-07T00:54:27Z] <sukhe> restart haproxy on cp1087: T334448
Mentioned in SAL (#wikimedia-operations) [2023-05-08T07:53:27Z] <vgutierrez> fetch HAProxy 2.6.13 on thirdparty/haproxy2.6 (apt.wm.o) - T334448
Mentioned in SAL (#wikimedia-operations) [2023-05-08T08:27:20Z] <vgutierrez> HAProxy updated to 2.6.13 on cp1077 and cp1085 - T334448
Thanks to @BCornwall for taking care of the final cluster-wide deployment of HAProxy 2.6.13.