
HAProxy 2.6.12 segfaults
Closed, Resolved (Public)

Description

Apr 10 22:52:19 cp2033 haproxy[1401852]: [ALERT]    (1401852) : A bogus STREAM [0x7fc41c262cb0] is spinning at 191831 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x7fc41c262cb0,14284a src=REDACTED (IPv4) fe=tls be=tls dst=backend_server_1 txn=0x7fc41c5d6d40,40000 txn.req=MSG_ERROR,d txn.rsp=MSG_DONE,4d rqf=84a064 rqa=0 rpf=c0048000 rpa=0 scf=0x7fc41c0bbcb0,EST,0 scb=0x7fc41c4d98c0,EST,1 af=(nil),0 sab=(nil),0 cof=0x7fc41816ec60,80040300:H2(0x7fc41c1a93e0)/SSL(0x7fc41c47a0d0)/tcpv4(3753) cob=0x7fc41493d2e0,300:H1(0x7fc4184eee20)/RAW((nil))/unix_stream(2310) filters={}]
Apr 10 22:52:19 cp2033 haproxy[1401852]:   call trace(10):
Apr 10 22:52:19 cp2033 haproxy[1401852]:   | 0x560047dd0d32 [c6 04 25 01 00 00 00 00]: main-0x28ce
Apr 10 22:52:19 cp2033 haproxy[1401852]:   | 0x560047e77b99 [e9 46 d8 ff ff 41 89 dc]: process_stream+0x2839/0x3424
Apr 10 22:52:19 cp2033 haproxy[1401852]:   | 0x560047f539cd [48 89 c3 64 49 8b 06 48]: run_tasks_from_lists+0x39d/0x7ec
Apr 10 22:52:19 cp2033 haproxy[1401852]:   | 0x560047f54231 [29 44 24 14 8b 4c 24 14]: process_runnable_tasks+0x411/0x8eb
Apr 10 22:52:19 cp2033 haproxy[1401852]:   | 0x560047f245b9 [83 3d 40 b7 20 00 01 0f]: run_poll_loop+0x129/0x412
Apr 10 22:52:19 cp2033 haproxy[1401852]:   | 0x560047f24a69 [48 8b 1d f0 36 15 00 4c]: main+0x151469
Apr 10 22:52:19 cp2033 haproxy[1401852]:   | 0x7fc48aa5eea7 [64 48 89 04 25 30 06 00]: libpthread:+0x7ea7
Apr 10 22:52:19 cp2033 haproxy[1401852]:   | 0x7fc48a46da2f [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a
Apr 10 22:52:19 cp2033 haproxy[3941698]: [ALERT]    (3941698) : Current worker (1401852) exited with code 139 (Segmentation fault)
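For readers triaging similar alerts: the master process reports the worker's exit as code 139, which under the common "128 + signal number" convention corresponds to signal 11 (SIGSEGV), matching the "Segmentation fault" text in the log line. A minimal Python sketch of that decoding follows; the helper name `describe_exit_code` is hypothetical and not part of HAProxy or any tooling mentioned in this task.

```python
import signal


def describe_exit_code(code):
    """Return the signal name for a 128+N style exit code, or None.

    Assumes the reporting process follows the usual shell convention of
    encoding "killed by signal N" as exit status 128 + N.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # No signal with that number on this platform.
        return None


if __name__ == "__main__":
    # Exit code taken from the "Current worker ... exited with code 139" line.
    print(describe_exit_code(139))
```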

Event Timeline

Vgutierrez renamed this task from HAProxy 2.6.16 segfaults on cp2033 to HAProxy 2.6.12 segfaults on cp2033. Apr 11 2023, 7:52 AM
Vgutierrez triaged this task as High priority.

Mentioned in SAL (#wikimedia-operations) [2023-04-11T07:54:43Z] <vgutierrez> restart haproxy on cp2033 - T334448

This seems to be the same issue as T332796, caused by an incomplete bugfix.

Now also observed on cp2035:

Apr 11 22:00:08 cp2035 haproxy[2532735]: [ALERT]    (2532735) : A bogus STREAM [0x7f1a2834d450] is spinning at 193580 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x7f1a2834d450,14284a src=<REDACTED> fe=tls be=tls dst=backend_server_0 txn=0x7f1a28647de0,40000 txn.req=MSG_ERROR,d txn.rsp=MSG_DONE,4d rqf=84a064 rqa=0 rpf=c0048000 rpa=0 scf=0x7f1a280c14f0,EST,0 scb=0x7f1a2861bce0,EST,1 af=(nil),0 sab=(nil),0 cof=0x7f1a104>
Apr 11 22:00:08 cp2035 haproxy[2532735]:   call trace(10):
Apr 11 22:00:08 cp2035 haproxy[2532735]:   | 0x55662d03ed32 [c6 04 25 01 00 00 00 00]: main-0x28ce
Apr 11 22:00:08 cp2035 haproxy[2532735]:   | 0x55662d0e5b99 [e9 46 d8 ff ff 41 89 dc]: process_stream+0x2839/0x3424
Apr 11 22:00:08 cp2035 haproxy[2532735]:   | 0x55662d1c19cd [48 89 c3 64 49 8b 06 48]: run_tasks_from_lists+0x39d/0x7ec
Apr 11 22:00:08 cp2035 haproxy[2532735]:   | 0x55662d1c2231 [29 44 24 14 8b 4c 24 14]: process_runnable_tasks+0x411/0x8eb
Apr 11 22:00:08 cp2035 haproxy[2532735]:   | 0x55662d1925b9 [83 3d 40 b7 20 00 01 0f]: run_poll_loop+0x129/0x412
Apr 11 22:00:08 cp2035 haproxy[2532735]:   | 0x55662d192a69 [48 8b 1d f0 36 15 00 4c]: main+0x151469
Apr 11 22:00:08 cp2035 haproxy[2532735]:   | 0x7f1a86f0cea7 [64 48 89 04 25 30 06 00]: libpthread:+0x7ea7
Apr 11 22:00:08 cp2035 haproxy[2532735]:   | 0x7f1a8691ba2f [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a
Apr 11 22:00:09 cp2035 haproxy[230806]: [ALERT]    (230806) : Current worker (2532735) exited with code 139 (Segmentation fault)
Apr 11 22:00:09 cp2035 haproxy[230806]: [ALERT]    (230806) : exit-on-failure: killing every processes with SIGTERM
Apr 11 22:00:09 cp2035 haproxy[230806]: [WARNING]  (230806) : All workers exited. Exiting... (139)
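To check whether two crashes look like the same failure, it can help to compare the stream state dumped in each "bogus STREAM" alert rather than the pointer values, which differ per process. The following is a rough Python sketch under stated assumptions: the regular expressions are inferred only from the log lines pasted in this task, not from any documented HAProxy log format, and the function names and the choice of fields in `SIGNATURE_KEYS` are hypothetical.

```python
import re

# Patterns inferred from the alert lines in this task (an assumption,
# not a documented format).
ALERT_RE = re.compile(
    r"A bogus STREAM \[(?P<strm>0x[0-9a-f]+)\] is spinning at "
    r"(?P<rate>\d+) calls per second"
)
FIELD_RE = re.compile(r"([a-z.]+)=(\S+)")

# Fields that do not contain per-process pointers; chosen for illustration.
SIGNATURE_KEYS = ("fe", "be", "txn.req", "txn.rsp", "rqf", "rpf")


def parse_alert(line):
    """Extract the call rate and key=value state from one alert line."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    start = line.find("[strm=")
    fields = {}
    if start != -1:
        end = line.rfind("]")
        if end <= start:
            # Truncated line without a closing bracket: parse to the end.
            end = len(line)
        fields = dict(FIELD_RE.findall(line[start + 1:end]))
    return {
        "stream": match.group("strm"),
        "rate": int(match.group("rate")),
        "fields": fields,
    }


def signature(alert):
    """Pointer-free tuple used to compare two parsed alerts."""
    return tuple(alert["fields"].get(key) for key in SIGNATURE_KEYS)
```

Feeding the cp2033 and cp2035 alert lines through `parse_alert` and comparing `signature(...)` of the results gives a quick way to see whether the request/response state and channel flags match. Fields missing from a truncated line simply come back as `None`.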

Mentioned in SAL (#wikimedia-operations) [2023-04-12T06:38:04Z] <vgutierrez> restart haproxy on cp2035 - T334448

Vgutierrez renamed this task from HAProxy 2.6.12 segfaults on cp2033 to HAProxy 2.6.12 segfaults. Apr 12 2023, 6:42 AM

Mentioned in SAL (#wikimedia-sre) [2023-04-13T13:23:51Z] <vgutierrez> restarting haproxy in cp5022 - T334448

Change 908546 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] cache::haproxy: Enable coredump configuration

https://gerrit.wikimedia.org/r/908546

Change 908547 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Enable coredumps for haproxy at text cache cluster

https://gerrit.wikimedia.org/r/908547

Change 908546 merged by Vgutierrez:

[operations/puppet@production] cache::haproxy: Enable coredump configuration

https://gerrit.wikimedia.org/r/908546

Change 908547 merged by Vgutierrez:

[operations/puppet@production] hiera: Enable coredumps for haproxy at text cache cluster

https://gerrit.wikimedia.org/r/908547

Mentioned in SAL (#wikimedia-operations) [2023-04-13T14:04:53Z] <vgutierrez> rolling restart of HAProxy on A:cp-text - T334448

Change 908557 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] cache::haproxy: Drop LimitCORESoft

https://gerrit.wikimedia.org/r/908557

Change 908557 merged by Vgutierrez:

[operations/puppet@production] cache::haproxy: Drop LimitCORESoft

https://gerrit.wikimedia.org/r/908557

Change 908934 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] cache::haproxy: enable systemd-coredump

https://gerrit.wikimedia.org/r/908934

Mentioned in SAL (#wikimedia-operations) [2023-04-16T07:54:21Z] <vgutierrez> restart haproxy on cp2033 to clear unexpected service restart alerts - T334448

Mentioned in SAL (#wikimedia-operations) [2023-04-17T07:49:33Z] <vgutierrez> restart haproxy on cp3054 - T334448

Change 909209 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] cache::haproxy: Relax hardening when coredumps are enabled

https://gerrit.wikimedia.org/r/909209

Change 908934 abandoned by Ssingh:

[operations/puppet@production] cache::haproxy: enable systemd-coredump

Reason:

not required, core_pattern is already present in base

https://gerrit.wikimedia.org/r/908934

Change 909209 merged by Vgutierrez:

[operations/puppet@production] cache::haproxy: Relax hardening when coredumps are enabled

https://gerrit.wikimedia.org/r/909209

Change 909287 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] cache::haproxy: use set-dumpable if coredumps are enabled

https://gerrit.wikimedia.org/r/909287

Change 909287 merged by Vgutierrez:

[operations/puppet@production] cache::haproxy: Add set-dumpable to haproxy global options

https://gerrit.wikimedia.org/r/909287

Mentioned in SAL (#wikimedia-operations) [2023-04-17T15:07:19Z] <vgutierrez> rolling restart of HAProxy in the text cluster - T334448

We had another occurrence, this time on cp1087. It has been reported upstream with a full backtrace at https://github.com/haproxy/haproxy/issues/2111#issuecomment-1518506256

Mentioned in SAL (#wikimedia-operations) [2023-04-22T04:33:45Z] <vgutierrez> restart haproxy on cp1087 - T334448

Mentioned in SAL (#wikimedia-operations) [2023-04-24T15:08:41Z] <vgutierrez> restarting haproxy on cp3064 - T334448

Mentioned in SAL (#wikimedia-operations) [2023-05-02T08:44:17Z] <vgutierrez> testing haproxy 2.6.12-1~bpo10+1+wmf1 in cp1077 and cp1085 - T334448

Vgutierrez lowered the priority of this task from High to Medium. May 2 2023, 10:26 AM

Decreasing the priority: we are already testing a fixed version (the fix was proposed by upstream and should be released as part of HAProxy 2.6.13 at some point), and we aren't seeing many segfaults (about 2 per week cluster-wide lately).

Mentioned in SAL (#wikimedia-operations) [2023-05-08T07:53:27Z] <vgutierrez> fetch HAProxy 2.6.13 on thirdparty/haproxy2.6 (apt.wm.o) - T334448

Mentioned in SAL (#wikimedia-operations) [2023-05-08T08:27:20Z] <vgutierrez> HAProxy updated to 2.6.13 on cp1077 and cp1085 - T334448

Vgutierrez added a subscriber: BCornwall.

Thanks to @BCornwall for taking care of the final cluster-wide deployment of HAProxy 2.6.13.