Evaluate HAProxy 3.1
Closed, Resolved, Public

Description

As mentioned by wtarreau on https://github.com/haproxy/haproxy/issues/2869#issuecomment-2668168400

In case you'd be struggling with making multiple services coexist under stress on the same machine, you could be interested in starting to evaluate 3.1, as it does bring some significant memory savings under contention (it avoids allocating buffers when the other side is congested) so for example a POST sent to a backend server would just no longer needlessly fill buffers when the server slows down. Also I'm seeing tune.h2.initial-window in your config so I think you have to deal with POSTs that made you adjust it, and 3.1 is much faster for POSTs as it now supports a dynamic window sizing (thus the parameter can be commented out).

HAProxy 3.1 is available on haproxy.debian.net for bullseye.

Current deployment status:

  • cp5024 (text)
  • cp5032 (upload)

Event Timeline

Vgutierrez triaged this task as Medium priority.

Change #1120926 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] aptrepo,haproxy: Allow installing HAProxy 3.1 on bullseye

https://gerrit.wikimedia.org/r/1120926

Change #1120926 merged by Vgutierrez:

[operations/puppet@production] aptrepo,haproxy: Allow installing HAProxy 3.1 on bullseye

https://gerrit.wikimedia.org/r/1120926

Mentioned in SAL (#wikimedia-operations) [2025-02-20T08:42:57Z] <vgutierrez> uploaded haproxy 3.1.3 to thirdparty/haproxy31 - T386796

The current production config is valid for HAProxy 3.1 (tested against 3.1.5). We could drop tune.h2.initial-window to benefit from the dynamic window sizing introduced in 3.1.
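
A minimal sketch of the version gate this implies, assuming a hypothetical helper `h2_tuning_lines` and a placeholder window value (the production value is not shown in this task). It emits the static `tune.h2.initial-window-size` setting only for releases older than 3.1:

```python
def parse_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '3.1.5'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def h2_tuning_lines(haproxy_version: str, initial_window: int) -> list[str]:
    """Global-section lines for HTTP/2 window tuning (hypothetical helper).

    initial_window is a placeholder; the production value is not part of this task.
    """
    if parse_version(haproxy_version) >= (3, 1):
        # 3.1 sizes the window dynamically, so the static setting is omitted.
        return []
    return [f"tune.h2.initial-window-size {initial_window}"]


# Illustrative version strings and window size only.
print(h2_tuning_lines("2.8.0", 1048576))
print(h2_tuning_lines("3.1.5", 1048576))
```

Keeping the gate on (major, minor) rather than on the full version string means patch releases such as 3.1.3 and 3.1.5 are treated the same way.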

Change #1125393 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] haproxy: Don't set h2 initial-window-size on haproxy 3.1

https://gerrit.wikimedia.org/r/1125393

Change #1125393 merged by Vgutierrez:

[operations/puppet@production] haproxy: Don't set h2 initial-window-size on haproxy 3.1

https://gerrit.wikimedia.org/r/1125393

Change #1128384 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Test HAProxy 3.1 in cp5032 (upload)

https://gerrit.wikimedia.org/r/1128384

Change #1128384 merged by Vgutierrez:

[operations/puppet@production] hiera: Test HAProxy 3.1 in cp5032 (upload)

https://gerrit.wikimedia.org/r/1128384

Mentioned in SAL (#wikimedia-operations) [2025-03-17T13:25:33Z] <vgutierrez> upgrading HAProxy to version 3.1 in cp5032 (upload) - T386796

Mentioned in SAL (#wikimedia-operations) [2025-03-17T13:31:34Z] <vgutierrez> uploaded HAProxy 3.1.5 to apt.wm.o (bullseye-wikimedia) component thirdparty/haproxy31 - T386796

Change #1128428 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Upgrade to HAProxy 3.1 on cp5024 (text)

https://gerrit.wikimedia.org/r/1128428

Change #1128428 merged by Vgutierrez:

[operations/puppet@production] hiera: Upgrade to HAProxy 3.1 on cp5024 (text)

https://gerrit.wikimedia.org/r/1128428

Mentioned in SAL (#wikimedia-operations) [2025-03-17T14:29:42Z] <vgutierrez> upgrading HAProxy to version 3.1 in cp5024 (text) - T386796

Vgutierrez changed the task status from Open to In Progress. Mar 17 2025, 2:38 PM
Vgutierrez updated the task description.

Change #1128469 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] Revert "hiera: Upgrade to HAProxy 3.1 on cp5024 (text)"

https://gerrit.wikimedia.org/r/1128469

Change #1128470 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] Revert "hiera: Test HAProxy 3.1 in cp5032 (upload)"

https://gerrit.wikimedia.org/r/1128470

Change #1128469 merged by Vgutierrez:

[operations/puppet@production] Revert "hiera: Upgrade to HAProxy 3.1 on cp5024 (text)"

https://gerrit.wikimedia.org/r/1128469

Mentioned in SAL (#wikimedia-operations) [2025-03-17T16:45:12Z] <vgutierrez> downgrading HAProxy to version 2.8 in cp5024 (text) - T386796

Change #1128470 merged by Vgutierrez:

[operations/puppet@production] Revert "hiera: Test HAProxy 3.1 in cp5032 (upload)"

https://gerrit.wikimedia.org/r/1128470

Mentioned in SAL (#wikimedia-operations) [2025-03-17T16:46:12Z] <vgutierrez> downgrading HAProxy to version 2.8 in cp5032 (upload) - T386796

Vgutierrez changed the task status from In Progress to Stalled. Mar 17 2025, 4:48 PM
Vgutierrez updated the task description.

Downgraded after seeing the following issue on cp5024:

Mar 17 16:00:08 cp5024 systemd[1]: Reloaded HAProxy Load Balancer.
Mar 17 16:00:08 cp5024 haproxy[3828921]: [WARNING]  (3828921) : Proxy tls stopped (cumulated conns: FE: 5807551, BE: 15422820).
Mar 17 16:00:08 cp5024 haproxy[3828921]: [WARNING]  (3828921) : Proxy stats stopped (cumulated conns: FE: 84, BE: 0).
Mar 17 16:00:08 cp5024 haproxy[3828921]: [WARNING]  (3828921) : Proxy http stopped (cumulated conns: FE: 1605457, BE: 0).
Mar 17 16:00:08 cp5024 haproxy[3828921]: [WARNING]  (3828921) : Proxy httpreqrate stopped (cumulated conns: FE: 0, BE: 0).
Mar 17 16:00:08 cp5024 haproxy[3828921]: [WARNING]  (3828921) : Proxy httpreqrate_http stopped (cumulated conns: FE: 0, BE: 0).
Mar 17 16:00:08 cp5024 haproxy[3828921]: [WARNING]  (3828921) : Proxy healthcheck stopped (cumulated conns: FE: 0, BE: 2016).
Mar 17 16:00:08 cp5024 haproxy[3828921]: WARNING! thread 1 has stopped processing traffic for 430 milliseconds
Mar 17 16:00:08 cp5024 haproxy[3828921]:     with 7 streams currently blocked, prevented from making any progress.
Mar 17 16:00:08 cp5024 haproxy[3828921]:     While this may occasionally happen with inefficient configurations
Mar 17 16:00:08 cp5024 haproxy[3828921]:     involving excess of regular expressions, map_reg, or heavy Lua processing,
Mar 17 16:00:08 cp5024 haproxy[3828921]:     this must remain exceptional because the system's stability is now at risk.
Mar 17 16:00:08 cp5024 haproxy[3828921]:     Timers in logs may be reported incorrectly, spurious timeouts may happen,
Mar 17 16:00:08 cp5024 haproxy[3828921]:     some incoming connections may silently be dropped, health checks may
Mar 17 16:00:08 cp5024 haproxy[3828921]:     randomly fail, and accesses to the CLI may block the whole process. The
Mar 17 16:00:08 cp5024 haproxy[3828921]:     blocking delay before emitting this warning may be adjusted via the global
Mar 17 16:00:08 cp5024 haproxy[3828921]:     'warn-blocked-traffic-after' directive. Please check the trace below for
Mar 17 16:00:08 cp5024 haproxy[3828921]:     any clues about configuration elements that need to be corrected:
Mar 17 16:00:08 cp5024 haproxy[3828921]: * Thread 1 : id=0x7f835393ec00 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=1 rqsz=1
Mar 17 16:00:08 cp5024 haproxy[3828921]:       1/1    stuck=0 prof=0 harmless=0 isolated=1
Mar 17 16:00:08 cp5024 haproxy[3828921]:              cpu_ns: poll=136029506112 now=136459683869 diff=430177757
Mar 17 16:00:08 cp5024 haproxy[3828921]:              curr_task=0x7f8352192bc0 (task) calls=2 last=0
Mar 17 16:00:08 cp5024 haproxy[3828921]:                fct=0x55cc05af93e0(manage_proxy) ctx=0x7f8353469000
Mar 17 16:00:08 cp5024 haproxy[3828921]:              call trace(25):
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x55cc05b14f19 [eb df 48 85 c0 74 ac e8]: ha_thread_dump_fill+0x99/0xc8
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x55cc05b15526 [48 85 c0 74 4a 48 8b 54]: ha_stuck_warning+0xe6/0x232
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x55cc05bf6f1c [eb df 66 90 48 83 ec 38]: wdt_handler+0x23c/0x23e
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x7f8353e57140 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x13140
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x7f8353d65307 [48 3d 01 f0 ff ff 73 01]: libc:madvise+0x7/0x21
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x7f8353ecc4ae [85 c0 0f 95 c0 48 83 c4]: libjemalloc:+0x664ae
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x7f8353ec3a21 [49 8b 7d 00 84 c0 0f 85]: libjemalloc:+0x5da21
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x7f8353e87601 [4c 01 64 24 18 4d 85 f6]: libjemalloc:+0x21601
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x7f8353e87db4 [4c 89 ef 41 c6 44 24 68]: libjemalloc:+0x21db4
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x7f8353e8a210 [84 c0 74 0c 48 83 c4 08]: libjemalloc:+0x24210
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x7f8353eb45f4 [eb a3 66 2e 0f 1f 84 00]: libjemalloc:+0x4e5f4
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x7f8353eb4673 [31 c0 48 83 c4 08 c3 66]: libjemalloc:+0x4e673
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x7f8353ebcdc7 [5a 59 48 8b 9c 24 98 00]: libjemalloc:+0x56dc7
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x55cc05b7d42a [3b 5c 24 04 72 b8 b8 01]: main+0x24b9ba
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x55cc05b7df88 [8b 5c 24 1c 39 5c 24 18]: pool_gc+0x188/0x1f4
Mar 17 16:00:08 cp5024 haproxy[3828921]:              | 0x55cc05af9797 [89 d8 d1 f8 44 39 e8 0f]: manage_proxy+0x3b7/0x478
Mar 17 16:00:08 cp5024 haproxy[3828921]: ### Note: one thread was found stuck under malloc_trim(), which can run for a
Mar 17 16:00:08 cp5024 haproxy[3828921]:           very long time on large memory systems. You way want to disable this
Mar 17 16:00:08 cp5024 haproxy[3828921]:           memory reclaiming feature by setting 'no-memory-trimming' in the
Mar 17 16:00:08 cp5024 haproxy[3828921]:           'global' section of your configuration to avoid this in the future.
Mar 17 16:00:08 cp5024 haproxy[3828921]:  => Trying to gracefully recover now.
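
To check other hosts for the same condition, the watchdog warning can be picked out of the journal. The following is a rough sketch, assuming the message wording matches the excerpt above; `summarize_stall` is a hypothetical helper and not part of HAProxy.

```python
import re

# Patterns taken from the wording of the log excerpt above.
STALL_RE = re.compile(
    r"thread (\d+) has stopped processing traffic for (\d+) milliseconds"
)
CPU_DIFF_RE = re.compile(r"cpu_ns: poll=\d+ now=\d+ diff=(\d+)")


def summarize_stall(lines):
    """Summarize a watchdog warning from journal lines (hypothetical helper)."""
    summary = {
        "thread": None,
        "stall_ms": None,
        "cpu_diff_ms": None,
        "malloc_trim": False,
    }
    for line in lines:
        if (m := STALL_RE.search(line)):
            summary["thread"] = int(m.group(1))
            summary["stall_ms"] = int(m.group(2))
        elif (m := CPU_DIFF_RE.search(line)):
            # diff is reported in nanoseconds; convert to whole milliseconds.
            summary["cpu_diff_ms"] = int(m.group(1)) // 1_000_000
        elif "stuck under malloc_trim()" in line:
            summary["malloc_trim"] = True
    return summary


sample = [
    "WARNING! thread 1 has stopped processing traffic for 430 milliseconds",
    "cpu_ns: poll=136029506112 now=136459683869 diff=430177757",
    "### Note: one thread was found stuck under malloc_trim(), which can run for a",
]
print(summarize_stall(sample))
```

The `diff` field in the trace (430177757 ns) agrees with the 430 ms reported in the warning, and the call trace shows the time being spent in `madvise` via libjemalloc, reached from `pool_gc` in `manage_proxy`.
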
Vgutierrez claimed this task.

We should proceed with the next LTS branch: 3.2