
Upgrade haproxy to 2.8.13 on cp hosts
Closed, Resolved, Public

Description

From the release notes we got some interesting fixes:

A weird issue was fixed in the epoll poller. Over the last two years,
there were a few reports of immediate closes spuriously happening on
connections where network captures proved that the server had not closed at
all (and sometimes had even received the request and responded to it after
HAProxy had closed). The logs showed that a successful connection was
reported in error immediately after the request was sent. After
investigation, it appeared that an EPOLLHUP, or possibly an EPOLLRDHUP, can
be reported by epoll_wait() during the connect(), while in sock_conn_check()
the connect() reports success. So the connection was validated, but the HUP
was handled on the first receive and an error was reported. To work around
the issue, we have decided to remove the FD_POLL_HUP flag on the FD during
connection establishment if FD_POLL_ERR is not also reported in
sock_conn_check(). This way, the call to connect() is able to validate or
reject the connection. In the end, if the HUP or RDHUP flags were valid,
either connect() would report the error itself, or the next recv() would
return 0, confirming the closure that the poller tried to report. EPOLLRDHUP
is only an optimization to save a syscall anyway, and this pattern is so
rare that nobody will ever notice the extra call to recv(). Please note that
at least one reporter confirmed that using poll() instead of epoll() also
addressed the problem, so that can also be a temporary workaround for those
hitting the problem without the ability to upgrade immediately.
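
For readers who want the flag handling spelled out, here is a minimal sketch of the decision described above. It is not HAProxy's code: the flag values and the function name filter_connect_events are hypothetical and chosen only for illustration. It shows a hang-up being dropped during connection establishment unless an error is reported alongside it, leaving connect() or the next recv() to confirm a real closure.

```python
# Hypothetical flag values for illustration only; HAProxy's real
# FD_POLL_* constants are defined in its C sources and may differ.
FD_POLL_IN = 0x01
FD_POLL_OUT = 0x02
FD_POLL_ERR = 0x04
FD_POLL_HUP = 0x08


def filter_connect_events(events: int) -> int:
    """Return the poller events to act on while a connect() is pending.

    A hang-up without an accompanying error is ignored, so that the
    result of connect() decides whether the connection is valid.
    """
    if (events & FD_POLL_HUP) and not (events & FD_POLL_ERR):
        events &= ~FD_POLL_HUP
    return events
```

With these assumed values, a lone HUP next to a writable event is masked out, while HUP together with ERR passes through unchanged.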

Also this one could impact haproxykafka /cc @Fabfur:

In logs, the server response time (%Tr) was erroneously reported as -1 when
the response was intercepted by HAProxy. -1 is reserved for the case where
response headers were not fully received.
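
As a rough illustration of why this matters to a log consumer, the sketch below treats -1 as "no usable response time". It assumes the consumer receives %Tr as a decimal string; the function name response_time_ms is hypothetical and is not taken from haproxykafka.

```python
from typing import Optional


def response_time_ms(tr_field: str) -> Optional[int]:
    """Parse a %Tr log field, assumed to be a decimal string.

    Returns None when the field is -1, which is reserved for responses
    whose headers were not fully received.
    """
    value = int(tr_field)
    return None if value == -1 else value
```

Before the fix, intercepted responses would also have fallen into the None branch, so any statistics built on this field could undercount them.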

haproxy 2.8.13 upgrade status:

  • eqiad
  • codfw
  • esams
  • ulsfo
  • eqsin
  • drmrs
  • magru

Event Timeline

Vgutierrez triaged this task as Medium priority. Jan 7 2025, 8:54 AM
Vgutierrez moved this task from Backlog to Actively Servicing on the Traffic board.

Mentioned in SAL (#wikimedia-operations) [2025-01-09T09:02:16Z] <vgutierrez> update to haproxy 2.8.13 on component thirdparty/haproxy28 bullseye-wikimedia (apt.wm.o) - T383111

Vgutierrez claimed this task.
Vgutierrez updated the task description.