Set per-request timeout on ATS-BE
Closed, Resolved · Public

Description

Currently ats-be doesn't enforce a timeout for backend servers: it waits indefinitely until the other side (usually Envoy) gives up, which happens after 203 seconds (more details at https://wikitech.wikimedia.org/wiki/HTTP_timeouts#App_server):

vgutierrez@cp3064:~$ sudo -i traffic_ctl config match active_timeout_out
proxy.config.http.transaction_active_timeout_out: 0

In practice this means the default inactivity timeout is what gets applied, as ATS 9 reports through the new proxy.process.net.default_inactivity_timeout_applied metric:

vgutierrez@cp3064:~$ sudo -i traffic_ctl config match default_inacti
proxy.config.net.default_inactivity_timeout: 360
vgutierrez@cp3064:~$ sudo -i traffic_ctl metric match default_inactivity
proxy.process.net.default_inactivity_timeout_applied 2438573
proxy.process.net.default_inactivity_timeout_count 6

Setting proxy.config.http.transaction_active_timeout_out to a value aligned with the rest of the timeouts in our stack should help under heavy load scenarios.
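
For illustration, a minimal sketch of how such a setting might look in ATS 9's records.config. The 205-second value is the one recorded in the SAL entry on this task; the actual change was deployed through the Puppet patches listed in the timeline, whose contents are not reproduced here, so treat the exact line as an assumption rather than the deployed configuration.

```
# Sketch only (assumed value, taken from the SAL entry: 205 secs).
# Caps the total time ats-be waits on a single backend transaction;
# 0 (the current value) disables the timeout.
CONFIG proxy.config.http.transaction_active_timeout_out INT 205
```

A value slightly above the 203 seconds after which Envoy gives up lets the backend-side timeout fire first under normal conditions, while still bounding the wait well below the 360-second default inactivity timeout.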

Event Timeline

Vgutierrez triaged this task as Medium priority. (Aug 18 2022, 9:05 AM)
Vgutierrez moved this task from Backlog to Actively Servicing on the Traffic board.

Change 824484 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] trafficserver: Set transaction_active_timeout_out on cp4026 and cp4032

https://gerrit.wikimedia.org/r/824484

Change 824484 merged by Vgutierrez:

[operations/puppet@production] trafficserver: Set transaction_active_timeout_out on cp4026 and cp4032

https://gerrit.wikimedia.org/r/824484

Change 826228 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] trafficserver: Set transaction_active_timeout_out for ulsfo

https://gerrit.wikimedia.org/r/826228

Change 826228 merged by Vgutierrez:

[operations/puppet@production] trafficserver: Set transaction_active_timeout_out for ulsfo

https://gerrit.wikimedia.org/r/826228

Change 827952 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] trafficserver: Enforce per request timeout globally

https://gerrit.wikimedia.org/r/827952

Change 827952 merged by Vgutierrez:

[operations/puppet@production] trafficserver: Enforce per request timeout globally

https://gerrit.wikimedia.org/r/827952

Mentioned in SAL (#wikimedia-operations) [2022-08-30T08:24:42Z] <vgutierrez> ATS: enforce per-request timeout globally (205 secs) - T315533