[API Gateway] Get insight into proxy time for Envoy
Open, Needs Triage, Public

Description

We need more insight into Envoy's proxy time. Specifically, we need the sum of:

- the time between a request arriving and it being sent to the upstream
- the time between the response being received from the upstream and the full response being sent to the downstream
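As a minimal sketch of the quantity being asked for, the Python below computes per-request proxy time from four timestamps. The class and field names are hypothetical: they are not Envoy metrics or access-log fields. Treating each event as a single instant is also a simplification, because real requests and responses are streamed.

```python
from dataclasses import dataclass


@dataclass
class RequestTimestamps:
    # Hypothetical per-request timestamps, in milliseconds on a common clock.
    downstream_request_received: float  # request arrives at Envoy
    upstream_request_sent: float        # request forwarded to the upstream
    upstream_response_received: float   # upstream response arrives at Envoy
    downstream_response_sent: float     # full response sent to the client


def proxy_time(t: RequestTimestamps) -> float:
    """Time attributable to the proxy: inbound leg plus outbound leg."""
    inbound = t.upstream_request_sent - t.downstream_request_received
    outbound = t.downstream_response_sent - t.upstream_response_received
    return inbound + outbound


def total_time(t: RequestTimestamps) -> float:
    """End-to-end time as seen by the client (the current SLI)."""
    return t.downstream_response_sent - t.downstream_request_received


def upstream_wait(t: RequestTimestamps) -> float:
    """Time spent waiting on the upstream (appservers, databases, ...)."""
    return t.upstream_response_received - t.upstream_request_sent


example = RequestTimestamps(0.0, 2.5, 120.0, 121.5)
print(proxy_time(example))  # 4.0
```

For a single request, proxy time is simply total time minus upstream wait. The difficulty is that this subtraction only works per request, so the value has to be recorded per request before it is aggregated into a histogram.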

There are additional complications to consider, such as how filters and services like the rate limiting service affect these times. For the purposes of this exercise, however, we don't need to account for those nuances as long as we get raw values for the intervals above.

These statistics should be exposed to Prometheus as a histogram, since our ultimate goal is to calculate the 99th percentile of proxy time.
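Prometheus computes percentiles from histograms server-side with `histogram_quantile()`, which finds the bucket containing the requested rank and interpolates linearly inside it. The sketch below is a simplified Python re-implementation of that idea, to show what "the 99th percentile of a proxy-time histogram" would mean in practice. The bucket bounds and counts are made up for illustration and do not come from any real Envoy metric.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative Prometheus-style buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by bound,
    with the last bound being math.inf. Assumes non-negative values,
    so the lowest bucket's lower bound is taken to be 0.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: return the highest finite bound.
                return prev_bound
            # Linear interpolation within the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical proxy-time buckets in milliseconds: (le, cumulative count).
BUCKETS = [(1.0, 50), (5.0, 90), (10.0, 98), (25.0, 100), (math.inf, 100)]
print(histogram_quantile(0.99, BUCKETS))  # 17.5
```

The result is only as precise as the bucket layout allows, so the bucket boundaries chosen for a proxy-time histogram would need to be fine-grained in the low-millisecond range where proxy overhead is expected to sit.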

I have spent a long time looking at the existing metrics to see whether we can derive this from them, but this functionality does not seem to exist. Any result approximated from existing histograms (for example, by combining the existing upstream and downstream time histograms) would be inaccurate.
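The inaccuracy comes from the fact that percentiles are not additive: the 99th percentile of total time minus the 99th percentile of upstream time is not, in general, the 99th percentile of proxy time. The Python below illustrates this with deliberately extreme synthetic data (100 requests whose upstream time and proxy overhead move in opposite directions) and a simple nearest-rank percentile. Real traffic will not be this contrived, but the same error applies to any subtraction of quantiles.

```python
def percentile(values, p):
    """Nearest-rank percentile; p is an integer percentage in 1..100."""
    ordered = sorted(values)
    rank = -(-p * len(ordered) // 100)  # ceil(p * n / 100) with integer arithmetic
    return ordered[rank - 1]


# Synthetic data: every request takes 99 ms in total, split differently
# between time spent upstream and time spent in the proxy.
upstream = [float(i) for i in range(100)]
overhead = [99.0 - i for i in range(100)]
total = [u + o for u, o in zip(upstream, overhead)]

# Approximation from separate distributions vs. the true proxy-time p99.
estimated = percentile(total, 99) - percentile(upstream, 99)
actual = percentile(overhead, 99)
print(estimated, actual)  # 1.0 98.0
```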

Currently our response time SLI measures the time from a request arriving to the API content being returned to the user. This means our response time is bounded by the response times of the appservers, databases and other components along the path. That is correct if the API server is viewed as an application rather than a proxy. Our response time SLIs would be much lower, and more useful, if they were independent of these factors and captured only the time the gateway itself takes to serve requests.

Related Objects

Status  Assigned  Task
Open    hnowlan   T294445 API Gateway has missed its write latency SLO
Open    hnowlan   T297222 [API Gateway] Get insight into proxy time for Envoy

Event Timeline

hnowlan created this task. · Dec 7 2021, 5:32 PM

Restricted Application removed a project: Patch-For-Review. · Dec 7 2021, 5:32 PM

Filed an issue with the Envoy project: https://github.com/envoyproxy/envoy/issues/19268

VirginiaPoundstone moved this task from Backlog to "To be discussed" on the Platform Team Initiatives (API Gateway) board. · Mar 15 2023, 10:52 PM

Aklapper renamed this task from "Get insight into proxy time for Envoy" to "[API Gateway] Get insight into proxy time for Envoy". · Apr 1 2024, 8:09 AM

As API Gateway is nowadays owned by serviceops, adding the serviceops project tag to open API Gateway tasks tagged with the deprecated/archived "Platform Team Initiatives (API Gateway)" tag at https://phabricator.wikimedia.org/project/profile/4321/, as part of Phabricator Housekeeping.

hnowlan updated the task description. · Tue, Apr 2, 9:28 AM

hnowlan mentioned this in T277584: [API Gateway] Redefine response time using proxy model.
