Search Update Pipeline: HTTP client/proxy config
Closed, Resolved · Public · 2 Estimated Story Points

Description

Looking at the request durations, fetching responses for wikidata tends to be slow (> 5s at the 50% quantile). To avoid unnecessary retries, we could increase the timeout to 7s to allow for slow responses, especially since wikidata is responsible for 60% of the non-rerender fetches.

Retry logic is spread across multiple levels: envoy, the http-client and flink's async operator. For the http-client, retries have been disabled explicitly; however, envoy still retries on 5xx upstream responses. Since envoy's retries are not transparent to the application, we might run into timeouts and lose the actual cause (a 5xx error). If the client passes the header x-envoy-max-retries: 0 (see docs), envoy won't retry automatically.

AC:

  • client: fetch timeout is 7s
  • client: pass x-envoy-max-retries: 0 header
  • envoy: retry / retry-attempt rate is zero
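
The two client-side criteria above can be sketched as follows. This is a minimal illustration using Python's standard library, not the pipeline's actual HTTP client code; the function names, the constant names and the example URL are hypothetical. It assumes the request passes through an envoy proxy that honours the x-envoy-max-retries request header.

```python
import urllib.request

# Fetch timeout from the acceptance criteria: 7 seconds.
FETCH_TIMEOUT_SECONDS = 7.0

# Request header asking envoy not to retry on its own, so that a 5xx
# from upstream reaches the caller instead of being hidden by retries.
NO_ENVOY_RETRIES = {"x-envoy-max-retries": "0"}


def build_fetch_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries the no-retry header (hypothetical helper)."""
    # Note: urllib stores header names capitalised ("X-envoy-max-retries");
    # HTTP header names are case-insensitive, so this is equivalent on the wire.
    return urllib.request.Request(url, headers=dict(NO_ENVOY_RETRIES), method="GET")


def fetch(url: str) -> bytes:
    """Send the request with the 7s timeout and return the response body."""
    request = build_fetch_request(url)
    with urllib.request.urlopen(request, timeout=FETCH_TIMEOUT_SECONDS) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical local envoy listener; replace with a real endpoint to try it.
    example = build_fetch_request("http://localhost:6500/w/api.php")
    print(example.header_items())
```

With retries disabled at every level below it, the application's own retry logic (flink's async operator in this task) is the only place that decides whether a failed fetch is attempted again.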

Event Timeline

pfischer renamed this task from "Search Update Pipeline: Increase http request timeout to reduce retries" to "Search Update Pipeline: HTTP client/proxy config". Jan 8 2024, 2:52 PM
pfischer updated the task description.

Therefore, we should use mw-api-int-async (w/o retries) instead of mw-api-int-async-ro

mw-api-int-async, which accepts write requests, might only be pooled in the primary DC (codfw at the moment), forcing all requests to a single DC and thus crossing DC boundaries whenever the flink job is not running in the primary DC.

Yes, you are right, Janis just told me. Alternatively, we can send a header x-envoy-max-retries: 0, see docs.

Gehel set the point value for this task to 2.

Change 989563 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[operations/deployment-charts@master] Search update pipeline: bump version

https://gerrit.wikimedia.org/r/989563

Change 989563 merged by jenkins-bot:

[operations/deployment-charts@master] Search update pipeline: bump version

https://gerrit.wikimedia.org/r/989563

Change 989739 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[operations/deployment-charts@master] Search update pipeline: bump version

https://gerrit.wikimedia.org/r/989739

Change 989739 merged by jenkins-bot:

[operations/deployment-charts@master] Search update pipeline: bump version

https://gerrit.wikimedia.org/r/989739