
SUP: Retry 429 (rate limit) at HTTP client level
Closed, Resolved | Public | 5 Estimated Story Points

Description

Currently, the SUP retries failed fetch requests to the MW API with a linear backoff that increases by 1s per retry. That might make sense when the API is unavailable for some reason, but for a 429 response (rate limit enforced by envoy) it blocks unnecessarily long. Since envoy rate-limits requests based on a sliding window, we could retry more rapidly.

Since Flink's API for async operator retries does not expose the retry cause when calculating the backoff, we should implement this at the HTTP client level.

AC:

  • Both the sync and async clients use a retry strategy that retries only 429 responses, with a constant 200ms backoff
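
A minimal sketch of what such a strategy could look like, independent of any particular HTTP library. The class name `RateLimitRetry`, its methods, and the retry cap of 3 used in the usage note are hypothetical and not taken from the SUP code base; the real clients would presumably plug equivalent logic into their HTTP client's retry hook.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

/** Hypothetical sketch: retry only HTTP 429 responses, with a constant backoff. */
final class RateLimitRetry {
    static final int TOO_MANY_REQUESTS = 429;

    private final int maxRetries;   // assumed cap; the task does not specify one
    private final Duration backoff; // constant delay between attempts, e.g. 200ms

    RateLimitRetry(int maxRetries, Duration backoff) {
        this.maxRetries = maxRetries;
        this.backoff = backoff;
    }

    /**
     * Decides whether to retry after a response.
     * attempt is the number of requests already sent (1 after the first request).
     */
    boolean shouldRetry(int status, int attempt) {
        return status == TOO_MANY_REQUESTS && attempt <= maxRetries;
    }

    /**
     * Runs the request (modelled here as a supplier of status codes) and
     * re-runs it after a constant delay while the policy allows a retry.
     * Returns the last status code seen.
     */
    int execute(IntSupplier request) {
        int attempt = 1;
        int status = request.getAsInt();
        while (shouldRetry(status, attempt)) {
            try {
                Thread.sleep(backoff.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                return status;
            }
            attempt++;
            status = request.getAsInt();
        }
        return status;
    }
}
```

With `new RateLimitRetry(3, Duration.ofMillis(200))`, a request that keeps returning 429 is sent at most four times and spends 600ms waiting in total. The linear backoff described above would spend 1s + 2s + 3s = 6s on the same three retries. Any other status, including 5xx errors, is returned to the caller immediately, so the existing Flink-level retry can still handle it.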

Event Timeline

Gehel triaged this task as High priority. (Jun 17 2024, 3:24 PM)
Gehel moved this task from needs triage to Current work on the Discovery-Search board.
Gehel edited projects, added Discovery-Search (Current work); removed Discovery-Search.
Gehel set the point value for this task to 5. (Jun 17 2024, 3:47 PM)
EBernhardson subscribed.

AFAICT this has been deployed. The currently deployed version contains both patches above, and our helmfile configuration sets the new http-rate-limit-per-second option to 600.