[Search Update Pipeline] Fetch: Handle Timeout of AsyncAwaitOperator
Closed, Resolved
Public
Assigned To

Authored By

	pfischer
	Sep 28 2023, 6:31 AM

Description

Currently, we have timeouts at multiple levels: HTTP requests (socket write + read) and the AsyncWaitOperator wrapping those HTTP requests:

TransformOperator
- AsyncWaitOperator
  - RetryingAsyncFunction
    - BypassingCirrusDocFetcher […] HttpClient
    - LagAwareRetryPredicate

The LagAwareRetryPredicate only caps the number of retries for late events, so for all other events the fetch is retried indefinitely until the RetryingAsyncFunction times out. The timeout surfaces as a TimeoutException which is not handled anywhere and therefore crashes the application.
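
To illustrate the problem, here is a minimal sketch of a predicate that caps retries only for late events. It is not the real LagAwareRetryPredicate: the class name `LateOnlyRetryCap`, the `maxLateRetries` parameter, and the `late` flag are assumptions made up for this example.

```java
/** Hypothetical stand-in, NOT the real LagAwareRetryPredicate. */
final class LateOnlyRetryCap {
    private final int maxLateRetries; // assumed configuration value

    LateOnlyRetryCap(int maxLateRetries) {
        this.maxLateRetries = maxLateRetries;
    }

    /**
     * @param attempt number of retries already performed
     * @param late    whether the event counts as late (assumed flag)
     * @return true if another retry should be scheduled
     */
    boolean shouldRetry(int attempt, boolean late) {
        if (late) {
            // Late events: give up once the cap is reached.
            return attempt < maxLateRetries;
        }
        // All other events: no cap, so only the outer timeout ends the loop.
        return true;
    }
}
```

With a predicate of this shape, an event that is not late never stops retrying on its own, so the surrounding timeout is the only exit.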

- fetch_error schema: remove restriction on error_type (it only destroys information)
- RetryingAsyncFunction class: override/implement org.wikimedia.discovery.cirrus.updater.common.graph.RetryingAsyncFunction#timeout so it completes with a FetchResult.fromError
- ConsumerApplicationIT verifies output routed to fetch_error stream/topic
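
The timeout handling listed above could look roughly like the following sketch. It is self-contained and does not depend on Flink: `ResultFuture` here is a local stand-in that mirrors the shape of Flink's interface of the same name (`complete` and `completeExceptionally`), and `FetchResult`, its fields, and the signature of `fromError` are assumptions, since the task only names the method. The point is that `timeout` completes the future normally with an error result instead of completing it exceptionally.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeoutException;

final class TimeoutSketch {

    /** Local stand-in mirroring the shape of Flink's ResultFuture. */
    interface ResultFuture<T> {
        void complete(Collection<T> results);
        void completeExceptionally(Throwable error);
    }

    /** Hypothetical result type; the real FetchResult may look different. */
    static final class FetchResult {
        final String input;
        final String errorType; // free-form, in line with an unrestricted error_type
        final String message;

        private FetchResult(String input, String errorType, String message) {
            this.input = input;
            this.errorType = errorType;
            this.message = message;
        }

        /** Assumed signature: wraps the failed input and its cause. */
        static FetchResult fromError(String input, Throwable error) {
            return new FetchResult(input, error.getClass().getSimpleName(), error.getMessage());
        }
    }

    /** Sketch of the async function with an overridden timeout hook. */
    static final class FetchFunction {
        void timeout(String input, ResultFuture<FetchResult> resultFuture) {
            TimeoutException cause = new TimeoutException("fetch timed out for " + input);
            // Complete normally with an error record so it can be routed
            // downstream instead of failing the job.
            resultFuture.complete(Collections.singletonList(FetchResult.fromError(input, cause)));
        }
    }

    /** Test helper that records what the function emitted. */
    static final class CollectingFuture implements ResultFuture<FetchResult> {
        final List<FetchResult> results = new ArrayList<>();
        Throwable failure;

        @Override
        public void complete(Collection<FetchResult> r) {
            results.addAll(r);
        }

        @Override
        public void completeExceptionally(Throwable e) {
            failure = e;
        }
    }
}
```

Calling `new TimeoutSketch.FetchFunction().timeout("page-1", future)` with a `CollectingFuture` leaves `failure` unset and puts one error result into `results`. A downstream step could then route that result to the fetch_error stream.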

Related Objects

Status	Assigned	Task
Open	None	T317045 [Epic] Re-architect the Search Update Pipeline
Resolved	pfischer	T347545 [Search Update Pipeline] Use flink's AsyncRetryStrategy instead of custom retry logic
Resolved	pfischer	T347543 [Search Update Pipeline] Fetch: Handle Timeout of AsyncAwaitOperator