Currently, we have timeouts at multiple levels: HTTP requests (socket write + read) and the AsyncWaitOperator wrapping those HTTP requests:
- TransformOperator
- AsyncWaitOperator
- RetryingAsyncFunction
- BypassingCirrusDocFetcher […] HttpClient
- LagAwareRetryPredicate
- RetryingAsyncFunction
- AsyncWaitOperator
Since LagAwareRetryPredicate caps the number of retries only for late events, requests for all other events are retried indefinitely until RetryingAsyncFunction times out. The timeout raises a TimeoutException that is not handled anywhere and therefore crashes the application.
AC
- fetch_error schema: remove restriction on error_type (it only destroys information)
- RetryingAsyncFunction class: override/implement org.wikimedia.discovery.cirrus.updater.common.graph.RetryingAsyncFunction#timeout so it completes with a FetchResult.fromError
- ConsumerApplicationIT verifies that the output of a timed-out fetch is routed to the fetch_error stream/topic
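
The timeout override asked for above could look roughly like the following sketch. It is self-contained and does not use the real project classes. `ResultFuture` and `AsyncFunction` are minimal stand-ins that mirror the method names of Flink's interfaces, including the default `timeout` that completes exceptionally with a TimeoutException. `FetchResult`, its fields, the `fromError(String, Throwable)` signature, `CapturingFuture`, `DefaultTimeoutFunction`, `ErrorRoutingFunction` and `runTimeout` are all hypothetical names introduced for illustration only. The real `FetchResult.fromError` may take different arguments.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeoutException;

class TimeoutToFetchError {

    /** Stand-in for Flink's ResultFuture (assumption: only these two methods matter here). */
    interface ResultFuture<OUT> {
        void complete(Collection<OUT> results);
        void completeExceptionally(Throwable error);
    }

    /** Stand-in for Flink's AsyncFunction, including its default timeout behaviour. */
    interface AsyncFunction<IN, OUT> {
        void asyncInvoke(IN input, ResultFuture<OUT> resultFuture) throws Exception;

        // Default: fail the future, which is what ends up crashing the job if unhandled.
        default void timeout(IN input, ResultFuture<OUT> resultFuture) throws Exception {
            resultFuture.completeExceptionally(
                    new TimeoutException("Async function call has timed out."));
        }
    }

    /** Hypothetical, simplified FetchResult; the real class is not shown in this ticket. */
    static final class FetchResult {
        final String input;
        final String errorType; // free-form, in line with dropping the schema restriction
        final String message;

        private FetchResult(String input, String errorType, String message) {
            this.input = input;
            this.errorType = errorType;
            this.message = message;
        }

        static FetchResult fromError(String input, Throwable error) {
            return new FetchResult(input, error.getClass().getSimpleName(), error.getMessage());
        }
    }

    /** Test helper that records how the future was completed. */
    static final class CapturingFuture<OUT> implements ResultFuture<OUT> {
        final List<OUT> results = new ArrayList<>();
        Throwable error;

        @Override public void complete(Collection<OUT> r) { results.addAll(r); }
        @Override public void completeExceptionally(Throwable e) { error = e; }
    }

    /** Current behaviour: inherits the default timeout and fails the future. */
    static class DefaultTimeoutFunction implements AsyncFunction<String, FetchResult> {
        @Override public void asyncInvoke(String input, ResultFuture<FetchResult> rf) {
            // Intentionally never completes, to simulate a fetch that runs into the timeout.
        }
    }

    /** Proposed behaviour: complete normally with an error result instead of failing. */
    static class ErrorRoutingFunction extends DefaultTimeoutFunction {
        @Override public void timeout(String input, ResultFuture<FetchResult> rf) {
            TimeoutException cause = new TimeoutException("Fetch timed out for " + input);
            rf.complete(Collections.singletonList(FetchResult.fromError(input, cause)));
        }
    }

    /** Invokes timeout() the way the operator would and returns what was captured. */
    static CapturingFuture<FetchResult> runTimeout(
            AsyncFunction<String, FetchResult> fn, String input) {
        CapturingFuture<FetchResult> future = new CapturingFuture<>();
        try {
            fn.timeout(input, future);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return future;
    }

    public static void main(String[] args) {
        CapturingFuture<FetchResult> before = runTimeout(new DefaultTimeoutFunction(), "page-1");
        System.out.println("default failed exceptionally: " + (before.error != null));

        CapturingFuture<FetchResult> after = runTimeout(new ErrorRoutingFunction(), "page-1");
        System.out.println("override emitted error records: " + after.results.size());
    }
}
```

With the override in place, a timeout produces a regular output element, so downstream code can route it to the fetch_error stream like any other failed fetch. How that routing is wired up is outside this sketch.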