
Configure SSE Client retries and timeouts to improve reliability
Closed, Resolved (Public)

Description

As discussed in T250084, this is intended as a quick-fix alternative to T261807.

The SSE Client can pass arguments through to the underlying requests library, which supports connect and read timeouts. The default behavior is *no timeout*.
https://requests.readthedocs.io/en/latest/user/advanced/#timeouts

So we should be able to configure connect and read timeouts.

We'll configure retries too.
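A minimal sketch of the idea, not the deployed change: it calls requests directly instead of going through the SSE Client, and does no SSE parsing. The function name `stream_lines`, the timeout values, and the backoff policy are all assumptions for illustration. The `session` argument is expected to behave like a `requests.Session`; requests accepts `timeout=(connect, read)`, and its exceptions derive from `OSError`, which is what the loop catches.

```python
import time

# Assumed values, chosen only for illustration.
CONNECT_TIMEOUT_S = 5.0
READ_TIMEOUT_S = 60.0


def stream_lines(session, url, handle_line, max_attempts=5, backoff_s=1.0,
                 sleep=time.sleep):
    """Read lines from `url`, reconnecting after errors or timeouts.

    `session` is anything with a requests.Session-like `get` method.
    Raises RuntimeError after `max_attempts` consecutive attempts that
    delivered no data.
    """
    failures = 0
    while True:
        try:
            response = session.get(
                url,
                stream=True,
                # (connect timeout, read timeout); without this, requests
                # waits indefinitely on a stalled connection.
                timeout=(CONNECT_TIMEOUT_S, READ_TIMEOUT_S),
            )
            response.raise_for_status()
            for line in response.iter_lines():
                if line:
                    handle_line(line)
                    failures = 0  # receiving data resets the failure count
        except OSError:
            # requests' exceptions (timeouts, connection errors, HTTP errors)
            # all derive from OSError.
            pass
        # Reaching here means the stream ended or an error occurred.
        failures += 1
        if failures >= max_attempts:
            raise RuntimeError(
                f"giving up after {failures} consecutive failed attempts"
            )
        sleep(backoff_s * 2 ** (failures - 1))  # exponential backoff
```

With requests installed, this could be called as `stream_lines(requests.Session(), url, print)`. The read timeout matters most for a long-lived stream: a connection that silently stops delivering data raises an error after `READ_TIMEOUT_S` seconds, so the loop reconnects instead of hanging.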

Acceptance criteria

Events keep being added to our database without anyone having to restart the eventstream container.

Event Timeline

This is as ready and tested as it can be without resolving a non-production environment issue first.

Deployed and running strong in production for a few days now.

Could we investigate "running strong" a little? It would be great to check, for example, that we don't have any suspicious data gaps from downtime. Perhaps we do that nearer the end of the week so we have more data.
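One way to look for such gaps, sketched under the assumption that event timestamps can be pulled from the database as `datetime` values. The function name `find_gaps` and the threshold are hypothetical; a sensible threshold depends on how often events normally arrive.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap):
    """Return (previous, next) pairs of timestamps separated by more than max_gap."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]


# Example with made-up timestamps: one gap of over an hour.
example = [
    datetime(2020, 1, 1, 12, 0),
    datetime(2020, 1, 1, 12, 5),
    datetime(2020, 1, 1, 13, 30),
    datetime(2020, 1, 1, 13, 31),
]
gaps = find_gaps(example, max_gap=timedelta(minutes=30))
```

Each returned pair marks the last event before a quiet period and the first event after it, which can then be compared against known downtime.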