We noticed that the request to EventGate happens in the main process, where a failure can crash the job.
- Write tests for possible EventGate errors (5xx, timeout).
- Make the pipeline robust against these errors, and make that robustness testable.
- Retry with backoff.
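
The list above could be sketched roughly as follows. This is only an illustration, not the pipeline's actual code: `send_with_retry`, its parameters, and the idea of passing the request in as a `send` callable that returns an HTTP status code are all assumptions. It also assumes timeouts surface as the built-in `TimeoutError`; a real HTTP client would likely raise its own exception type, which would need to be caught instead.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() with exponential backoff on 5xx responses and timeouts.

    Hypothetical helper: returns True on a 2xx status and False otherwise.
    It never raises for these errors, so the calling job keeps running.
    """
    for attempt in range(max_attempts):
        try:
            status = send()
            if 200 <= status < 300:
                return True
            if status < 500:
                # Client error: retrying the same request will not help.
                return False
        except TimeoutError:
            # Treated like a 5xx: fall through to the backoff below.
            pass
        if attempt < max_attempts - 1:
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)
    return False
```

Because both `send` and `sleep` are injected, a test can feed in a fake sender that returns 503 or raises a timeout, and record the backoff delays without actually waiting.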
Use the same techniques (without retry) to test Prometheus Pushgateway, while we're doing this.
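
For the Pushgateway case, a single guarded attempt might look like the sketch below. The name `push_metrics_best_effort` and the injected `push` callable are hypothetical; the real push call is whatever the job already uses, wrapped in a zero-argument function.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def push_metrics_best_effort(push: Callable[[], None]) -> bool:
    """Run push() once; log and swallow any failure instead of crashing.

    Hypothetical helper: returns True if the push succeeded, False if it
    raised. There is deliberately no retry here.
    """
    try:
        push()
    except Exception:
        logger.warning("Pushgateway push failed; continuing", exc_info=True)
        return False
    return True
```

A test can pass in a callable that raises (for example `ConnectionError`) and assert that the function returns `False` rather than propagating the error.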