We found that when EventGate returns a 503 Service Unavailable error, Outlink returns a 500 Internal Server Error. Because ChangeProp retries only on 502 and 503 errors, it did not retry in this case. To avoid losing scores, we should change Outlink to return a 503 error in this scenario, which will trigger a retry by ChangeProp.
HTTP 500 in outlink transformer:
  File "/opt/lib/python/site-packages/kserve/model.py", line 286, in _http_predict
    raise HTTPStatusError(message, request=response.request, response=response)
httpx.HTTPStatusError: RuntimeError : The event posted to EventGate has been rejected, please contact the ML team if the issue persists., '500 Internal Server Error' for url 'http://outlink-topic-model-predictor-default.articletopic-outlink/v1/models/outlink-topic-model:predict'
In the predictor:
aiohttp.client_exceptions.ClientResponseError: 503, message='Service Unavailable', url=URL('http://eventgate-main.discovery.wmnet:4480/v1/events')
RuntimeError: The event posted to EventGate has been rejected, please contact the ML team if the issue persists.
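
A minimal sketch of the proposed status mapping, in plain Python so that it runs on its own. It is not the actual Outlink code: `EventGateError`, `response_status_for`, `raise_for_eventgate` and `CHANGEPROP_RETRIED` are hypothetical names used only for illustration, and the assumption is that the predictor can see the status code EventGate returned and choose the status it sends back.

```python
# Statuses that ChangeProp retries, per the description above.
CHANGEPROP_RETRIED = frozenset({502, 503})


class EventGateError(Exception):
    """Hypothetical exception carrying the HTTP status Outlink should return."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


def response_status_for(upstream_status: int) -> int:
    """Choose the status Outlink returns when EventGate fails with upstream_status."""
    if upstream_status == 503:
        # EventGate is temporarily unavailable: pass the 503 through
        # so that ChangeProp retries the request.
        return 503
    # Any other failure keeps the current behaviour (500, not retried).
    return 500


def raise_for_eventgate(upstream_status: int) -> None:
    """Raise EventGateError for any non-2xx EventGate response; otherwise do nothing."""
    if 200 <= upstream_status < 300:
        return
    raise EventGateError(
        "The event posted to EventGate has been rejected, "
        "please contact the ML team if the issue persists.",
        status_code=response_status_for(upstream_status),
    )
```

With this mapping, a 503 from EventGate surfaces as a 503 from Outlink, which falls in the set ChangeProp retries, while other rejections still surface as a 500. How the status is actually attached to the response depends on the serving framework and is left out here.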