
Outlink returns 500 when EventGate returns 503 Service Unavailable
Open, Medium, Public, 2 Estimated Story Points

Description

We found that when EventGate returns a 503 Service Unavailable error, Outlink returns a 500 Internal Server Error. Because ChangeProp only retries on 502 and 503 errors, it did not retry these requests. To avoid losing scores, Outlink should return a 503 in this scenario, which will trigger a retry by ChangeProp.
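Below is a minimal sketch of the retry decision described above. It assumes ChangeProp's rule is exactly "retry on 502 or 503", as stated in this task. The real ChangeProp retry configuration is not shown here, and `RETRYABLE_STATUSES` and `should_retry` are hypothetical names.

```python
from http import HTTPStatus

# Assumption: per this task, ChangeProp retries only on these two statuses.
RETRYABLE_STATUSES = frozenset({HTTPStatus.BAD_GATEWAY, HTTPStatus.SERVICE_UNAVAILABLE})


def should_retry(status_code: int) -> bool:
    """Hypothetical model of ChangeProp's retry decision for an Outlink response."""
    return status_code in RETRYABLE_STATUSES


# Current behaviour: EventGate's 503 surfaces from Outlink as a 500, so no retry.
print(should_retry(500))  # False
# Proposed behaviour: Outlink passes the 503 through, so ChangeProp retries.
print(should_retry(503))  # True
```

Because `HTTPStatus` members are integers, plain status codes such as `503` compare equal to them and can be looked up in the set directly.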

HTTP 500 in the outlink transformer:

  File "/opt/lib/python/site-packages/kserve/model.py", line 286, in _http_predict
    raise HTTPStatusError(message, request=response.request, response=response)
httpx.HTTPStatusError: RuntimeError : The event posted to EventGate has been rejected, please contact the ML team if the issue persists., '500 Internal Server Error' for url 'http://outlink-topic-model-predictor-default.articletopic-outlink/v1/models/outlink-topic-model:predict'

In the predictor:

aiohttp.client_exceptions.ClientResponseError: 503, message='Service Unavailable', url=URL('http://eventgate-main.discovery.wmnet:4480/v1/events')
RuntimeError: The event posted to EventGate has been rejected, please contact the ML team if the issue persists.
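The predictor currently turns every failed EventGate POST into a generic `RuntimeError`, which surfaces as a 500. The sketch below preserves the upstream status instead, in the spirit of the linked patch ("set the same error code"). It is stdlib-only and hypothetical: `UpstreamError` stands in for `aiohttp.ClientResponseError`, which carries the code in `.status`. `EventGateError`, `post_event`, and the `{"page_id": 1}` payload are illustrative names, not the actual `events.py` code.

```python
from http import HTTPStatus


class UpstreamError(Exception):
    """Hypothetical stand-in for aiohttp.ClientResponseError; exposes .status."""

    def __init__(self, status: int, message: str):
        super().__init__(message)
        self.status = int(status)


class EventGateError(Exception):
    """Hypothetical error carrying the HTTP status Outlink should return."""

    def __init__(self, status: int, message: str):
        super().__init__(message)
        self.status = int(status)


def post_event(send, event: dict) -> None:
    """Send an event via `send`, re-raising failures with EventGate's own status."""
    try:
        send(event)
    except UpstreamError as e:
        # Keep EventGate's status (e.g. 503) rather than collapsing it into a 500,
        # so a retrying caller such as ChangeProp sees a retryable code.
        raise EventGateError(
            e.status,
            "The event posted to EventGate has been rejected, "
            "please contact the ML team if the issue persists.",
        ) from e


def unavailable_eventgate(event: dict) -> None:
    # Simulates the failure seen in the predictor log above.
    raise UpstreamError(HTTPStatus.SERVICE_UNAVAILABLE, "Service Unavailable")


try:
    post_event(unavailable_eventgate, {"page_id": 1})
except EventGateError as err:
    print(err.status)  # 503
```

How `EventGateError.status` becomes the HTTP status of the Outlink response depends on the serving framework (KServe here). That wiring is not shown in this task.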

Event Timeline

Change 961347 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] events.py: set the same error code when sending events to eventgate

https://gerrit.wikimedia.org/r/961347

calbon set the point value for this task to 2 (Nov 2 2023, 7:00 PM).
calbon moved this task from In Progress to Ready To Go on the Machine-Learning-Team board.
achou triaged this task as Low priority (Nov 2 2023, 7:23 PM).
achou raised the priority of this task from Low to Medium (Dec 5 2023, 3:52 PM).