Description
Batch request where all requests fail: return a 422 (Unprocessable Entity)
Batch request where some requests succeed and others fail: return a 207 (Multi-Status)
Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | | achou | T348153 Q3 2024 Goal: Lift Wing users can request multiple predictions using a single request.
Open | | achou | T358744 Deploy RR-language-agnostic batch version to prod
Resolved | | achou | T360406 Error handling in Batch Predictions for RevertRisk Models
Event Timeline
Change #1016341 had a related patch set uploaded (by AikoChou; author: AikoChou):
[machinelearning/liftwing/inference-services@main] revertrisk: error handling for batch requests
@kevinbazira posed a question - how can end users switch between batch and non-batch requests?
First, to clarify: the batch model can also handle single requests. For example, given this input:
{ "instances": [ { "lang": "en", "rev_id": 123456 } ] }
The main differences between the base model (currently in production) and the batch model (the new one) are:
- The batch model supports multiple predictions in a single request.
- The batch model uses a different input/output schema, required by the Kserve batcher.
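To make the schema difference concrete, here is a small sketch. The field names are taken from the example input above; the base-model payload shape is an assumption on my part, so check the model card for the authoritative schema:

```python
# Base model (current production): one revision per request.
# (Assumed shape -- see the model card for the authoritative schema.)
base_payload = {"rev_id": 123456, "lang": "en"}

# Batch model: KServe-batcher style "instances" list. A single
# prediction is simply a one-element batch.
batch_payload = {"instances": [{"lang": "en", "rev_id": 123456}]}


def to_batch(payload: dict) -> dict:
    """Wrap a base-schema request in the batch schema."""
    return {"instances": [{"lang": payload["lang"], "rev_id": payload["rev_id"]}]}
```

So a client that today sends `base_payload` would send `to_batch(base_payload)` to the batch model instead.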
Regarding how end users access the batch model, there are three options:
- Replace the current model with the batch model
I think this was the plan when we set up the goal T348153. The concern here is that the input/output schema is a breaking change that could impact downstream applications. Given that the Revert Risk language-agnostic model currently handles production traffic, we would need to notify downstream product owners and provide support as needed. This switch would also introduce some inconsistency among our Lift Wing models, as this model server would be the first one using a different input/output schema.
- Create a new endpoint for the batch model
We could add a new endpoint, such as /v1/models/revertrisk-language-agnostic-batch, and document the changed schema and usage examples on the model card, the API Gateway doc, and the Lift Wing doc. We would then inform end users about this new endpoint that they can use for requesting multiple predictions. However, this would bring us more maintenance work, as we would basically be providing two different services for the same model.
- Find a way to support both schemas in one endpoint
We could make the batch model backwards compatible with the current schema for single requests, but this would complicate our code, and the distinction between the base model and the batch model would become blurred, which is not desirable. Alternatively, there may be a way to redirect batch requests to the batch isvc; I'm not sure of its feasibility, but that would be ideal.
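For what it's worth, the backwards-compatible variant of the third option could look roughly like this in the model server's preprocess step. This is only a sketch with a hypothetical helper name, illustrating the extra branching it would add to the code:

```python
def normalize_request(payload: dict) -> dict:
    """Hypothetical preprocess helper: accept either the base schema
    ({"rev_id": ..., "lang": ...}) or the batch schema
    ({"instances": [...]}) and always return the batch form."""
    if "instances" in payload:
        return payload  # already batch-shaped
    if {"rev_id", "lang"} <= payload.keys():
        # Wrap a single base-schema request as a one-element batch.
        return {"instances": [{"lang": payload["lang"], "rev_id": payload["rev_id"]}]}
    raise ValueError("unrecognized request schema")
```

Even this small shim shows the cost: every entry point has to know about both schemas, which is the blurring of the base/batch distinction mentioned above.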
At first I leaned towards the second option to avoid introducing a breaking change to our production service. However, upon further consideration, it seems excessive to create a new endpoint for the batch model.
What do people think about this?
Change #1016341 merged by jenkins-bot:
[machinelearning/liftwing/inference-services@main] revertrisk: error handling for batch requests
Change #1014545 had a related patch set uploaded (by AikoChou; author: AikoChou):
[operations/deployment-charts@master] ml-services: update revertrisk-language-agnostic image
Change #1014545 merged by jenkins-bot:
[operations/deployment-charts@master] ml-services: update revertrisk-language-agnostic image
This task is complete. Check out these examples:
- Batch request where all requests fail: return a 422 (Unprocessable Entity)
$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-language-agnostic:predict" -d@./input_all_fail.json -H "Host: revertrisk-language-agnostic-batcher.revertrisk.wikimedia.org" --http1.1 -k | jq '.'
{
  "detail": "Could not make prediction for revisions dict_keys([(1, 'ro'), (2, 'ro'), (15925124, 'ro')]). Reason: ['parent_revision_missing', 'revision_missing', 'revision_missing']"
}
Kserve's log:
2024-04-04 19:05:25.508 uvicorn.access INFO: 127.0.0.6:0 1 - "POST /v1/models/revertrisk-language-agnostic%3Apredict HTTP/1.1" 422 Unprocessable Entity
2024-04-04 19:05:25.508 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.10296297073364258
2024-04-04 19:05:25.508 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.011949999999998795
- Batch request where some requests succeed and others fail: return a 207 (Multi-Status)
$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-language-agnostic:predict" -d@./input_some_succeed.json -H "Host: revertrisk-language-agnostic-batcher.revertrisk.wikimedia.org" --http1.1 -k | jq '.'
{
  "predictions": [
    {
      "model_name": "revertrisk-language-agnostic",
      "model_version": "3",
      "wiki_db": "rowiki",
      "revision_id": 15925122,
      "output": {
        "prediction": true,
        "probabilities": {
          "true": 0.8845043182373047,
          "false": 0.11549568176269531
        }
      }
    },
    {
      "model_name": "revertrisk-language-agnostic",
      "model_version": "3",
      "wiki_db": "rowiki",
      "revision_id": 15925123,
      "output": {
        "prediction": false,
        "probabilities": {
          "true": 0.42537492513656616,
          "false": 0.5746250748634338
        }
      }
    },
    "Could not make prediction for revision 15925124 (ro). Reason: revision_missing"
  ]
}
Kserve's log:
INFO:root:Getting 3 rev_ids in the request
2024-04-04 19:05:59.205 kserve.trace requestId: 73d3e6da-5a39-49b7-9e78-9cb4c0b7fc05, preprocess_ms: 124.163866043, explain_ms: 0, predict_ms: 14.38331604, postprocess_ms: 0.022649765
2024-04-04 19:05:59.205 uvicorn.access INFO: 127.0.0.6:0 1 - "POST /v1/models/revertrisk-language-agnostic%3Apredict HTTP/1.1" 207 Multi-Status
2024-04-04 19:05:59.205 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.13943934440612793
2024-04-04 19:05:59.205 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.027026999999996804
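The status-code behaviour shown in the two examples can be summarized in a small sketch. `batch_status` is a hypothetical helper, not the actual server code; it assumes (as in the 207 response above) that a failed item appears as an error string and a successful one as a prediction dict:

```python
def batch_status(predictions: list) -> int:
    """Map per-revision outcomes to an overall HTTP status, mirroring
    the examples: a failed item is an error string, a successful one
    is a prediction dict."""
    failures = sum(isinstance(item, str) for item in predictions)
    if failures == len(predictions):
        return 422  # Unprocessable Entity: no revision could be predicted
    if failures > 0:
        return 207  # Multi-Status: mixed successes and failures
    return 200  # OK: every prediction succeeded
```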