
Error handling in Batch Predictions for RevertRisk Models
Closed, Resolved · Public · 2 Estimated Story Points

Description

Batch request where all requests fail: return a 422 (Unprocessable Entity)
Batch request where some requests succeed and others fail: return a 207 (Multi-Status)
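
A minimal sketch of this status-code logic, assuming a FastAPI-style handler (the function and the structure of results are hypothetical; the actual implementation is in the Gerrit change referenced below):

from fastapi import HTTPException
from fastapi.responses import JSONResponse


def build_response(results: list) -> JSONResponse:
    """Map per-instance outcomes to an HTTP response.

    `results` is a list where each element is either a prediction dict
    or an error string for a failed instance (hypothetical structure).
    """
    failures = [r for r in results if isinstance(r, str)]
    if len(failures) == len(results):
        # Every instance failed: reject the whole batch with 422.
        raise HTTPException(status_code=422, detail="; ".join(failures))
    if failures:
        # Mixed outcome: report per-instance results with 207 Multi-Status.
        return JSONResponse(status_code=207, content={"predictions": results})
    # All instances succeeded.
    return JSONResponse(status_code=200, content={"predictions": results})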

Event Timeline

calbon set the point value for this task to 2.
calbon moved this task from Unsorted to In Progress on the Machine-Learning-Team board.

Change #1016341 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] revertrisk: error handling for batch requests

https://gerrit.wikimedia.org/r/1016341

@kevinbazira posed a question: how can end users switch between batch and non-batch requests?

First, to clarify: the batch model can also handle single requests. For example, given this input:

{
    "instances": [
      {
        "lang": "en",
        "rev_id": 123456
      }
    ]
}

The main differences between the base model (currently in production) and the batch model (the new one) are:

  • The batch model supports multiple predictions in a single request (see the example after this list).
  • The batch model uses a different input/output schema, required by the Kserve batcher.
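
For example, a batch request with multiple predictions uses the same schema as the single request above, just with more entries in instances (the rev_id values here are illustrative):

{
    "instances": [
      {
        "lang": "en",
        "rev_id": 123456
      },
      {
        "lang": "en",
        "rev_id": 789012
      }
    ]
}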

Regarding how end users access the batch model, there are three options:

  1. Replace the current model with the batch model

I think this was the plan when we set up goal T348153. The concern here is that the new input/output schema is a breaking change that could impact downstream applications. Given that the RevertRisk language-agnostic model currently handles production traffic, we would need to notify downstream product owners and provide support as needed. This switch would also introduce some inconsistency among our Lift Wing models, as this model server would be the first one to use a different input/output schema.

  2. Create a new endpoint for the batch model

We could add a new endpoint, such as /v1/models/revertrisk-language-agnostic-batch, and document the changed schema and usage examples on the model card, API Gateway doc, and Lift Wing doc. We would then inform end users that they can use this new endpoint to request multiple predictions. However, this would add maintenance work, as we would essentially be providing two different services for the same model.

  3. Find a way to support both schemas in one endpoint

We could make the batch model backwards compatible with the current schema for single requests, but this would complicate our code, and the distinction between the base model and the batch model would become blurred, which is not desirable. Alternatively, there may be a way to redirect batch requests to the batch isvc; I'm not sure of its feasibility, but that would be ideal.
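
For illustration only, the kind of shim the backwards-compatibility idea implies might look like the following, assuming the current single-request schema is a top-level object with lang and rev_id (a sketch, not the proposed implementation):

def normalize_payload(payload: dict) -> dict:
    """Accept both the batch schema and the current single-request schema.

    Batch schema:   {"instances": [{"lang": ..., "rev_id": ...}, ...]}
    Current schema: {"lang": ..., "rev_id": ...}
    """
    if "instances" in payload:
        return payload
    # Wrap a single-request payload into the batch schema.
    return {"instances": [{"lang": payload["lang"], "rev_id": payload["rev_id"]}]}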

At first I leaned towards the second option to avoid introducing a breaking change to our production service. However, upon further consideration, it seems excessive to create a new endpoint for the batch model.

What do people think about this?

Change #1016341 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk: error handling for batch requests

https://gerrit.wikimedia.org/r/1016341

Change #1014545 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update revertrisk-language-agnostic image

https://gerrit.wikimedia.org/r/1014545

Change #1014545 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update revertrisk-language-agnostic image

https://gerrit.wikimedia.org/r/1014545

This task is complete. Check out these examples:

  • Batch request where all requests fail: return a 422 (Unprocessable Entity)
$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-language-agnostic:predict" -d@./input_all_fail.json -H "Host: revertrisk-language-agnostic-batcher.revertrisk.wikimedia.org" --http1.1 -k |  jq '.'
{
  "detail": "Could not make prediction for revisions dict_keys([(1, 'ro'), (2, 'ro'), (15925124, 'ro')]). Reason: ['parent_revision_missing', 'revision_missing', 'revision_missing']"
}

Kserve's log:

2024-04-04 19:05:25.508 uvicorn.access INFO:     127.0.0.6:0 1 - "POST /v1/models/revertrisk-language-agnostic%3Apredict HTTP/1.1" 422 Unprocessable Entity
2024-04-04 19:05:25.508 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.10296297073364258
2024-04-04 19:05:25.508 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.011949999999998795
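
The contents of input_all_fail.json are not shown above; an input consistent with the error message (rev_ids 1, 2 and 15925124 on rowiki), reconstructed here for illustration, would be:

{
    "instances": [
      {"lang": "ro", "rev_id": 1},
      {"lang": "ro", "rev_id": 2},
      {"lang": "ro", "rev_id": 15925124}
    ]
}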
  • Batch request where some requests succeed and others fail: return a 207 (Multi-Status)
$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-language-agnostic:predict" -d@./input_some_succeed.json -H "Host: revertrisk-language-agnostic-batcher.revertrisk.wikimedia.org" --http1.1 -k |  jq '.'
{
  "predictions": [
    {
      "model_name": "revertrisk-language-agnostic",
      "model_version": "3",
      "wiki_db": "rowiki",
      "revision_id": 15925122,
      "output": {
        "prediction": true,
        "probabilities": {
          "true": 0.8845043182373047,
          "false": 0.11549568176269531
        }
      }
    },
    {
      "model_name": "revertrisk-language-agnostic",
      "model_version": "3",
      "wiki_db": "rowiki",
      "revision_id": 15925123,
      "output": {
        "prediction": false,
        "probabilities": {
          "true": 0.42537492513656616,
          "false": 0.5746250748634338
        }
      }
    },
    "Could not make prediction for revision 15925124 (ro). Reason: revision_missing"
  ]
}

Kserve's log:

INFO:root:Getting 3 rev_ids in the request
2024-04-04 19:05:59.205 kserve.trace requestId: 73d3e6da-5a39-49b7-9e78-9cb4c0b7fc05, preprocess_ms: 124.163866043, explain_ms: 0, predict_ms: 14.38331604, postprocess_ms: 0.022649765
2024-04-04 19:05:59.205 uvicorn.access INFO:     127.0.0.6:0 1 - "POST /v1/models/revertrisk-language-agnostic%3Apredict HTTP/1.1" 207 Multi-Status
2024-04-04 19:05:59.205 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.13943934440612793
2024-04-04 19:05:59.205 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.027026999999996804
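
Similarly, an input_some_succeed.json consistent with the output above (two existing revisions plus the missing rev_id 15925124), reconstructed for illustration, would be:

{
    "instances": [
      {"lang": "ro", "rev_id": 15925122},
      {"lang": "ro", "rev_id": 15925123},
      {"lang": "ro", "rev_id": 15925124}
    ]
}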