
Error handling in Batch Predictions for RevertRisk Models
Closed, Resolved · Public · 2 Estimated Story Points

Description

Batch request where all requests fail: return a 422 (Unprocessable Entity)
Batch request where some requests succeed and others fail: return a 207 (Multi-Status)
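
A minimal sketch of this status-code logic, assuming a FastAPI-style handler (the function and the structure of results are hypothetical; the actual implementation is in the Gerrit change referenced below):

from fastapi import HTTPException
from fastapi.responses import JSONResponse


def build_response(results: list) -> JSONResponse:
    """Map per-instance outcomes to an HTTP response.

    `results` is a list where each element is either a prediction dict
    or an error string for a failed instance (hypothetical structure).
    """
    failures = [r for r in results if isinstance(r, str)]
    if len(failures) == len(results):
        # Every instance failed: reject the whole batch with 422.
        raise HTTPException(status_code=422, detail="; ".join(failures))
    if failures:
        # Mixed outcome: report per-instance results with 207 Multi-Status.
        return JSONResponse(status_code=207, content={"predictions": results})
    # All instances succeeded.
    return JSONResponse(status_code=200, content={"predictions": results})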

Event Timeline

calbon set the point value for this task to 2.
calbon moved this task from Unsorted to In Progress on the Machine-Learning-Team board.

Change #1016341 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] revertrisk: error handling for batch requests

https://gerrit.wikimedia.org/r/1016341

@kevinbazira posed a question: how can end users switch between batch and non-batch requests?

First, to clarify: the batch model can also handle single requests. For example, given this input:

{
    "instances": [
      {
        "lang": "en",
        "rev_id": 123456
      }
    ]
}

The main differences between the base model (currently in production) and the batch model (the new one) are:

  • The batch model supports multiple predictions in a single request (see the example after this list).
  • The batch model uses a different input/output schema, required by the Kserve batcher.
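
For example, a batch request with multiple predictions uses the same schema as the single request above, just with more entries in instances (the rev_id values here are illustrative):

{
    "instances": [
      {
        "lang": "en",
        "rev_id": 123456
      },
      {
        "lang": "en",
        "rev_id": 789012
      }
    ]
}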

Regarding how end users access the batch model, there are three options:

  1. Replace the current model with the batch model

I think this was the plan when we set up goal T348153. The concern here is that the new input/output schema is a breaking change that could impact downstream applications. Given that the RevertRisk language-agnostic model currently handles production traffic, we would need to notify downstream product owners and provide support as needed. This switch would also introduce some inconsistency among our Lift Wing models, as this model server would be the first one to use a different input/output schema.

  2. Create a new endpoint for the batch model

We could add a new endpoint, such as /v1/models/revertrisk-language-agnostic-batch, and document the changed schema and usage examples on the model card, API Gateway doc, and Lift Wing doc. We would then inform end users that they can use this new endpoint to request multiple predictions. However, this would add maintenance work, as we would essentially be providing two different services for the same model.

  3. Find a way to support both schemas in one endpoint

We could make the batch model backwards compatible with the current schema for single requests, but this would complicate our code, and the distinction between the base model and the batch model would become blurred, which is not desirable. Alternatively, there may be a way to redirect batch requests to the batch isvc; I'm not sure of its feasibility, but that would be ideal.
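
For illustration only, the kind of shim the backwards-compatibility idea implies might look like the following, assuming the current single-request schema is a top-level object with lang and rev_id (a sketch, not the proposed implementation):

def normalize_payload(payload: dict) -> dict:
    """Accept both the batch schema and the current single-request schema.

    Batch schema:   {"instances": [{"lang": ..., "rev_id": ...}, ...]}
    Current schema: {"lang": ..., "rev_id": ...}
    """
    if "instances" in payload:
        return payload
    # Wrap a single-request payload into the batch schema.
    return {"instances": [{"lang": payload["lang"], "rev_id": payload["rev_id"]}]}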

At first I leaned towards the second option to avoid introducing a breaking change to our production service. However, upon further consideration, it seems excessive to create a new endpoint for the batch model.

What do people think about this?

Change #1016341 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk: error handling for batch requests

https://gerrit.wikimedia.org/r/1016341

Change #1014545 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update revertrisk-language-agnostic image

https://gerrit.wikimedia.org/r/1014545

Change #1014545 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update revertrisk-language-agnostic image

https://gerrit.wikimedia.org/r/1014545

This task is complete. Check out these examples:

  • Batch request where all requests fail: return a 422 (Unprocessable Entity)
$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-language-agnostic:predict" -d@./input_all_fail.json -H "Host: revertrisk-language-agnostic-batcher.revertrisk.wikimedia.org" --http1.1 -k |  jq '.'
{
  "detail": "Could not make prediction for revisions dict_keys([(1, 'ro'), (2, 'ro'), (15925124, 'ro')]). Reason: ['parent_revision_missing', 'revision_missing', 'revision_missing']"
}

Kserve's log:

2024-04-04 19:05:25.508 uvicorn.access INFO:     127.0.0.6:0 1 - "POST /v1/models/revertrisk-language-agnostic%3Apredict HTTP/1.1" 422 Unprocessable Entity
2024-04-04 19:05:25.508 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.10296297073364258
2024-04-04 19:05:25.508 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.011949999999998795
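
The contents of input_all_fail.json are not shown above; an input consistent with the error message (rev_ids 1, 2 and 15925124 on rowiki), reconstructed here for illustration, would be:

{
    "instances": [
      {"lang": "ro", "rev_id": 1},
      {"lang": "ro", "rev_id": 2},
      {"lang": "ro", "rev_id": 15925124}
    ]
}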
  • Batch request where some requests succeed and others fail: return a 207 (Multi-Status)
$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-language-agnostic:predict" -d@./input_some_succeed.json -H "Host: revertrisk-language-agnostic-batcher.revertrisk.wikimedia.org" --http1.1 -k |  jq '.'
{
  "predictions": [
    {
      "model_name": "revertrisk-language-agnostic",
      "model_version": "3",
      "wiki_db": "rowiki",
      "revision_id": 15925122,
      "output": {
        "prediction": true,
        "probabilities": {
          "true": 0.8845043182373047,
          "false": 0.11549568176269531
        }
      }
    },
    {
      "model_name": "revertrisk-language-agnostic",
      "model_version": "3",
      "wiki_db": "rowiki",
      "revision_id": 15925123,
      "output": {
        "prediction": false,
        "probabilities": {
          "true": 0.42537492513656616,
          "false": 0.5746250748634338
        }
      }
    },
    "Could not make prediction for revision 15925124 (ro). Reason: revision_missing"
  ]
}

Kserve's log:

INFO:root:Getting 3 rev_ids in the request
2024-04-04 19:05:59.205 kserve.trace requestId: 73d3e6da-5a39-49b7-9e78-9cb4c0b7fc05, preprocess_ms: 124.163866043, explain_ms: 0, predict_ms: 14.38331604, postprocess_ms: 0.022649765
2024-04-04 19:05:59.205 uvicorn.access INFO:     127.0.0.6:0 1 - "POST /v1/models/revertrisk-language-agnostic%3Apredict HTTP/1.1" 207 Multi-Status
2024-04-04 19:05:59.205 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.13943934440612793
2024-04-04 19:05:59.205 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.027026999999996804
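
Similarly, an input_some_succeed.json consistent with the output above (two existing revisions plus the missing rev_id 15925124), reconstructed for illustration, would be:

{
    "instances": [
      {"lang": "ro", "rev_id": 15925122},
      {"lang": "ro", "rev_id": 15925123},
      {"lang": "ro", "rev_id": 15925124}
    ]
}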