
Load a revscoring model into KFServing
Closed, Resolved · Public

Description

Let's investigate loading a revscoring model into KFServing.

Open Questions:

  • What would a base image look like?
  • Will we need to load all language assets?
  • Do we even need revscoring?

Event Timeline

Restricted Application added a subscriber: Aklapper.

As mentioned in T279004, we have successfully deployed the enwiki-goodfaith model as a custom KFServing inference service based on @kevinbazira's revscoring container image.
From initial testing it seems that we can use the revscoring image as a 'base' image and then inject the model binaries into individual inference services.

Dockerfile: P15862

Here are the results of a simple load test done on our sandbox cluster:

Requests      [total, rate, throughput]         3000, 5.00, 5.00
Duration      [total, attack, wait]             10m0s, 10m0s, 204.056ms
Latencies     [min, mean, 50, 90, 95, 99, max]  151.102ms, 230.826ms, 198.789ms, 253.654ms, 349.77ms, 969.064ms, 2.706s
Bytes In      [total, mean]                     336000, 112.00
Bytes Out     [total, mean]                     69000, 23.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:3000
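
The report above is in the format produced by a constant-rate load tool such as vegeta. A rough sketch of how a comparable run could be reproduced against the sandbox service is below; the target file and paths are hypothetical, not necessarily the exact command used:

cat > targets.txt <<'EOF'
POST http://10.97.188.113/v1/models/enwiki-goodfaith:predict
Host: enwiki-goodfaith.kubeflow-user.example.com
Cookie: authservice_session=<auth_cookie>
@input.json
EOF
vegeta attack -targets=targets.txt -rate=5 -duration=10m | vegeta report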

Change 690011 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] update predictor name for enwiki-goodfaith

https://gerrit.wikimedia.org/r/690011

Change 690011 merged by Accraze:

[machinelearning/liftwing/inference-services@main] update predictor name for enwiki-goodfaith

https://gerrit.wikimedia.org/r/690011

Change 692467 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] move name field to spec.predictor.custom.container

https://gerrit.wikimedia.org/r/692467

Change 692468 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] add unique name to enwiki-goodfaith load tests

https://gerrit.wikimedia.org/r/692468

Change 692733 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] disable sidecar injection for enwiki-goodfaith

https://gerrit.wikimedia.org/r/692733

Change 692467 merged by Accraze:

[machinelearning/liftwing/inference-services@main] move name field to spec.predictor.custom.container

https://gerrit.wikimedia.org/r/692467

Change 692468 merged by Accraze:

[machinelearning/liftwing/inference-services@main] add unique name to enwiki-goodfaith load tests

https://gerrit.wikimedia.org/r/692468

@kevinbazira I reviewed your deployed inference service on the KFv1.1 sandbox. Great progress so far :)

It looks like there were some issues related to how I have auth set up on this cluster, so I had to make some changes to the files in ~/kevin-deployments/revscoring-inferenceservice.

Now when you go to that directory, there is a shell script called infer.sh (see P16105), which should allow you to hit the deployed model like this:

 ./infer.sh 
enwiki-goodfaith.kubeflow-user.example.com
*   Trying 10.97.188.113...
* Connected to 10.97.188.113 (10.97.188.113) port 80 (#0)
> POST /v1/models/enwiki-goodfaith:predict HTTP/1.1
> Host: enwiki-goodfaith.kubeflow-user.example.com
> User-Agent: curl/7.47.0
> Accept: */*
> Cookie: authservice_session=<auth_cookie>
> Content-Length: 21
> Content-Type: application/x-www-form-urlencoded
> 
* upload completely sent off: 21 out of 21 bytes
< HTTP/1.1 200 OK
< content-length: 112
< content-type: application/json; charset=UTF-8
< date: Wed, 19 May 2021 22:00:19 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 487
< 
* Connection #0 to host 10.97.188.113 left intact
{"predictions": {"prediction": true, "probability": {"false": 0.02523431512745833, "true": 0.9747656848725417}}}

Things I had to change to get this to run:

  1. I had to redeploy the service to the kubeflow-user namespace: kubectl apply -f custom.yaml -n kubeflow-user. This is the namespace where our install looks for model services.
  2. In custom.yaml, we need to make sure the metadata.name and container.name match the name of the model defined in the python service: https://github.com/wikimedia/machinelearning-liftwing-inference-services/blob/main/enwiki-goodfaith/model-server/model.py#L32
  3. The rev_id value in input.json was a list when it should just be an int (see the snippet after this list).
  4. I needed to disable the Istio sidecar injection which was adding an additional auth layer to the request. I made a patchset to fix this: https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/692733
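
For point 3, input.json just needs the revision id as a bare integer, for example (hypothetical rev_id):

echo '{"rev_id": 123456}' > input.json    # an int, not a list like [123456]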

I will tear down the service and let you give it a try tomorrow. Here's what I would recommend:

  1. Deploy the custom.yaml to the kubeflow-user namespace
  2. Try out the infer.sh script and see if you can hit the service and get a prediction

Thank you so much for this explanation, @ACraze; it helped fill the gaps I had about how the KFv1.1 sandbox setup runs inference services and returns predictions.

I managed to create a custom revscoring inference service and run it using the shell script with a newly generated authservice_session token.

$ ./infer.sh 
*   Trying 10.97.188.113...
* Connected to 10.97.188.113 (10.97.188.113) port 80 (#0)
> POST /v1/models/enwiki-goodfaith:predict HTTP/1.1
> Host: enwiki-goodfaith.kubeflow-user.example.com
> User-Agent: curl/7.47.0
> Accept: */*
> Cookie: authservice_session=<auth_cookie>
> Content-Length: 21
> Content-Type: application/x-www-form-urlencoded
> 
* upload completely sent off: 21 out of 21 bytes
< HTTP/1.1 200 OK
< content-length: 112
< content-type: application/json; charset=UTF-8
< date: Thu, 20 May 2021 11:33:50 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 555
< 
* Connection #0 to host 10.97.188.113 left intact
{"predictions": {"prediction": true, "probability": {"false": 0.02523431512745833, "true": 0.9747656848725417}}}

Here are some thoughts and possibly things I'll work on in the coming days:

  1. Automate the generation of a new authservice_session token. Possibly using a shell script.
  2. Fix the inference service URL because it points to the wrong URL. To see the URL, run $ kubectl get inferenceservice -n kubeflow-user.
  3. In the event that we go ahead to use one base image to serve multiple models, I wonder how we would be able to inject a model and change the model name defined in the python service to match both the metadata.name and container.name that are specific to each inference service's config. Would this be by using parameters passed via the curl command in infer.sh?
  4. Once we have created a reproducible workflow, we should possibly document the process of creating and running an inference service, including definitions of the files required, e.g. custom.yaml, input.json, infer.sh, etc.

Change 692733 merged by Accraze:

[machinelearning/liftwing/inference-services@main] disable sidecar injection for enwiki-goodfaith

https://gerrit.wikimedia.org/r/692733

@kevinbazira awesome! I'm glad you were able to deploy the custom inference service on the sandbox cluster. In response to your thoughts:

Automate the generation of a new authservice_session token. Possibly using a shell script.

Yes, this could be helpful. It's still unclear whether we will use the same auth setup in production; however, streamlining this would help us speed up development.

Fix the inference service URL because it points to the wrong URL.

I'm not sure we need to fix this for now. The URL I see follows the format <model-name>.<namespace>.<domain>.
We have the example.com domain assigned because we have not set up DNS for the sandbox cluster (minikube). We also have the local hostname used inside the cluster: enwiki-goodfaith.kubeflow-user.svc.cluster.local.
You can see more info by describing the service: k describe inferenceservice enwiki-goodfaith -n kubeflow-user
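
For example, the assigned URL and the full status can be pulled directly with kubectl (same namespace as above):

kubectl get inferenceservice enwiki-goodfaith -n kubeflow-user -o jsonpath='{.status.url}'
kubectl describe inferenceservice enwiki-goodfaith -n kubeflow-user
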
Eventually we'll need to set up DNS in production; I'm wondering how we should handle that and whether our sandbox should do something similar.

In the event that we go ahead to use one base image to serve multiple models, I wonder how we would be able to inject a model and change the model name defined in the python service to match both the metadata.name and container.name that are specific to each inference service's config. Would this be by using parameters passed via the curl command in infer.sh?

Yeah, this is the next big thing we need to tackle. We need to make our enwiki-goodfaith service more generic, such that we can inject a model and also pass in the model name. We are planning to inject the model from swift/s3 (see T282802) by specifying a storageUri field in the service config. We should be able to pass in the name using an environment variable, similar to this example here. I imagine we will eventually have a revscoring directory in our repo with a YAML config (CRD) for each ORES model.
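
To make that concrete, a per-model config might end up looking roughly like the sketch below. This is only an illustration: the apiVersion, field layout, image name, and the STORAGE_URI/MODEL_NAME variables are assumptions, not the actual repo layout.

kubectl apply -n kubeflow-user -f - <<'EOF'
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: enwiki-goodfaith                            # must match the name used by the python service
spec:
  predictor:
    containers:
      - name: enwiki-goodfaith
        image: <registry>/revscoring-base:latest    # shared 'base' image
        env:
          - name: STORAGE_URI                       # hypothetical: model binary injected from swift/s3 (T282802)
            value: s3://<bucket>/enwiki-goodfaith/model.bin
          - name: MODEL_NAME                        # hypothetical: passed in instead of hard-coding in model.py
            value: enwiki-goodfaith
EOF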

Once we have created a reproducible workflow, we should possibly document the process of creating and running an inference service, including definitions of the files required, e.g. custom.yaml, input.json, infer.sh, etc.

Yes absolutely. Feel free to add some patches to the inference-services repo or start some docs somewhere. We will want to keep track of our workflow while we develop the first couple of services.

Thanks for the feedback, @ACraze. I added an authservice-session-generator.sh shell script to let us easily generate authservice_session cookies on the fly.

This shell script contains:

HOST="https://<KFv1.1 sandbox URL>/";
STATE=$(curl ${HOST} --insecure | grep -oP '(?<=state=)[^ ]*"' | cut -d \" -f1)
REQ=$(curl "${HOST}dex/auth?client_id=authservice&redirect_uri=%2Fauthservice%2Foidc%2Fcallback&response_type=code&scope=openid+profile+email+groups&state=${STATE}" --insecure | grep -oP '(?<=req=)\w+')
curl "${HOST}dex/auth/local?req=${REQ}" -H 'Content-Type: application/x-www-form-urlencoded' --data 'login=<KFv1.1 sandbox username>&password=<KFv1.1 sandbox password>' --insecure
CODE=$(curl "${HOST}dex/approval?req=${REQ}" --insecure | grep -oP '(?<=code=)\w+')
curl --cookie-jar - "${HOST}authservice/oidc/callback?code=${CODE}&state=${STATE}" > .dex_session --insecure
echo $(tail .dex_session | grep authservice_session | awk '{print $NF}')

I've also added the same code to infer.sh, feeding directly into the SESSION variable, so we can just run ./infer.sh without having to jump through hoops to generate an authservice_session token every time we need to request a prediction. :)
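
In other words, the manually generated cookie from before is replaced by something along these lines at the top of infer.sh (a sketch, assuming the script name above):

SESSION=$(./authservice-session-generator.sh)    # fresh authservice_session cookie on every run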

@kevinbazira excellent work on this! Confirming I am able to use the updated infer.sh script to generate a new session cookie and retrieve a prediction. This is going to save us so much time while developing in the sandbox clusters. Thank you!!

Also confirming that @elukey was able to run a prediction with enwiki-goodfaith on his own minikube instance using some of our own images today (Istio, etc.).

Going to mark this task as resolved, since we now have three members of the team running the enwiki-goodfaith model as a custom inference service.

I have created T283526: Create generic revscoring inference service for our next phase, where we will investigate whether we can use a single custom KFServer image to deploy an inference service for each of the ORES models.