Let's investigate loading a revscoring model into KFServing.
Open Questions:
- What would a base image look like?
- Will we need to load all language assets?
- Do we even need revscoring?
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | calbon | T272874 Prepare 4 ORES English models for Lift Wing
Resolved | | kevinbazira | T279000 Load a revscoring model into KFServing
As mentioned in T279004, we have successfully deployed the enwiki-goodfaith model as a custom KFServing inference service based on @kevinbazira's revscoring container image.
From initial testing it seems that we can use the revscoring image as a 'base' image and then inject the model binaries into individual inference services.
Dockerfile: P15862
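The actual Dockerfile is in P15862; as a rough illustration of the "base image plus injected model" idea, a hypothetical variant might look like the following (the image name, model path, and entrypoint are assumptions, not the contents of the paste):

```
# Hypothetical sketch -- the real Dockerfile is in P15862.
# Start from a shared revscoring 'base' image (name is an assumption).
FROM <registry>/revscoring-base:latest

# Inject the model binary specific to this inference service.
COPY models/enwiki.goodfaith.gradient_boosting.model /app/models/

# Serve the model via the custom KFServing model server.
ENTRYPOINT ["python3", "model_server.py"]
```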
Here are the results of a simple load test done on our sandbox cluster:
```
Requests      [total, rate, throughput]         3000, 5.00, 5.00
Duration      [total, attack, wait]             10m0s, 10m0s, 204.056ms
Latencies     [min, mean, 50, 90, 95, 99, max]  151.102ms, 230.826ms, 198.789ms, 253.654ms, 349.77ms, 969.064ms, 2.706s
Bytes In      [total, mean]                     336000, 112.00
Bytes Out     [total, mean]                     69000, 23.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:3000
```
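For the record, the numbers above (5 req/s for 10 minutes = 3000 requests) line up with a vegeta run of roughly this shape; the target URL and exact flags are assumptions, not the command actually used:

```shell
# Hypothetical reconstruction of the load test (vegeta).
# 5 req/s * 600s = 3000 requests, matching the report above.
echo "POST http://enwiki-goodfaith.kubeflow-user.example.com/v1/models/enwiki-goodfaith:predict" |
  vegeta attack -duration=10m -rate=5 -body=input.json |
  vegeta report
```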
Change 690011 had a related patch set uploaded (by Accraze; author: Accraze):
[machinelearning/liftwing/inference-services@main] update predictor name for enwiki-goodfaith
Change 690011 merged by Accraze:
[machinelearning/liftwing/inference-services@main] update predictor name for enwiki-goodfaith
Change 692467 had a related patch set uploaded (by Accraze; author: Accraze):
[machinelearning/liftwing/inference-services@main] move name field to spec.predictor.custom.container
Change 692468 had a related patch set uploaded (by Accraze; author: Accraze):
[machinelearning/liftwing/inference-services@main] add unique name to enwiki-goodfaith load tests
Change 692733 had a related patch set uploaded (by Accraze; author: Accraze):
[machinelearning/liftwing/inference-services@main] disable sidecar injection for enwiki-goodfaith
Change 692467 merged by Accraze:
[machinelearning/liftwing/inference-services@main] move name field to spec.predictor.custom.container
Change 692468 merged by Accraze:
[machinelearning/liftwing/inference-services@main] add unique name to enwiki-goodfaith load tests
@kevinbazira I reviewed your deployed inference service on the KFv1.1 sandbox. So far great progress :)
It looks like there were some issues related to how I have auth set up on this cluster, so I had to make some changes to the files in ~/kevin-deployments/revscoring-inferenceservice.
Now when you go to that directory, there is a shell script called infer.sh (see P16105), which should allow you to hit the deployed model like this:
```
./infer.sh enwiki-goodfaith.kubeflow-user.example.com
*   Trying 10.97.188.113...
* Connected to 10.97.188.113 (10.97.188.113) port 80 (#0)
> POST /v1/models/enwiki-goodfaith:predict HTTP/1.1
> Host: enwiki-goodfaith.kubeflow-user.example.com
> User-Agent: curl/7.47.0
> Accept: */*
> Cookie: authservice_session=<auth_cookie>
> Content-Length: 21
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 21 out of 21 bytes
< HTTP/1.1 200 OK
< content-length: 112
< content-type: application/json; charset=UTF-8
< date: Wed, 19 May 2021 22:00:19 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 487
<
* Connection #0 to host 10.97.188.113 left intact
{"predictions": {"prediction": true, "probability": {"false": 0.02523431512745833, "true": 0.9747656848725417}}}
```
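As a side note, if you only need the goodfaith probability out of a response body like the one above, a quick one-liner in the same GNU `grep -oP` style as our other scripts does the trick (the JSON is the actual response from the run above):

```shell
# Extract the "true" probability from a saved enwiki-goodfaith response body.
body='{"predictions": {"prediction": true, "probability": {"false": 0.02523431512745833, "true": 0.9747656848725417}}}'
echo "$body" | grep -oP '(?<="true": )[0-9.]+'
# -> 0.9747656848725417
```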
Things I had to change to get this to run:
I will tear down the service and let you give it a try tomorrow. Here's what I would recommend:
Thank you so much for this explanation @ACraze, it helped fill the gaps I had about how the KFv1.1 sandbox setup runs inference services and returns predictions.
I managed to create a custom revscoring inference service and run it using the shell script with a newly generated authservice_session token.
```
$ ./infer.sh
*   Trying 10.97.188.113...
* Connected to 10.97.188.113 (10.97.188.113) port 80 (#0)
> POST /v1/models/enwiki-goodfaith:predict HTTP/1.1
> Host: enwiki-goodfaith.kubeflow-user.example.com
> User-Agent: curl/7.47.0
> Accept: */*
> Cookie: authservice_session=<auth_cookie>
> Content-Length: 21
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 21 out of 21 bytes
< HTTP/1.1 200 OK
< content-length: 112
< content-type: application/json; charset=UTF-8
< date: Thu, 20 May 2021 11:33:50 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 555
<
* Connection #0 to host 10.97.188.113 left intact
{"predictions": {"prediction": true, "probability": {"false": 0.02523431512745833, "true": 0.9747656848725417}}}
```
Here are some thoughts and possibly things I'll work on in the coming days:
Change 692733 merged by Accraze:
[machinelearning/liftwing/inference-services@main] disable sidecar injection for enwiki-goodfaith
@kevinbazira awesome! I'm glad you were able to deploy the custom inference service on the sandbox cluster. In response to your thoughts:
> Automate the generation of a new authservice_session token. Possibly using a shell script.
Yes, this could be helpful. It's still unclear whether we will use the same auth setup in production; however, streamlining this would help us speed up development.
> Fix the inference service URL because it points to the wrong URL.
I'm not sure we need to fix this for now. The URL I see follows the format <model-name>.<namespace>.<domain>.
We have the example.com domain assigned because we have not set up DNS for the sandbox cluster (minikube). There is also the local hostname used inside the cluster: enwiki-goodfaith.kubeflow-user.svc.cluster.local.
You can see more info by describing the service: `kubectl describe inferenceservice enwiki-goodfaith -n kubeflow-user`
Eventually we'll need to set up DNS in production; I'm wondering how we should handle that and whether our sandbox should do something similar.
> If we go ahead with using one base image to serve multiple models, I wonder how we would inject a model and change the model name defined in the Python service to match both the metadata.name and container.name that are specific to each inference service's config. Would this be done via parameters passed through the curl command in infer.sh?
Yeah, this is the next big thing we need to tackle. We need to make our enwiki-goodfaith service more generic, such that we can inject a model and also pass in the model name. We are planning to inject the model from swift/s3 (see T282802) by specifying a storageUri field in the service config. We should be able to pass in the name using an environment variable, similar to this example here. I imagine we will eventually have a revscoring directory in our repo with a yaml config (CRDs) for each ORES model.
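A per-model config might then look roughly like this; the container path mirrors the `spec.predictor.custom.container` field from change 692467, but the apiVersion, image name, storage URI, and env var names are purely illustrative assumptions (the exact storageUri mechanics depend on the KFServing version):

```
# Hypothetical per-model InferenceService config; names and paths are
# illustrative, not actual repo contents.
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: enwiki-goodfaith
spec:
  predictor:
    custom:
      container:
        name: kfserving-container
        image: <registry>/revscoring-base:latest
        env:
          - name: INFERENCE_NAME
            value: enwiki-goodfaith
          - name: STORAGE_URI
            value: s3://<bucket>/goodfaith/enwiki/model.bin
```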
> Once we have created a reproducible workflow, we should possibly document the process of creating and running an inference service, including definitions of the files required, e.g. custom.yaml, input.json, infer.sh, etc.
Yes absolutely. Feel free to add some patches to the inference-services repo or start some docs somewhere. We will want to keep track of our workflow while we develop the first couple of services.
Thanks for the feedback @ACraze. I added the authservice-session-generator.sh shell script to let us easily generate authservice_session cookies on the fly.
This shell script contains:
```
HOST="https://<KFv1.1 sandbox URL>/"
STATE=$(curl ${HOST} --insecure | grep -oP '(?<=state=)[^ ]*"' | cut -d \" -f1)
REQ=$(curl "${HOST}dex/auth?client_id=authservice&redirect_uri=%2Fauthservice%2Foidc%2Fcallback&response_type=code&scope=openid+profile+email+groups&state=${STATE}" --insecure | grep -oP '(?<=req=)\w+')
curl "${HOST}dex/auth/local?req=${REQ}" -H 'Content-Type: application/x-www-form-urlencoded' --data 'login=<KFv1.1 sandbox username>&password=<KFv1.1 sandbox password>' --insecure
CODE=$(curl "${HOST}dex/approval?req=${REQ}" --insecure | grep -oP '(?<=code=)\w+')
curl --cookie-jar - "${HOST}authservice/oidc/callback?code=${CODE}&state=${STATE}" > .dex_session --insecure
echo $(tail .dex_session | grep authservice_session | awk '{print $NF}')
```
I've also added the same code into infer.sh, feeding right into the SESSION variable, so we can just run ./infer.sh without having to jump through hoops to generate an authservice_session token every time we need to request a prediction. :)
@kevinbazira excellent work on this! Confirming I am able to use the updated infer.sh script to generate a new session cookie and retrieve a prediction. This is going to save us so much time while developing in the sandbox clusters. Thank you!!
Also confirming that @elukey was able to run a prediction with enwiki-goodfaith on his own minikube instance using some of our own images today (istio etc.).
Going to mark this task as resolved, since we now have three members of the team running the enwiki-goodfaith model as a custom inference service.
I have created T283526: Create generic revscoring inference service for our next phase, where we will investigate if we can use a single custom KFServer image to deploy an inference service for each of the ores models.