
Investigate Explainer for Revert-Risk model
Open, Needs Triage, Public

Description

KServe provides the ability to attach an Explainer to an Inference Service in order to provide an explanation for a prediction given by an ML model. The explanation can be invoked using the :explain endpoint.

KServe integrates with the Alibi Explainer and the AI Explainability 360 (AIX360) toolkit.

Related links:
https://github.com/kserve/kserve/tree/master/python/aixexplainer
https://github.com/kserve/kserve/tree/master/python/alibiexplainer
https://github.com/kserve/kserve/tree/master/docs/samples/explanation

Event Timeline

Before proceeding with attaching an explainer to the revert-risk isvc, we should test the explanation algorithm of interest on a statbox to see whether the explanations the model returns make sense to us.

Also, the Anchors algorithm is just one candidate; Alibi provides many black-box (BB) explanation algorithms, and they have different capabilities and restrictions. For example, Anchors may need access to training data (this needs more investigation, but if so, it's not ideal for us).

See https://docs.seldon.io/projects/alibi/en/stable/overview/algorithms.html#model-explanations for the overview of current algorithms.
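
To make the training-data point concrete, here is a rough, untested sketch of what using Alibi's AnchorTabular would look like; the predict function, feature names, and training sample below are placeholders rather than the real revert-risk setup:

import numpy as np
from alibi.explainers import AnchorTabular

# Placeholder predictor: in practice this would wrap the revert-risk classifier
# (or call its REST endpoint) and return the predicted class per row.
def predict_fn(x):
    return np.zeros(x.shape[0], dtype=int)

feature_names = ["user_age", "user_edit_count", "page_edit_count"]   # hypothetical features
X_train = np.random.rand(1000, len(feature_names))                   # stand-in for real training data

explainer = AnchorTabular(predictor=predict_fn, feature_names=feature_names)
# This is the restriction mentioned above: Anchors needs (a sample of) the
# training data to discretise features and build its perturbation space.
explainer.fit(X_train, disc_perc=(25, 50, 75))

explanation = explainer.explain(X_train[0], threshold=0.95)
print("Anchor:   ", " AND ".join(explanation.anchor))
print("Precision:", explanation.precision)
print("Coverage: ", explanation.coverage)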

@isarantopoulos mentioned that we can also explore AI Explainability 360 (AIX360), another open-source explainability library that KServe integrates with.

Kserve example: https://kserve.github.io/website/0.10/modelserving/explainer/aix/mnist/aix/

achou renamed this task from "Investigate AlibiExplainer for Revert-Risk model" to "Investigate Explainer for Revert-Risk model". Feb 21 2023, 3:58 PM

Previously, we tested the TreeSHAP algorithm for Multilingual model explainability (from here: https://shap.readthedocs.io/en/latest/). It is supported by the tools provided in the task description. The main benefit is that it works with our classifiers and provides local explainability (so we can have an explanation for each specific sample without any other data needed).
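
For reference, a minimal sketch of that TreeSHAP usage, with a stand-in gradient-boosting classifier in place of the actual revert-risk model:

import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in model and data; the real test used the Multilingual revert-risk classifier.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = GradientBoostingClassifier().fit(X, y)

# TreeSHAP is white-box: it works directly on the tree model, needs no
# background dataset, and each sample gets its own local attribution.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])   # local explanation for a single sample
print(shap_values)                           # one contribution per feature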

Additionally, the explainability output the user ultimately receives should be discussed based on how the end user will use those values. Raw SHAP values may not be the best choice; additional postprocessing may be needed (for example, rescaling the values or grouping the scores that describe one entity: text, media, user features, etc.).
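
To illustrate the kind of postprocessing meant here, a purely hypothetical sketch that groups per-feature SHAP values by the entity they describe and rescales them (feature names, values, and the group mapping are made up):

import numpy as np

feature_names = ["text_len_change", "num_media_added", "user_edit_count", "user_is_anon"]
shap_values = np.array([0.12, -0.03, -0.30, 0.05])   # raw SHAP values for one sample (made up)

groups = {
    "text": ["text_len_change"],
    "media": ["num_media_added"],
    "user": ["user_edit_count", "user_is_anon"],
}

# Sum contributions per group, then rescale by the total absolute contribution
# so the user sees one score per entity instead of raw per-feature values.
grouped = {g: sum(shap_values[feature_names.index(f)] for f in fs) for g, fs in groups.items()}
total = sum(abs(v) for v in grouped.values()) or 1.0
rescaled = {g: v / total for g, v in grouped.items()}
print(rescaled)   # approximately {'text': 0.3, 'media': -0.075, 'user': -0.625}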

@isarantopoulos I was able to run the kserve AIX explainer example on ml-sandbox \o/

First, I deployed the InferenceService from aix-explainer.yaml (with the namespace changed to kserve-test):

aikochou@ml-sandbox:~$ kubectl apply -f aix/aix-explainer.yaml -n kserve-test

Checked that the predictor and explainer pods are running:

aikochou@ml-sandbox:~$ kubectl get po -n kserve-test
NAME                                                              READY   STATUS    RESTARTS   AGE
aix-explainer-explainer-default-7mhqd-deployment-54569d597mchqc   2/2     Running   0          44m
aix-explainer-predictor-default-5g7rf-deployment-bd878cd74qjmcq   2/2     Running   0          44m

Set some env variables:

MODEL_NAME=aix-explainer
INGRESS_HOST=$(minikube ip)
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice -n kserve-test ${MODEL_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)

Then I created a venv to install matplotlib, requests, and scikit-learn, and ran the query_explain.py script:

(venv) aikochou@ml-sandbox:~/aix$ python query_explain.py http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:explain ${SERVICE_HOSTNAME}
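
For context, this is roughly what such a query script does (a simplified sketch, not the upstream query_explain.py itself): it posts the instances to the :explain endpoint and sets the Host header to SERVICE_HOSTNAME so the Istio ingress gateway routes the request to the right InferenceService. The real script presumably also prepares the input data and plots the result (hence scikit-learn and matplotlib).

import sys
import requests

explain_url, service_hostname = sys.argv[1], sys.argv[2]

payload = {"instances": ["placeholder input"]}   # placeholder; the real example builds its own input
resp = requests.post(explain_url, json=payload, headers={"Host": service_hostname})
print(resp)           # e.g. <Response [200]>
print(resp.json())    # explanation returned by the AIX explainer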

Returned results:

************************************************************
************************************************************
************************************************************
starting query
Sending Explain Query
TIME TAKEN:  14.321728706359863
<Response [200]>

Explanation of autos:
['low', 0.011684259246210107]
['government', -0.011531991783027254]
['most', -0.007102600869778604]
['worst', 0.0068011313331184]
['be', -0.006723790631208644]
['cannot', -0.005791967075218682]
['them', 0.005194493737558092]
['If', -0.004526687592100251]
['will', 0.004459571232696949]
['model', 0.0042881826395665]

Explanation of hockey:
...
...

Tree SHAP is a white-box method, meaning we need to load the revert-risk model into the explainer to use it. This makes the explainer effectively another predictor: it has to do everything the predictor does, while also requiring a lot of resources. As a result, we don't fully utilize the advantages that KServe provides for explainers. It would be better to use a black-box method instead.
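
To sketch the black-box alternative: the explainer then only needs a prediction function, which can simply call the predictor's own :predict endpoint instead of loading the model a second time. The URL, payload format, and the choice of KernelSHAP below are assumptions, just to illustrate the shape:

import numpy as np
import requests
import shap

PREDICT_URL = "http://revertrisk-predictor/v1/models/revertrisk:predict"   # hypothetical URL

def predict_fn(x):
    # The explainer treats the model as a black box: it only sees this function,
    # which forwards the instances to the predictor over HTTP.
    resp = requests.post(PREDICT_URL, json={"instances": x.tolist()})
    return np.array(resp.json()["predictions"])

background = np.zeros((1, 8))                              # stand-in background data
explainer = shap.KernelExplainer(predict_fn, background)   # model-agnostic (black-box) SHAP
shap_values = explainer.shap_values(np.random.rand(1, 8))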

I've read the following doc: https://kserve.github.io/website/0.10/modelserving/data_plane/v2_protocol/

It seems that upstream suggests migrating from v1 to v2, but they also add:

Note on changes between V1 & V2

V2 protocol does not currently support the explain endpoint like V1 protocol does. If this is a feature you wish to have in the V2 protocol, please submit a github issue.

No idea when/if we'll have to migrate away from v1, but it is worth noting that the :explain functionality may not be available in the future if we don't ask upstream for it :(