
Custom enwiki-goodfaith Explainer
Open, Needs Triage, Public

Description

KServe provides the ability to attach an Explainer to an Inference Service in order to provide an explanation for a prediction given by an ML model. The explanation can be invoked using the :explain endpoint.
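For context, a KServe custom explainer is just a model server that implements explain(); here is a minimal sketch using the kserve Python SDK (the class name and request shape are assumptions for illustration, not the actual patch):

from typing import Dict

import kserve


class GoodfaithExplainer(kserve.Model):
    # KServe routes POST /v1/models/<name>:explain to explain().

    def __init__(self, name: str):
        super().__init__(name)
        self.ready = True

    def explain(self, request: Dict) -> Dict:
        # request is the parsed JSON body, e.g. {"rev_id": 12345}
        # ... feature extraction + LIME would go here (see below) ...
        return {"explanations": {}}


if __name__ == "__main__":
    kserve.ModelServer().start([GoodfaithExplainer("enwiki-goodfaith")])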

Many of our models at WMF are tree-based, which makes computing feature importance fairly straightforward. LIME (Local Interpretable Model-Agnostic Explanations) is an algorithm that can help us do this with tabular data.
LIME: https://arxiv.org/abs/1602.04938
Code: https://github.com/marcotcr/lime
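For tabular data, LIME usage looks roughly like this (a sketch: the training data, labels, and model below are random placeholders standing in for the editquality training set and feature definitions):

import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder data: 1000 rows x 20 features with binary labels.
X_train = np.random.rand(1000, 20)
y_train = np.random.randint(0, 2, 1000)
feature_names = ["feature_%d" % i for i in range(20)]

# A tree-based stand-in for the goodfaith model.
model = GradientBoostingClassifier().fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=["false", "true"],
    mode="classification",
)

# predict_fn must return an (n_samples, n_classes) probability array.
exp = explainer.explain_instance(X_train[0], model.predict_proba, num_features=20)
print(exp.as_list())  # [(feature description, weight), ...]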

There has been some prior explainability work on some of our models (see T196475):
https://github.com/adamwight/ores-lime

Let's try integrating it into a serverless explainer and attaching it to an Inference Service.

Event Timeline

Change 761401 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] [WIP]editquality: adding explainer aixserver

https://gerrit.wikimedia.org/r/761401

I managed to get an explainer attached to the enwiki-goodfaith Inference Service on ml-sandbox and was able to retrieve an explanation from the :explain endpoint.

root@ml-sandbox:/srv/home/accraze/isvcs/editquality# time ./explain.sh
enwiki-goodfaith.kserve-test.wikimedia.org
* Expire in 0 ms for 6 (transfer 0x561c7c2bdfb0)
*   Trying 192.168.49.2...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x561c7c2bdfb0)
* Connected to 192.168.49.2 (192.168.49.2) port 31336 (#0)
> POST /v1/models/enwiki-goodfaith:explain HTTP/1.1
> Host: enwiki-goodfaith.kserve-test.wikimedia.org
> User-Agent: curl/7.64.0
> Accept: */*
> Content-Length: 18
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 18 out of 18 bytes
< HTTP/1.1 200 OK
< content-length: 1550
< content-type: application/json; charset=UTF-8
< date: Wed, 09 Feb 2022 17:40:50 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 25506
<
{"explanations": {"goodfaith": [["feature.revision.user.is_patroller=0", -0.015788173175373076], ["feature.log((temporal.revision.user.seconds_since_registration + 1))", 0.0098693949168189], ["feature.revision.user.is_admin=0", -0.008038494524052172], ["feature.revision.user.is_bot=0", -0.006652083636317093], ["feature.revision.comment.has_link=0", -0.004921946378795263], ["feature.revision.page.is_mainspace=1", -0.004483446209442505], ["feature.revision.page.is_articleish=1", -0.0039924316429495385], ["feature.revision.user.is_anon=0", 0.00302822077105626], ["feature.revision.comment.suggests_section_edit=0", -0.002816992577874582], ["feature.revision.user.is_trusted=0", -0.002361741421131755], ["feature.revision.page.is_draftspace=0", -0.002360609276161814], ["feature.log((wikitext.revision.parent.headings + 1))", -0.0014798464706292998], ["feature.log((wikitext.revision.parent.templates + 1))", -0.0014183912073837304], ["feature.revision.user.has_advanced_rights=0", -0.0010041109365421586], ["feature.revis* Connection #0 to host 192.168.49.2 left intact
ion.parent.uppercase_words_per_word", -0.0008045223540967006], ["feature.log((wikitext.revision.parent.external_links + 1))", 0.0007680161102815328], ["feature.log((len(<datasource.tokenized(datasource.revision.parent.text)>) + 1))", 0.0006697338425742979], ["feature.log((wikitext.revision.parent.chars + 1))", 0.0005789536463673155], ["feature.log((len(<datasource.wikitext.revision.parent.uppercase_words>) + 1))", 0.0004919657568360319], ["feature.log((wikitext.revision.parent.ref_tags + 1))", -0.00046450440094183575]]}}
real	0m26.647s
user	0m0.578s
sys	0m0.283s

It seems to work well: you just pass a rev_id to the explainer; it extracts the features, runs the LIME explainer on the predictions made for those features, and returns an explanation listing each feature and its importance.
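Continuing the server sketch from above, the explain() handler glues those pieces together; extract_features() here is a hypothetical stand-in for the revscoring-based feature extraction the editquality services use, and self.model / self.lime_explainer are assumed to be set up in load():

    def explain(self, request: Dict) -> Dict:
        rev_id = request["rev_id"]
        # Hypothetical helper: fetch the revision and extract the
        # model's feature vector for it.
        features = extract_features(rev_id)
        exp = self.lime_explainer.explain_instance(
            features, self.model.predict_proba, num_features=20
        )
        # Produces the shape seen in the response above:
        # {"explanations": {"goodfaith": [[<feature>, <weight>], ...]}}
        return {"explanations": {"goodfaith": exp.as_list()}}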

Challenges:

  • The image is large (just like the editquality predictor & transformer images); however, an explainer will not be used all the time, so we can scale it to zero when it's not in use.
  • The explainer needs access to the training data. For now I've included it directly in the image, which works since the data is < 200 MB.
  • There's no simple way to have the custom explainer talk to the custom transformer (maybe a KServe bug?)
    • For now, I just handle preprocessing within the explainer (which ultimately means I need to load the model from storage too); see the sketch after this list.
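A sketch of that workaround (the storage path is an assumption; load() is the kserve.Model hook that runs at startup):

import pickle

class GoodfaithExplainer(kserve.Model):  # continuing the sketch above
    def load(self):
        # Workaround: load the model binary directly from the mounted
        # storage volume (path is an assumption), since the explainer
        # can't delegate prediction to the predictor/transformer.
        with open("/mnt/models/model.bin", "rb") as f:
            self.model = pickle.load(f)
        self.ready = True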