
Load outlinks topic model in to KFServing
Closed, Resolved · Public

Description

We want to see if we can support a fastText model on Lift Wing. Let's try loading the outlinks topic model as a custom KFServing inference service.

Serving code for the Cloud VPS API (Python Flask) that scores an article on demand via the MediaWiki APIs + the loaded model binary: https://github.com/wikimedia/research-api-endpoint-template/blob/master/model/wsgi.py

Current model binary: https://analytics.wikimedia.org/published/datasets/one-off/isaacj/articletopic/model_alloutlinks_202012.bin

Event Timeline

I did a first pass at migrating the Outlinks topic model into KFServing: P14710

A couple of notes so far (a rough sketch of the class layout follows the list):

  • OutlinksTopicModel.load loads the model binary.
  • OutlinksTopicModel.preprocess gets all the outlinks for a given page and builds the feature string.
  • OutlinksTopicModel.predict makes the predictions and filters them based on the threshold cutoff.
  • OutlinksTopicModel.postprocess builds the output dict with the article string and results list.
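For reference, here is a minimal sketch of that class layout (this is not the P14710 code itself: the model path, the get_outlinks helper, and the payload keys are assumptions based on the notes above):

```
import fasttext
import kfserving


def get_outlinks(lang, page_title):
    # Placeholder: in the real service this queries the MediaWiki API
    # to collect the Wikidata QIDs of the page's outlinks.
    raise NotImplementedError


class OutlinksTopicModel(kfserving.KFModel):
    def __init__(self, name):
        super().__init__(name)
        self.ready = False

    def load(self):
        # Load the fastText binary; the path is an assumption about how the
        # binary ends up mounted into the container.
        self.model = fasttext.load_model("/mnt/models/model.bin")
        self.ready = True

    def preprocess(self, inputs):
        # Gather all the outlinks for the requested page and build the feature string.
        outlinks = get_outlinks(inputs["lang"], inputs["page_title"])
        inputs["features_str"] = " ".join(outlinks)
        return inputs

    def predict(self, request):
        # Score the feature string and keep only topics above the threshold cutoff.
        labels, scores = self.model.predict(request["features_str"], k=-1)
        threshold = request.get("threshold", 0.5)
        request["results"] = [
            (label.replace("__label__", ""), float(score))
            for label, score in zip(labels, scores)
            if score >= threshold
        ]
        return request

    def postprocess(self, request):
        # Build the output dict with the article string and results list.
        article = "https://{lang}.wikipedia.org/wiki/{title}".format(
            lang=request["lang"], title=request["page_title"]
        )
        return {
            "prediction": {
                "article": article,
                "results": [{"topic": t, "score": s} for t, s in request["results"]],
            }
        }


if __name__ == "__main__":
    model = OutlinksTopicModel("outlink-topic-model")
    model.load()
    kfserving.KFServer(workers=1).start([model])
```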

We may want to move the threshold filtering into the postprocess method so that the predict method is more generalized.

There may also be some things we need to change once we deploy to Lift Wing (like how we load the model binary and handle param values like thresholds, logging, etc.). Since it is a custom KFServer, we will probably want to do some load testing as well.

FYI some context on fastText and why I use it: in my experience, fastText is way way faster to train than any other library I've tried (without needing GPUs) and perhaps more importantly very robust to poorly-chosen hyperparameters. For other use cases, we've tried gradient-boosted classifiers in sklearn and they achieve similar performance but take way way longer to train and do grid search to find the right hyperparameters. When I've attempted to recreate fastText models via Keras, the performance has always been substantially lower though in theory the right set of hyperparameters should lead to similar performance (see T242013#5803872). The downside of course is that fastText is highly-optimized code that cannot be easily changed / adapted if it doesn't exactly meet your needs. For prediction with an already-trained model, I use the fastText library because it's easiest but it's possible to extract all the embeddings/weights from a model and generate predictions with just numpy as a dependency if that would make things easier (T242013#6155316).

@Isaac thank you for providing more context about fastText. I did some initial work on loading your model into KFServing last week and I am not anticipating any major issues so far. It does seem like the right tool for the job here and I think we will be able to get decent performance once we start load testing on the new infrastructure in the near future.

This is a good lesson for us about adding custom model library services into Lift Wing. That said, I suspect fastText is the easy case because it is so well supported generally.

Thanks @ACraze and @calbon! It's very relieving to hear that fastText will be supported. It's my logistic regression (literally and figuratively) :)

Confirming that the Outlinks topic model can indeed be loaded as a custom KFServing inference service to be used by Lift Wing.
I was able to package and deploy the model inside our Kubeflow sandbox today.

Here is a screenshot of the model service showing up in the UI:

kubeflow-ui-outlink.png (410×1 px, 58 KB)

Also, here is a screenshot of me hitting the API via curl on our sandbox cluster:

outlink-kfserving.png (396×1 px, 92 KB)
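For anyone who wants to reproduce the call, it is just a POST against the standard KFServing v1 :predict route; a rough Python equivalent is below (the ingress address and service hostname are placeholders for whatever the sandbox cluster exposes):

```
import requests

# Placeholders -- the real ingress address and service hostname depend on the sandbox setup.
INGRESS_URL = "http://INGRESS_IP:PORT"
SERVICE_HOSTNAME = "outlink-topic-model.NAMESPACE.example.wikimedia.org"

response = requests.post(
    INGRESS_URL + "/v1/models/outlink-topic-model:predict",
    headers={"Host": SERVICE_HOSTNAME},  # Knative/Istio routes on the Host header
    json={"lang": "en", "page_title": "Toni Morrison"},
)
print(response.json())
```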

I was able to incorporate transformers by using a pre-processor to get the outlinks and generate the feature string, and a post-processor to gather the set of "up-to-limit" outlinks.
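To make that split concrete, here is a rough sketch of the transformer side, following the usual kfserving SDK pattern where (as I understand it) the KFModel base class forwards predict() to the predictor host and the transformer only implements preprocess/postprocess. The helper and host values are placeholders, not the deployed code:

```
import kfserving


def get_outlinks(lang, page_title):
    # Placeholder for the MediaWiki API lookup that collects outlink QIDs.
    raise NotImplementedError


class OutlinkTransformer(kfserving.KFModel):
    def __init__(self, name, predictor_host):
        super().__init__(name)
        self.predictor_host = predictor_host  # base class predict() posts requests here

    def preprocess(self, inputs):
        # Runs before the predictor: fetch the outlinks and build the feature string.
        outlinks = get_outlinks(inputs["lang"], inputs["page_title"])
        inputs["features_str"] = " ".join(outlinks)
        return inputs

    def postprocess(self, outputs):
        # Runs after the predictor: shape/trim the response before returning it.
        return outputs


if __name__ == "__main__":
    transformer = OutlinkTransformer("outlink-topic-model", predictor_host="PREDICTOR_HOST")
    kfserving.KFServer(workers=1).start([transformer])
```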

@Isaac, two questions for you:

  1. Are there only two required parameters (lang and page_title)?
  2. Also, does this output look correct? (with input {"lang": "en", "page_title": "Toni Morrison"}):
{"prediction": {"article": "https://en.wikipedia.org/wiki/Toni Morrison", "results": [{"topic": "Culture.Biography.Biography*", "score": 0.9626831412315369}, {"topic": "Culture.Literature", "score": 0.6654205918312073}, {"topic": "Geography.Regions.Americas.North_America", "score": 0.607673168182373}]}}

Yay!

Are there only two required parameters (lang and page_title)?

There should be three (also threshold -- see below). Regarding lang and page_title though: as it is currently set up, the model provides a prediction based on the current revision of a Wikipedia article. This is because it uses the pagelinks table (which only ever reflects the current state). If needed, we could extend it to extract links from old revisions of a page using the wikitext, though it'd likely be imperfect as it wouldn't gather links inserted via templates (which are a pretty large proportion of links for many stub articles) and would just be more computationally/API-intensive. With that in mind, I can imagine four options (a rough sketch of the outlink lookup for the first two follows the list):

  • Current version of article only:
    • lang + page_title: current behavior as you point out. The reason I went with it is that it's easiest for people to play with the API. But for production, page_title is not great because the API follows redirects and therefore sometimes the results are not for the given page_title but for the page it points to (which can be confusing).
    • lang + page_id: ideal behavior from research perspective because page_id is nice and stable. Easy to update the API to support this and I'm happy to provide that code.
    • lang + QID: probably not necessary but technically an option. The Wikidata ID would just be mapped to a page ID or title first, before gathering links etc.
  • Any version of the article:
    • lang + revid: how most of ORES works and allows for processing historical revisions, which is nice. It would be a larger lift though because it would require fetching wikitext, processing the wikitext, and then hitting the APIs again for Wikidata IDs associated with the links. So this approach would almost certainly have higher latency because it would process more data and make more API calls than just gathering the pagelinks directly AND it would be incomplete because it would only parse links that are found in the wikitext. But it obviously greatly expands the capabilities of the API.
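To illustrate the first two options: the outlink lookup is essentially one MediaWiki API query, and supporting lang + page_id is mostly a matter of swapping titles= for pageids=. A rough sketch, loosely modeled on the Cloud VPS template linked in the description (the limits and exact parameters here are assumptions):

```
import mwapi


def get_outlink_qids(lang, page_title=None, page_id=None, limit=1000):
    """Collect Wikidata QIDs for the outlinks of a page, by title or by page ID."""
    session = mwapi.Session(
        "https://{0}.wikipedia.org".format(lang),
        user_agent="outlink-topic-model sketch",
    )
    params = dict(
        action="query",
        generator="links",       # iterate over the page's outlinks
        gplnamespace=0,          # article namespace only
        gpllimit=500,
        prop="pageprops",
        ppprop="wikibase_item",  # map each linked page to its Wikidata QID
        redirects="",            # follow redirects on the input title/ID
        continuation=True,
    )
    if page_id is not None:
        params["pageids"] = page_id
    else:
        params["titles"] = page_title

    qids = set()
    for batch in session.get(**params):
        for page in batch.get("query", {}).get("pages", {}).values():
            qid = page.get("pageprops", {}).get("wikibase_item")
            if qid:
                qids.add(qid)
            if len(qids) >= limit:
                return qids
    return qids
```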

Also, does this output look correct? (with input {"lang": "en", "page_title": "Toni Morrison"}):

So it looks like this output is with the threshold set to 0.5 -- i.e. the model is only returning topics whose confidence is above 0.5. This should be configurable in the API call, though: it should accept a threshold parameter that is a float in [0, 1] and defaults to 0.5.
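A tiny sketch of what reading that parameter on the service side could look like (the validation behavior is an assumption, not settled API design):

```
def parse_threshold(request, default=0.5):
    """Read the optional 'threshold' field from the request payload."""
    try:
        threshold = float(request.get("threshold", default))
    except (TypeError, ValueError):
        raise ValueError("threshold must be a number")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return threshold
```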

Change 690047 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] add threshold param to outlinks request

https://gerrit.wikimedia.org/r/690047

Thanks @Isaac, I see that reflected in the code now, but threshold wasn't documented with the other params. I've added a patch for that in gerrit.

The next step is to run some basic load tests on the prediction service. I'm currently working on configuring a job for that and will run it a bit later this week.

ACraze renamed this task from Load a fastText model in to KFServing to Load outlinks topic model model in to KFServing.May 12 2021, 10:13 PM
ACraze claimed this task.
ACraze updated the task description. (Show Details)

Change 691252 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] add outlink load test job configs

https://gerrit.wikimedia.org/r/691252

Change 690047 merged by Accraze:

[machinelearning/liftwing/inference-services@main] add threshold param to outlinks request

https://gerrit.wikimedia.org/r/690047

Change 691252 merged by Accraze:

[machinelearning/liftwing/inference-services@main] add outlink load test job configs

https://gerrit.wikimedia.org/r/691252

Quick update: I've been doing some testing over the past couple of days and have noticed a timeout issue when testing high-throughput loads (like 50-100 calls per second). I traced it down to where we retrieve all the outlinks via mwapi.Session. After ~100 calls, the outlinks eventually get returned as None: https://github.com/wikimedia/machinelearning-liftwing-inference-services/blob/main/outlink-topic-model/model-server/model.py#L105

For some reason it seems to hang there, and the remaining curl calls never get a response. At first I thought the inference service was running out of memory, but increasing RAM and CPU did not change anything.
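For debugging, one way to make that failure mode explicit instead of a silent hang is to wrap the outlink fetch in a check-and-retry, sketched below (fetch_outlinks stands in for whatever callable wraps the mwapi.Session query; this is a diagnostic aid under that assumption, not a root-cause fix):

```
import logging
import time


def fetch_outlinks_with_retry(fetch_outlinks, lang, page_title, retries=3, backoff=1.0):
    """Call fetch_outlinks, retrying when it returns None or raises."""
    for attempt in range(retries):
        try:
            outlinks = fetch_outlinks(lang, page_title)
            if outlinks is not None:
                return outlinks
            logging.warning("outlinks came back as None for %s:%s (attempt %d)",
                            lang, page_title, attempt + 1)
        except Exception:
            logging.exception("outlink fetch failed for %s:%s (attempt %d)",
                              lang, page_title, attempt + 1)
        time.sleep(backoff * (2 ** attempt))  # simple exponential backoff
    raise RuntimeError("could not fetch outlinks for {0}:{1}".format(lang, page_title))
```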

On a good note, the auto-scaling feature works like a charm :)

I will continue digging into this tomorrow.

Change 694718 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] disable sidecar injection for outlink-topic-model

https://gerrit.wikimedia.org/r/694718

Change 694735 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] move outlink model binary to external storage

https://gerrit.wikimedia.org/r/694735

ACraze renamed this task from Load outlinks topic model model in to KFServing to Load outlinks topic model in to KFServing.May 25 2021, 10:28 PM

Change 694756 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] create external transformer for outlink model

https://gerrit.wikimedia.org/r/694756

Change 694718 merged by Accraze:

[machinelearning/liftwing/inference-services@main] disable sidecar injection for outlink-topic-model

https://gerrit.wikimedia.org/r/694718

Change 694735 merged by Accraze:

[machinelearning/liftwing/inference-services@main] move outlink model binary to external storage

https://gerrit.wikimedia.org/r/694735

Change 694756 merged by Accraze:

[machinelearning/liftwing/inference-services@main] create external transformer for outlink model

https://gerrit.wikimedia.org/r/694756

Change 699267 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] swap outlink base image to wmf bullseye

https://gerrit.wikimedia.org/r/699267

Change 699312 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] swap outlink transformer base image to wmf python3

https://gerrit.wikimedia.org/r/699312

Change 699267 merged by Accraze:

[machinelearning/liftwing/inference-services@main] swap outlink base image to wmf bullseye

https://gerrit.wikimedia.org/r/699267

Change 699312 merged by Accraze:

[machinelearning/liftwing/inference-services@main] swap outlink transformer base image to wmf python3

https://gerrit.wikimedia.org/r/699312

Hey all, quick update here -- confirming that the outlinks topic model seems to be stable and performant when run as a custom inference service on our sandbox clusters. I'm going to mark this task as resolved. I also added a parent task (T287056) to track the production deployment on Lift Wing.