
Deploy Outlinks topic model to production
Closed, ResolvedPublic

Description

We have confirmed that we can load the Outlinks topic model as a custom inference service in T276862: Load outlinks topic model in to KFServing . The next step is to deploy the service on Lift Wing. This task serves to track the status of the production deployment.

Event Timeline

ACraze changed the task status from Open to Stalled. Jul 20 2021, 11:02 PM

Marking this as 'Stalled' for now as we are blocked until all dependencies are fully installed on Lift-Wing (for more info see: T272919: Install KFServing standalone)

ACraze changed the task status from Stalled to Open. Dec 6 2021, 9:04 PM
ACraze added a subscriber: elukey.

Removing the 'stalled' status and setting back to 'open' now that the work in T272919 is complete. We have also added deployment pipelines and images to the WMF registry (T290930) and that will allow the model to run on any kserve-enabled cluster.

Lastly, we need to do some configuration work on our helm config in the deployment-charts repo before deploying to the ml-serve cluster.
I will sync with @elukey to figure out next steps.

Removing inactive task assignee

@achou excited to have you claim this task! Don't hesitate to reach out if you have any questions about the model etc.

Hi @Isaac, glad to start working on this. :) I am currently working on a sub-task to complete the HTTP error handling code. After that, we should be ready to deploy it. I will let you know if we see any issues.

@Isaac Outlink topic model has been deployed on Lift Wing! :)

Internal clients (e.g. from stat machines) can access it through an internal discovery endpoint. The following examples show how to query the inference service via curl or a Python script.

First prepare an input.json file:

aikochou@stat1007:~$ cat input.json 
{
    "lang": "en",
    "page_title": "Wings of Fire (novel series)"
}

Query the model via curl:

aikochou@stat1007:~$ curl "https://inference.discovery.wmnet:30443/v1/models/outlink-topic-model:predict" -d @input.json -H "Host: outlink-topic-model.articletopic-outlink.wikimedia.org" --http1.1

You'll see the prediction result in the HTTP response:

{"prediction": {"article": "https://en.wikipedia.org/wiki/Wings of Fire (novel series)", "results": [{"topic": "Culture.Literature", "score": 1.0000100135803223}, {"topic": "Culture.Media.Books", "score": 0.9954004287719727}, {"topic": "Culture.Media.Media*", "score": 0.8479777574539185}, {"topic": "Culture.Biography.Women", "score": 0.523430347442627}]}}

Or you can use a python script:

import json
import requests

inference_url = 'https://inference.discovery.wmnet:30443/v1/models/outlink-topic-model:predict'
headers = {
    'Host': 'outlink-topic-model.articletopic-outlink.wikimedia.org',
    'Content-Type': 'application/json',
}

# Load the JSON payload prepared in input.json above.
with open('input.json') as f:
    data = json.load(f)

response = requests.post(inference_url, headers=headers, json=data)

print(response.text)

Run the python script:

aikochou@stat1007:~$ python inference.py
{"prediction": {"article": "https://en.wikipedia.org/wiki/Wings of Fire (novel series)", "results": [{"topic": "Culture.Literature", "score": 1.0000100135803223}, {"topic": "Culture.Media.Books", "score": 0.9954004287719727}, {"topic": "Culture.Media.Media*", "score": 0.8479777574539185}, {"topic": "Culture.Biography.Women", "score": 0.523430347442627}]}}

Please note that the Lift Wing discovery endpoint is only suitable for testing or experimental purposes. It is not yet ready to be integrated into other production services.

Outlink topic model has been deployed on Lift Wing! :)

Eeeeeek! Thank you @achou!!!!
I originally wrote this with multiple exclamation marks on both sentences but that just turned the text pink so apparently I'm not supposed to be thaaaaat excited :)

Internal clients (e.g. from stat machines) can access it through an internal discovery endpoint. The following examples show how to query the inference service via curl or a Python script.

When I tried the above from stat1007 (thanks for the very complete examples), I got the errors below. Do I need special permissions?

  • curl: curl: (56) Received HTTP code 403 from proxy after CONNECT
  • python: requests.exceptions.ProxyError: HTTPSConnectionPool(host='inference.discovery.wmnet', port=30443): Max retries exceeded with url: /v1/models/outlink-topic-model:predict (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))

@Isaac I can reproduce the error from stat1004, I think that you are going through the http(s) proxy for a .discovery.wmnet domain (internal one). Try with unset https_proxy, it should work afterwards!
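In Python, the equivalent is to clear the proxy variables or tell requests to ignore them. A sketch, assuming the proxy env vars set on the stat hosts are the cause:

```python
import os
import requests

# Option 1: drop the webproxy variables for this process, since
# inference.discovery.wmnet is an internal endpoint that must not go
# through the HTTP(S) proxy.
for var in ("https_proxy", "HTTPS_PROXY", "http_proxy", "HTTP_PROXY"):
    os.environ.pop(var, None)

# Option 2: leave the environment intact but have requests ignore it.
session = requests.Session()
session.trust_env = False  # ignore proxy-related environment variables
```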

I think that you are going through the http(s) proxy for a .discovery.wmnet domain (internal one). Try with unset https_proxy, it should work afterwards!

Confirmed - thanks @elukey! And I verified the model outputs with a few examples so passing the sniff test!

With the caveat that many of these questions are bigger than this particular model and so should probably move to another task/place (feel free to close this task out and direct me elsewhere), @AikoChou my next questions then are:

  • What's next for this model? Is there a process to go from discovery/experimental -> production-ready? Hal generously created a model card for this model so we have good public-facing documentation for the model itself. Other pieces I can help with?
  • Is there any specific testing of the endpoint that I could do that would be helpful beyond my quick verification?
  • Is predict the only endpoint available or could I e.g., just get the result of the transformation or pass pre-transformed data to the API? In particular, this last one would make the API more useful for batch prediction because I could take advantage of the cluster/hdfs for generating the features efficiently without calling Mediawiki APIs and then use inference.discovery for just the prediction.
  • How feasible is it to connect this model to eventgate? A prime motivation for productionizing this model is to have these topics available for Growth's tools. My understanding is that the ORES articletopic model is connected in via eventgate and the goal is to replace the ORES enwiki model with this language-agnostic model. That, however, does mean you're going from all namespace-0 edits on English Wikipedia to all namespace-0 edits on every Wikipedia, so it's a scaling up.
  • What's the process right now for updating the model binary? I know TrainingWing doesn't exist yet so I assume it's on me to manually re-train the model but what would be the process for switching in a new model?

Hi @Isaac, thanks for asking these questions. To answer you, I had a conversation with our team.

Is predict the only endpoint available or could I e.g., just get the result of the transformation or pass pre-transformed data to the API? In particular, this last one would make the API more useful for batch prediction because I could take advantage of the cluster/hdfs for generating the features efficiently without calling Mediawiki APIs and then use inference.discovery for just the prediction.

A viable solution for now: we'll extend the current API to accept pre-transformed data, skipping the Mediawiki API calls so the service does prediction only. Note, though, that this will be a short/medium-term solution.
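Purely as a client-side sketch of what that might look like: the "features" field name and values below are hypothetical, since the real schema for the extended API hasn't been defined yet. The idea is that the outlinks extracted offline (e.g. on HDFS) are sent directly, so the service skips its Mediawiki API preprocessing step:

```python
import json

# Hypothetical payload: outlink titles already extracted offline, so the
# service could skip fetching them from the Mediawiki API. Field name and
# values are illustrative only.
pre_transformed = {
    "features": ["Dragon", "Fantasy literature", "Novel series"],
}

# Current payload, for comparison: the service fetches outlinks itself.
current = {"lang": "en", "page_title": "Wings of Fire (novel series)"}

print(json.dumps(pre_transformed))
```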

Ideally, for a long-term solution, it would be better for that preprocessing step to be handled somewhere else, so the API only does one thing: taking the pre-processed data (features) for prediction. However, we don't have the infrastructure for online/offline feature stores like Feast yet. The long-term plan would be something like: use a Feast client to load the data from HDFS into a feature store, then deploy a model that fetches its features through the Feast client. This would probably be a best practice for serving a model, but we'll discuss it after the Lift Wing MVP.

How feasible is it to connect this model to eventgate? A prime motivation for productionizing this model is to have these topics available for Growth's tools. My understanding is that the ORES articletopic model is connected in via eventgate and the goal is to replace the ORES enwiki model with this language-agnostic model. That, however, does mean you're going from all namespace-0 edits on English Wikipedia to all namespace-0 edits on every Wikipedia, so it's a scaling up.

For the past few weeks, we have been working on the idea of having Lift Wing emit events directly to EventGate (T301878). All revscoring-based models are now able to accept a revision-create event, generate a revision-score event, and send it to EventGate. It's feasible to apply this to the outlink topic model as well (I will work on it in the coming weeks).

Also, we still need to find a reasonable way to listen to revision-create events and send them to the Lift Wing API. And indeed, that means we will support articles on all Wikipedias with this language-agnostic model, not only English Wikipedia.

What's the process right now for updating the model binary? I know TrainingWing doesn't exist yet so I assume it's on me to manually re-train the model but what would be the process for switching in a new model?

Training the model must indeed be done on your side for now. For updating the model binary, please contact me or Kevin, because we need to understand how big the change is, whether the new model needs more resources, etc. In most cases I think it will be a small change, but it would be nice to have this kind of conversation before deployment.

Other pieces I can help with?

You already did! Your questions here are really helpful and valuable for us to understand your thoughts and needs, and they make it a lot easier for us to decide what to work on next. I will be working on your third point (passing pre-transformed data) and fourth point (eventgate). I'll open new tasks for them, so we can discuss the details there! :)

A viable solution for now: we'll extend the current API to accept pre-transformed data, skipping the Mediawiki API calls so the service does prediction only. Note, though, that this will be a short/medium-term solution. ... The long-term plan would be something like: use a Feast client to load the data from HDFS into a feature store, then deploy a model that fetches its features through the Feast client. This would probably be a best practice for serving a model, but we'll discuss it after the Lift Wing MVP.

Current solution: I'll watch for the task and be happy to test it when it's available. Thanks!
Long-term: that all makes sense to me (and feature store would be verrry exciting!). Recognizing this is long-term as you say, feel free to loop me in when those discussions start if that'd be useful.

For the past few weeks, we have been working on the idea of having Lift Wing emit events directly to EventGate (T301878). All revscoring-based models are now able to accept a revision-create event, generate a revision-score event, and send it to EventGate. It's feasible to apply this to the outlink topic model as well (I will work on it in the coming weeks).

Yay - thanks!!

Also, we still need to find a reasonable way to listen to revision-create events and send them to the Lift Wing API. And indeed, that means we will support articles on all Wikipedias with this language-agnostic model, not only English Wikipedia.

Gotcha. This sounds like the major blocker to e.g., Growth using the new outlink predictions in place of the original enwiki ORES articletopic predictions. My understanding based on the EventGate task is that ORES has a working solution but it's hacky so ML Platform is working on building something more sustainable before connecting more models in this way. That seems sensible to me and thanks for taking on that work. I'll follow those conversations but let me know if I can help in any way to provide reasons why this would be useful etc.

Training the model must indeed be done on your side for now. For updating the model binary, please contact me or Kevin, because we need to understand how big the change is, whether the new model needs more resources, etc. In most cases I think it will be a small change, but it would be nice to have this kind of conversation before deployment.

Works for me -- I'll probably want to update as we approach having the full EventGate pipeline set up, but I'll give a heads up so we can discuss. I doubt I'd change vocab size or dimension size or anything else structural, so the model binary should be essentially identical in size, with no other code changes. Longer term we have some thoughts about reducing the number of classes, but I think even that wouldn't have much of an impact on the Lift Wing components.

You already did! Your questions here are really helpful and valuable for us to understand your thoughts and needs, and they make it a lot easier for us to decide what to work on next. I will be working on your third point (passing pre-transformed data) and fourth point (eventgate). I'll open new tasks for them, so we can discuss the details there! :)

Thanks!!

Two actionable items T315994 and T315998 added. I'm going to mark this one completed. :)