Production images for ORES/revscoring models
Closed, Resolved, Public

Description

We need to figure out how to build a custom image for our ORES models and deploy them to our k8s cluster.

Open Questions:

TODO:

  • We need to specify that the image built in the editquality pipeline is the editquality image (inference-services repo).
  • We also need to trigger this pipeline only when files in the revscoring/editquality directory are updated (integration/config repo).

Event Timeline

Mentioned in T272874 by @Ladsgroup

There are ways to reduce its size:

  • Add && rm -rf /var/lib/apt/lists/* after installing python3-pip; this is a common pattern for reducing the size of a Docker image.
  • Or better, use multi-stage Docker builds.
  • You don't need to install everything if the model doesn't need it, such as the aspell or hunspell libraries for languages other than English.

We should be able to create multi-stage Dockerfiles using Blubber: https://wikitech.wikimedia.org/wiki/Blubber

I did some experimentation with this in T210268.
There is an abandoned WIP PR on the ORES repo that contained an attempt at writing a Blubberfile: https://github.com/wikimedia/ores/pull/349

@kevinbazira: good news! your revscoring container is now running on the Kubeflow sandbox! The enwiki-goodfaith model is running as a custom inferenceservice via KFServing using the Dockerfile you created.
I am able to hit the service and retrieve a prediction.

Great work on this! Here is a screenshot of me making a prediction in the sandbox cluster:

kfserving-revscoring.png (379×1 px, 77 KB)
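
For readers who want to reproduce a call like the one in the screenshot, here is a minimal sketch of building such a request. It assumes the service speaks the KFServing v1 data-plane protocol (POST to /v1/models/<name>:predict) and is routed by Host header through the ingress gateway. The base URL, the host name, and the {"rev_id": ...} payload shape are all hypothetical placeholders, not values taken from the sandbox.

```python
import json
import urllib.request


def build_predict_request(base_url, model_name, service_host, rev_id):
    """Build (but do not send) a KFServing v1 'predict' request."""
    url = f"{base_url}/v1/models/{model_name}:predict"
    # Assumed payload shape: the revision ID to score.
    body = json.dumps({"rev_id": rev_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # The Host header is how the ingress picks the inference service
        # (assumption; depends on the cluster setup).
        headers={"Host": service_host, "Content-Type": "application/json"},
    )


req = build_predict_request(
    base_url="http://localhost:8080",             # hypothetical ingress address
    model_name="enwiki-goodfaith",
    service_host="enwiki-goodfaith.example.org",  # hypothetical host name
    rev_id=12345,                                 # hypothetical revision ID
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing it at a real cluster.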

A couple of caveats about my approach:

  1. The Kubeflow sandbox is running Kubeflow 1.1, which uses the v1alpha2 API
  2. The model binary is currently packaged inside of the container. This is not ideal, although it works for now while we work on our storage implementation.
  3. The image size is ~1.8GB, which is large, but I'm not too concerned yet. Other ML serving images are a similar size (for example, Pytorch/torchserve is 2.11GB).
  4. The custom KFServer is bare-bones and does not do anything like logging/monitoring/etc. yet. We will need to spend some time figuring out how to add this to our custom KFServer model.
  5. I had to make some slight tweaks to your Dockerfile (mostly caching layers for a faster build time) and deployed it as a test image on dockerhub for now. We will need to start pushing images to the WMF Docker Registry once the ml-serve prod cluster is live.
  6. Revscoring models can be very memory-hungry. I only provisioned 2GB RAM for this model, but it seems to work fine for now. We will want to watch memory usage and also test out the auto-scaling capabilities a bit more.

Going forward, I think we can use your image as a base for all revscoring models and simply pass in a Storage URI for each of the model binaries once we get them into storage.

Also here are some pastes with my code changes:

  • Custom enwiki-goodfaith KFServing code: P15863
  • Modified Dockerfile: P15862

I was talking with @elukey today and he mentioned that we should begin using base images from the WMF docker registry where we can.
This means, the production version of our generic revscoring image should use the WMF Bullseye image instead of Ubuntu (if possible). I will do some testing today using the Bullseye image and will report back.

Additionally, we may need to create helm charts for our inference services to match the deployment workflow used by SRE. This would actually be a good thing because we could easily reuse existing infrastructure. We are still unsure how KFServing fits into this workflow, so we will need to do a bit more investigation once we get the dependency images (istio, etc.) finished. We may need to discard the inference-services monorepo and opt for separate repos for each group of services, depending on how we envision the mlops deployment process.

Little nit: Bullseye is still not officially released, so our images are based on what upstream offers right now. It seems fine to keep using those, but there is also Buster available in case we find some issue.

Change 698016 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] swap revscoring base image to wmf buster

https://gerrit.wikimedia.org/r/698016

I ran into some issues upgrading the revscoring inference service base image to bullseye (mostly since scipy & numpy have some issues with python3.9 still), so I went with the wmf buster image instead. Things seem to work well so far.

Here are the results from a short perf test using the wmf buster image:

Requests      [total, rate, throughput]         300, 5.02, 4.81
Duration      [total, attack, wait]             1m2s, 59.8s, 2.593s
Latencies     [min, mean, 50, 90, 95, 99, max]  200.287ms, 1.189s, 881.1ms, 2.371s, 2.461s, 2.579s, 2.906s
Bytes In      [total, mean]                     33600, 112.00
Bytes Out     [total, mean]                     6900, 23.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:300
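
As a sanity check on how the summary figures relate to each other, here is a small sketch that recomputes them from the raw numbers in the report. It assumes the report follows the usual load-tester convention where rate is requests divided by the attack duration, throughput is successful requests divided by attack plus wait time, "Bytes In" is response bytes, and "Bytes Out" is request bytes. Those definitions are an assumption about the tool, which is not named here.

```python
# Figures copied from the report above.
total_requests = 300
successes = 300            # Status Codes 200:300
attack_seconds = 59.8
wait_seconds = 2.593
bytes_in_total = 33600     # assumed: response bytes
bytes_out_total = 6900     # assumed: request bytes

# Assumed definitions (see the note above the code).
rate = total_requests / attack_seconds
throughput = successes / (attack_seconds + wait_seconds)
mean_bytes_in = bytes_in_total / total_requests
mean_bytes_out = bytes_out_total / total_requests

print(f"rate={rate:.2f} req/s, throughput={throughput:.2f} req/s")
# prints: rate=5.02 req/s, throughput=4.81 req/s
print(f"mean bytes in={mean_bytes_in:.2f}, mean bytes out={mean_bytes_out:.2f}")
# prints: mean bytes in=112.00, mean bytes out=23.00
```

The recomputed values match the reported 5.02, 4.81, 112.00 and 23.00, so the small gap between rate and throughput comes only from the 2.593s wait for the last responses, not from failed requests.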

One thing to note is that this change moves us to Python3.7 instead of Python3.8, which is ok for now imo.

Change 698016 merged by Accraze:

[machinelearning/liftwing/inference-services@main] swap revscoring base image to wmf buster

https://gerrit.wikimedia.org/r/698016

Change 702476 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] [WIP] Blubberfile for revscoring model-server

https://gerrit.wikimedia.org/r/702476

Change 702476 merged by Accraze:

[machinelearning/liftwing/inference-services@main] Blubberfile for revscoring editquality model-server

https://gerrit.wikimedia.org/r/702476

Change 705513 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] pin editquality model server base image tag

https://gerrit.wikimedia.org/r/705513

Change 705514 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] limit editquality to only include model-server

https://gerrit.wikimedia.org/r/705514

Change 705516 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] Add editquality pipeline config

https://gerrit.wikimedia.org/r/705516

Change 705513 merged by Accraze:

[machinelearning/liftwing/inference-services@main] pin editquality model server base image tag

https://gerrit.wikimedia.org/r/705513

Change 705514 merged by Accraze:

[machinelearning/liftwing/inference-services@main] limit editquality to only include model-server

https://gerrit.wikimedia.org/r/705514

Change 705516 merged by Accraze:

[machinelearning/liftwing/inference-services@main] Add editquality pipeline config

https://gerrit.wikimedia.org/r/705516

It seems I was thinking about this backwards re: monorepo. I believe we should instead be using PipelineLib to define pipelines for each subdirectory project. Instead of having a .pipeline/ directory in each of our subdirectory projects, we should just have a .pipeline/config.yaml in the root directory that should point to the required blubberfiles for each project-pipeline. Going to do some refactoring on the config today and see if it works.

Change 706050 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] refactor editquality pipeline config

https://gerrit.wikimedia.org/r/706050

Change 706050 merged by Accraze:

[machinelearning/liftwing/inference-services@main] refactor editquality pipeline config

https://gerrit.wikimedia.org/r/706050

Change 708175 had a related patch set uploaded (by Accraze; author: Accraze):

[integration/config@master] inference-services: added editquality pipeline

https://gerrit.wikimedia.org/r/708175

We are one step away from being able to publish our editquality image to the WMF Docker Registry via the deployment pipeline! I pushed up a patch to the integration/config repo that should enable the editquality pipeline to test our model server code and then publish the production image.

After that, we should be able to create a helmfile to deploy the enwiki-goodfaith inference service using the editquality image and the service config template.

Change 708352 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] fix incorrect pipeline name

https://gerrit.wikimedia.org/r/708352

Change 708175 merged by jenkins-bot:

[integration/config@master] inference-services: added editquality pipeline

https://gerrit.wikimedia.org/r/708175

Change 708352 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] Fix pipeline configuration

https://gerrit.wikimedia.org/r/708352

We have successfully run the editquality pipeline and have published our editquality image: https://docker-registry.wikimedia.org//wikimedia/machinelearning-liftwing-inference-services/tags/

Pipeline run: https://integration.wikimedia.org/ci/job/inference-services-pipeline-editquality/

A couple of things I am noticing now:

  • We need to specify that the image built in the editquality pipeline is the editquality image (inference-services repo).
  • We also need to trigger this pipeline only when files in the revscoring/editquality directory are updated (integration/config repo).

I will update the task description to reflect these last two to-dos
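
To illustrate the second to-do, here is a rough sketch of path-based triggering: given the files touched by a change, decide which pipelines should run. This is only a model of the idea, not the actual CI configuration; the mapping and the function name are hypothetical, and only the revscoring/editquality path comes from this task.

```python
import re

# Hypothetical mapping of pipeline name -> pattern over changed file paths.
PIPELINE_FILE_FILTERS = {
    "editquality": re.compile(r"^revscoring/editquality/"),
    "articlequality": re.compile(r"^revscoring/articlequality/"),
}


def pipelines_to_trigger(changed_files):
    """Return the sorted names of pipelines whose directory was touched."""
    return sorted(
        name
        for name, pattern in PIPELINE_FILE_FILTERS.items()
        if any(pattern.match(path) for path in changed_files)
    )


print(pipelines_to_trigger(["revscoring/editquality/model-server/model.py"]))
# prints: ['editquality']
print(pipelines_to_trigger(["README.md"]))
# prints: []
```

With a filter like this, a change that only touches documentation or another model server's directory would not rebuild and republish the editquality image.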

Change 708589 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] update image name for editquality pipeline

https://gerrit.wikimedia.org/r/708589

Change 708596 had a related patch set uploaded (by Accraze; author: Accraze):

[integration/config@master] inference-services: tune editquality pipeline

https://gerrit.wikimedia.org/r/708596

Change 708589 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] update image name for editquality pipeline

https://gerrit.wikimedia.org/r/708589

Change 708596 merged by jenkins-bot:

[integration/config@master] inference-services: tune editquality pipeline

https://gerrit.wikimedia.org/r/708596

It seems like the editquality pipeline is working well. I went ahead and made sub-tasks to track building & configuring similar pipelines for the other revscoring model servers (articlequality, draftquality and topic). We can just follow the same config I used for editquality in this task.

Change 710378 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] use wmf production images for editquality services

https://gerrit.wikimedia.org/r/710378

Change 710379 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] use wmf prod image for articlequality services

https://gerrit.wikimedia.org/r/710379

Change 710378 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] use wmf production images for editquality services

https://gerrit.wikimedia.org/r/710378

Change 710379 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] use wmf prod image for articlequality services

https://gerrit.wikimedia.org/r/710379

Quick update on our progress - We now have model server images published in the WMF Docker Registry for all 4 model classes (editquality, articlequality, draftquality and topic).

One caveat though is that these are 'jumbo' images due to all the extra dependencies required to perform inference using revscoring. We have started a task to reduce the image size using multi-stage builds (see: T290266: Move ML docker images to multi-stage build). @elukey has already trimmed more than 400MB from the editquality image and we should be able to do similar for the remaining model servers.

ACraze closed this task as Resolved. (Edited Jan 10 2022, 10:09 PM)

Closing out this task. We have production images for predictors across all revscoring classes (editquality, articlequality, draftquality, topic) in the WMF Docker Registry.

We are also now creating transformer images to handle feature extraction for those different predictors (and eventually integrate with feature stores). We have completed articlequality-transformer and plan to move on to the other classes next, see T294419 for more details.

Marking this task as RESOLVED