
Add enwiki-articlequality inference service to LiftWing
Closed, Resolved · Public

Description

Following the LiftWing inference service deployment documentation, add enwiki-articlequality inference service to LiftWing.

Event Timeline

Step One: Upload model binary to Thanos Swift - s3://wmf-ml-models/articlequality/enwiki/20211022183902/model.bin

I did this on ml-serve with a modified version of the model upload script: https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/719668/3/utils/model_upload.sh
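For reference, the upload boils down to copying the model binary into the wmf-ml-models bucket on Thanos Swift through its S3-compatible API. A minimal Python sketch of the equivalent operation, assuming boto3 and the thanos-swift endpoint (the endpoint URL and credentials here are illustrative placeholders, not taken from the task or the upload script):

import boto3

# Hypothetical values: the real endpoint and credentials come from the
# ml-serve host configuration, not from this task.
s3 = boto3.client(
    "s3",
    endpoint_url="https://thanos-swift.discovery.wmnet",
    aws_access_key_id="REDACTED",
    aws_secret_access_key="REDACTED",
)

# Upload the binary under the timestamped prefix used above:
# s3://wmf-ml-models/articlequality/enwiki/20211022183902/model.bin
s3.upload_file(
    Filename="model.bin",
    Bucket="wmf-ml-models",
    Key="articlequality/enwiki/20211022183902/model.bin",
)

The linked model_upload.sh script wraps the same kind of S3 put behind a small CLI so the bucket layout stays consistent across models.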

Change 733034 had a related patch set uploaded (by Accraze; author: Accraze):

[operations/deployment-charts@master] ml-services: add enwiki-articlequality

https://gerrit.wikimedia.org/r/733034

Change 734063 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] kubernetes: add revscoring-articlequality to ml-serve

https://gerrit.wikimedia.org/r/734063

Change 734066 had a related patch set uploaded (by Elukey; author: Elukey):

[labs/private@master] kubernetes: add tokens and secrets for revscoring-articlequality

https://gerrit.wikimedia.org/r/734066

Change 734067 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] Add the revscoring-articlequality ns to ml-serve clusters

https://gerrit.wikimedia.org/r/734067

Change 734063 merged by Elukey:

[operations/puppet@production] kubernetes: add revscoring-articlequality to ml-serve

https://gerrit.wikimedia.org/r/734063

Change 734067 merged by Elukey:

[operations/deployment-charts@master] Add the revscoring-articlequality ns to ml-serve clusters

https://gerrit.wikimedia.org/r/734067

Change 733034 merged by Elukey:

[operations/deployment-charts@master] ml-services: add enwiki-articlequality

https://gerrit.wikimedia.org/r/733034

Change 734432 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] articlequality: migrate kfserving dep to kserve

https://gerrit.wikimedia.org/r/734432

Change 734559 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] helmfile.d: test another STORAGE_URI for revscoring-articlequality

https://gerrit.wikimedia.org/r/734559

Change 734559 merged by Elukey:

[operations/deployment-charts@master] helmfile.d: test another STORAGE_URI for revscoring-articlequality

https://gerrit.wikimedia.org/r/734559

Change 734066 merged by Elukey:

[labs/private@master] kubernetes: add tokens and secrets for revscoring-articlequality

https://gerrit.wikimedia.org/r/734066

Change 734432 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] articlequality: migrate kfserving dep to kserve

https://gerrit.wikimedia.org/r/734432

To reach feature parity with ORES, we have added a pre-processing transformer that pulls down article text from the MediaWiki API. The changes required to the service YAML config include a new entry in the spec:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: enwiki-articlequality
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  transformer:
    containers:
      - image: docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-articlequality-transformer:2021-11-29-164759-production
        name: user-container
        env:
          - name: WIKI_URL
            value: "https://en.wikipedia.org"
  predictor:
    serviceAccountName: sa
    containers:
      - name: kfserving-container
        image: docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-articlequality:2021-08-05-222549-production
        env:
          # TODO: https://phabricator.wikimedia.org/T284091
          - name: STORAGE_URI
            value: "s3://wmf-ml-models/articlequality/enwiki/wp10/202105271538/"
          - name: INFERENCE_NAME
            value: "enwiki-articlequality"

Notice that there is a new section called transformer, which has a single container (user-container). The entire inference flow will look like request->transformer->predictor->response. We will need to ensure that the cluster-local-gateway is configured on ml-serve so that the transformer can communicate with the predictor inside our cluster; this may require updating our helm config in the deployment-charts repo.
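For context, the transformer container runs a small KServe model server whose preprocess step resolves a rev_id into article text via the MediaWiki API before handing it to the predictor. A rough sketch of that pattern, assuming the kserve Python SDK (class names, argument names and response handling here are illustrative, not the exact code in the inference-services repo):

import argparse
import requests
import kserve

class ArticleQualityTransformer(kserve.Model):
    def __init__(self, name: str, predictor_host: str, wiki_url: str):
        super().__init__(name)
        self.predictor_host = predictor_host
        self.wiki_url = wiki_url
        self.ready = True

    def preprocess(self, inputs: dict) -> dict:
        # Fetch the wikitext for the requested revision from the MediaWiki API.
        rev_id = inputs["rev_id"]
        resp = requests.get(
            f"{self.wiki_url}/w/api.php",
            params={
                "action": "query",
                "revids": rev_id,
                "prop": "revisions",
                "rvprop": "content",
                "rvslots": "main",
                "format": "json",
            },
        )
        page = next(iter(resp.json()["query"]["pages"].values()))
        text = page["revisions"][0]["slots"]["main"]["*"]
        # Pass the article text on to the predictor container.
        return {"rev_id": rev_id, "article_text": text}

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--predictor_host", required=True)
    parser.add_argument("--model_name", default="enwiki-articlequality")
    args = parser.parse_args()
    transformer = ArticleQualityTransformer(
        args.model_name, args.predictor_host, "https://en.wikipedia.org"
    )
    kserve.ModelServer().start([transformer])

The WIKI_URL env var in the YAML above is what feeds the wiki_url value, so the same transformer image can be reused for other wikis.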

Change 748147 had a related patch set uploaded (by Accraze; author: Accraze):

[operations/deployment-charts@master] ml-services: add articlequality transformer

https://gerrit.wikimedia.org/r/748147

Change 748147 merged by Elukey:

[operations/deployment-charts@master] ml-services: add articlequality transformer

https://gerrit.wikimedia.org/r/748147

Change 748181 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] articlequality: add wmf-certificates to image

https://gerrit.wikimedia.org/r/748181

Change 748181 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] articlequality: add wmf-certificates to image

https://gerrit.wikimedia.org/r/748181

Change 748185 had a related patch set uploaded (by Accraze; author: Accraze):

[operations/deployment-charts@master] articlequality: update transformer image

https://gerrit.wikimedia.org/r/748185

Change 748185 merged by Elukey:

[operations/deployment-charts@master] articlequality: update transformer image

https://gerrit.wikimedia.org/r/748185

ACraze claimed this task.

Confirming that we were able to run the full transformer->predictor flow for articlequality on ml-serve today. Marking this as RESOLVED.
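For anyone verifying the deployment later, a single request exercises the whole flow: the transformer resolves the rev_id to article text and the predictor returns the quality scores. A hedged example of such a request, assuming the standard KServe v1 predict path and a JSON payload carrying a rev_id (the gateway endpoint, Host header and revision ID below are assumptions for illustration, not values copied from this task):

import requests

# Hypothetical entry point: the actual URL and Host header depend on how the
# ml-serve ingress gateway exposes LiftWing services.
url = "https://inference.svc.eqiad.wmnet:30443/v1/models/enwiki-articlequality:predict"
headers = {"Host": "enwiki-articlequality.revscoring-articlequality.wikimedia.org"}

# The transformer only needs a revision ID; it fetches the article text itself.
payload = {"rev_id": 12345}  # arbitrary example revision ID

resp = requests.post(url, headers=headers, json=payload)
print(resp.json())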

Change 759456 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: add model STORAGE_URI to enwiki-articlequality transformer

https://gerrit.wikimedia.org/r/759456

Change 759456 merged by Elukey:

[operations/deployment-charts@master] ml-services: add model STORAGE_URI to enwiki-articlequality transformer

https://gerrit.wikimedia.org/r/759456