Page MenuHomePhabricator

Factor out feature retrieve functionality to a transformer
Closed, ResolvedPublic

Description

I had a really nice chat with @Theofpa and among the topics discussed, it came up that our way to fetch features inside the predictor could be factored out to a transformer container.

For example, in our predictors, we do:

  1. get the article/change to score via input feature
  2. fetch its content from the mediawiki api
  3. get the score from the model
  4. return the http response

Another approach could be:

  1. the client contacts the transformer first (that is part of the same pod as the predictor, but running in a different container)
  2. the transformer fetches from the mediawiki api
  3. the tranformer sends the input feature plus the extra data from the mw api to the predictor
  4. the score is generated and returned to the client

The transformer in this case is very simple, but it can potentially run anything, for example fetching from the online feature store (using Feast's python client etc..).

Let's try to investigate if this is feasible for us, and what it needs to be done in case to implement it.

Event Timeline

@kevinbazira & I discussed doing something like this to fetch article text for articlequality models in T285909 in order to get feature parity with ORES, however, I think we could do this as a general purpose feature extractor for all the other revscoring models. This could also have the added benefit of shrinking our model images even more (hopefully)

We can close this task and use T285909, lemme know what you prefer :)

@elukey - let's just merge the older task with this one, since this task is more fleshed out.

One thing we should consider is that a generic revscoring transformer should be able to retrieve diffs, fetch article text, etc. and/or have the ability to integrate with an online feature store.

Relevant for the discussion T294414

I am waiting a bit to see if we can share these functionalities (transformer + local proxy), or if every kserve pod will need to have both (I think the latter).

Change 736601 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] [WIP] adding revscoring transformer

https://gerrit.wikimedia.org/r/736601

I added a CR with my WIP revscoring transformer that I started for articlequality a while back. It needs a bit more work, but it seems doable for this class as it is only fetching article text.

For the other model classes (editquality, draftquality, topics), this may be a bit more involved. The transformer will need to install the full revscoring stack (and *spell dicts....) In order to extract the features, we will need to load the model binary ( from /mnt/models) into revscoring to get the list of all required features and then we can call the mw api and extract all required features. See example here: https://github.com/wikimedia/machinelearning-liftwing-inference-services/blob/main/revscoring/editquality/model-server/model.py#L33

In some sense, this could make the actual revscoring model server smaller if we have a larger transformer that handles all feature extraction (w/ the lang dictionaries, nltk stuff, etc.)

The above CR for articlequality is now ready for review. After developing this transformer, I'm starting to think that each model class will require it's own specific transformer, rather than using a shared "generic" image. The reason for this is due to all the additional assets for feature extraction (*spell dictionaries, etc.) a shared transformer image would eventually become quite large (1 GB+) as each model class needs different things. This would defeat the purpose IMO of splitting out the feature extraction and would make it harder to scale additional transformers when necessary. Maybe we can eventually use a shared class or underlying util methods if we see overlap, but not a generic image.

TLDR; let's do specific transformers for each model class since each one will need specific assets and a generic one will eventually grow too large in size.

Change 737803 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] articlequality: add transformer blubberfile

https://gerrit.wikimedia.org/r/737803

Change 736601 merged by Accraze:

[machinelearning/liftwing/inference-services@main] articlequality: adding transformer

https://gerrit.wikimedia.org/r/736601

Change 737803 merged by Accraze:

[machinelearning/liftwing/inference-services@main] articlequality: add transformer blubberfile

https://gerrit.wikimedia.org/r/737803

Change 738444 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] articlequality: add transformer pipeline config

https://gerrit.wikimedia.org/r/738444

Change 738447 had a related patch set uploaded (by Accraze; author: Accraze):

[integration/config@master] inference-services: add articlequality-transformer

https://gerrit.wikimedia.org/r/738447

Change 738447 merged by jenkins-bot:

[integration/config@master] inference-services: add articlequality-transformer

https://gerrit.wikimedia.org/r/738447

Change 738989 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] articlequality: transformer cleanup

https://gerrit.wikimedia.org/r/738989

Change 738444 merged by Accraze:

[machinelearning/liftwing/inference-services@main] articlequality: add transformer pipeline config

https://gerrit.wikimedia.org/r/738444

Change 738989 merged by Accraze:

[machinelearning/liftwing/inference-services@main] articlequality: transformer cleanup

https://gerrit.wikimedia.org/r/738989

We were able to run the first transformer on ml-serve today. See: T294141

Additionally, we also have another transformer ready to deploy for the outlink topic model in T287056

The next step is to create a transformer for each of the other model classes (editquality, draftquality and topic), which we will tackle after the holiday break.

@ACraze @kevinbazira I am currently wondering if the transformer strategy is a good compromise for the revscoring use case. We put a lot of work on it, I understand, but it was all really good experience for future projects (to design models in a better way etc..). We are currently having two very big images (transformer + predictor) for each combination model / wiki, that translate into two pods. Each of them need to pull the model for various activities (the transformer for feature extraction, and the predictor to actually score something), but it seems to me that we may be more efficient in simply doing everything in one model.py (less pods, less connections between pods, less resourced used, etc..).

Let me know your thoughts after working with transformers (I do see the values for the future versions of the revscoring models in lift wing of course), we should decide what to do :)

To recap from discussion at the team technical meeting today:

  • revscoring transformers for editquality/draftquality/topics are very heavy, especially since they need to mount the model binary in the pod to extract features.
    • articlequality does not need to load the model first to extract features, as it accepts raw article text.
  • for editquality, using transformers means 30+ additional pods
  • there is no clear benefit of managing this overhead complexity due to the way revscoring works

Going forward:

  • Revert back to the predictor-only architecture for editquality and then migrate models over to Lift Wing.

@kevinbazira @elukey - what are your thoughts about the other transformers? articlequality is not as heavy due to not needing the model, but the image size is still somewhat large.

I am good to keep testing articlequality with the transformer, it will give us more data and experience on this architecture. I'll let you and Kevin decide, but I am +1 if you both are!

calbon claimed this task.

Change 768784 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] draftquality: remove transformer code

https://gerrit.wikimedia.org/r/768784

Change 768785 had a related patch set uploaded (by Accraze; author: Accraze):

[integration/config@master] inference-services: draftquality cleanup

https://gerrit.wikimedia.org/r/768785

Change 768785 merged by jenkins-bot:

[integration/config@master] inference-services: draftquality cleanup

https://gerrit.wikimedia.org/r/768785

Change 768784 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] draftquality: remove transformer code

https://gerrit.wikimedia.org/r/768784

Change 769121 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] topic: revert to non-transformer architecture

https://gerrit.wikimedia.org/r/769121

Change 769121 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] topic: revert to non-transformer architecture

https://gerrit.wikimedia.org/r/769121