
2024 Q4: Users can "pip install liftwing" and access 20% of models
Open, Needs TriagePublic

Description

This project involves building a Python package that acts as a model registry for the machine learning models deployed on Lift Wing.
A model registry is the source of truth for the deployed models and their versions, and it offers two main benefits:

  • Versioning and tracking of models: the registry makes model versions easier to access and track.
  • Collaboration and reproducibility: to download a model, the user only needs to interact with the registry.

Implementation Proposal
A Python package with different install options per model, since each model server has different package requirements. After installing the package, the user will be able to load a Lift Wing model and make predictions.
The internship is short, and we want the intern to get to know the Wikimedia community and our way of working, and to have the chance to study and dive into technical topics. The package will therefore start with 1-2 models in order to deliver a complete proof of concept. Also, to avoid being blocked by other systems, factors, or permissions, the work will be based on our public interfaces:
  • The Python package will have a repository on GitHub, with CI/CD set up using GitHub Actions to automatically upload the package to PyPI.
  • Models for the package will be fetched from the public analytics repository https://analytics.wikimedia.org/published/wmf-ml-models/

There will be two modes of operation for each model:

  • Offline: the user can download and load the model and start making predictions with it. This is particularly useful for experimentation, or when someone wants to make a large number of batch requests that would otherwise fail due to rate limiting.
  • Online: The user can make requests to the public APIs (Lift Wing API Gateway) using the package as a client.
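
As a rough illustration of the online mode only, the sketch below builds a prediction request with the standard library. The gateway URL pattern and the helper names build_request and predict_online are assumptions for illustration, not the package's actual API; check the endpoint against the current Lift Wing documentation.

```python
import json
import urllib.request

# Assumed public endpoint pattern for the Lift Wing API Gateway.
GATEWAY = "https://api.wikimedia.org/service/lw/inference/v1/models"


def build_request(model_name: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for one model prediction."""
    return urllib.request.Request(
        url=f"{GATEWAY}/{model_name}:predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_online(model_name: str, payload: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    request = build_request(model_name, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network, so it is safe to inspect.
req = build_request("revertrisk-language-agnostic", {"lang": "en", "rev_id": 123456})
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the client easy to test without network access. A real client would probably also set a descriptive User-Agent header.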

Notes/Considerations:

  • We would have to figure out a (nice) way to integrate this with the deployment charts repo in order to get the model version we need to deploy.
  • Model Python dependencies: each model has been developed separately and may require different Python libraries and versions. This means the package should offer different installation options that reflect the dependencies of each specific model.
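
One way to picture per-model install options is as "extras", where each model name maps to its own dependency list. The sketch below is hypothetical: the model names, the pinned packages, and the install_command helper are placeholders, not the package's real metadata.

```python
# Hypothetical extras table: model name -> extra dependencies it needs.
# A dict like this could be passed to setuptools as `extras_require`, or
# mirrored in `[project.optional-dependencies]` in pyproject.toml.
EXTRAS = {
    "model_a": ["placeholder-lib-one==1.0"],
    "model_b": ["placeholder-lib-two>=2.0", "placeholder-lib-three"],
}


def install_command(*models: str, package: str = "liftwing") -> str:
    """Return the pip command that installs the package with the given extras."""
    unknown = [name for name in models if name not in EXTRAS]
    if unknown:
        raise KeyError(f"no install option defined for: {', '.join(unknown)}")
    if not models:
        return f"pip install {package}"
    extras = ",".join(models)
    return f'pip install "{package}[{extras}]"'


print(install_command())                      # base install, online mode only
print(install_command("model_a", "model_b"))  # adds both models' dependencies
```

With this layout the base install stays small, and a user only pulls in the heavier libraries for the models they want to run offline.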

Event Timeline

calbon renamed this task from Lift Wing Python Package to Q4: Lift Wing Python Package. (Mar 5 2024, 3:19 PM)
calbon assigned this task to Mercelisvaughan.
calbon added a project: Goal.
calbon moved this task from Unsorted to Current Quarter Goals on the Machine-Learning-Team board.
calbon renamed this task from Q4: Lift Wing Python Package to 2024 Q4: Lift Wing Python Package. (Apr 16 2024, 2:57 PM)
calbon renamed this task from 2024 Q4: Lift Wing Python Package to 2024 Q4: Users can "pip install liftwing" and access 20% of models.

We have a first MVP for the package.

  • The package is now available on TestPyPI and can be installed like this:
pip install -i https://test.pypi.org/simple/ liftwing
  • revertrisk has been added and can be used from a Python script:
from liftwing.models import RevertRiskAPIModel

client = RevertRiskAPIModel()
result = client.request(payload={"lang": "en", "rev_id": "123456"})

print(result)

Result:

{
   "model_name":"revertrisk-language-agnostic",
   "model_version":"3",
   "wiki_db":"enwiki",
   "revision_id":"123456",
   "output":{
      "prediction":false,
      "probabilities":{
         "true":0.25512129068374634,
         "false":0.7448787093162537
      }
   }
}
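
A short sketch of reading a response like this one. It parses the JSON text and assumes the "true" probability is the revert score; the summarize helper is hypothetical and not part of the package.

```python
import json

# The response shown above, as JSON text.
RAW = """
{
   "model_name":"revertrisk-language-agnostic",
   "model_version":"3",
   "wiki_db":"enwiki",
   "revision_id":"123456",
   "output":{
      "prediction":false,
      "probabilities":{
         "true":0.25512129068374634,
         "false":0.7448787093162537
      }
   }
}
"""


def summarize(result: dict) -> str:
    """Turn one prediction into a one-line, human-readable summary."""
    output = result["output"]
    p_revert = output["probabilities"]["true"]  # assumed to be the revert score
    label = "likely to be reverted" if output["prediction"] else "unlikely to be reverted"
    return f"{result['wiki_db']} rev {result['revision_id']}: {label} (p={p_revert:.3f})"


result = json.loads(RAW)
print(summarize(result))
```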

People can now pip install the package and use models. Right now we only have a few models; the number should increase over time.

We have added request payload validation with pydantic and are currently adding more models to the package.
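
To illustrate the idea of payload validation, here is a minimal sketch that uses a standard-library dataclass as a stand-in for pydantic, so it runs without extra dependencies. The schema, the field rules, and the validate flag are assumptions for illustration; the package's actual pydantic models may look different.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class RevertRiskPayload:
    """Hypothetical payload schema: a language code and a revision ID."""

    lang: str
    rev_id: Union[int, str]

    def __post_init__(self) -> None:
        if not isinstance(self.lang, str) or not self.lang:
            raise ValueError("lang must be a non-empty string")
        try:
            rev_id = int(self.rev_id)  # accepts 123456 and "123456"
        except (TypeError, ValueError):
            raise ValueError("rev_id must be an integer") from None
        if rev_id <= 0:
            raise ValueError("rev_id must be positive")


def prepare_payload(payload: dict, validate: bool = True) -> dict:
    """Return the payload unchanged, checking it first unless validate is False."""
    if validate:
        # TypeError on missing or unexpected keys, ValueError on bad values.
        RevertRiskPayload(**payload)
    return payload


print(prepare_payload({"lang": "en", "rev_id": "123456"}))
```

Validating before sending gives the user a clear local error instead of a failed HTTP call, and the flag lets callers skip the check when they need to.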

We made request validation optional, and it is now really simple to add support for a new model to the package.
We have also added metadata (again optional) for each model. The user can get the list of available models by running python -m liftwing - relevant PR
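
A listing command like this could be backed by a small registry that maps model names to optional metadata. The sketch below is hypothetical: the registry contents and the format_model_list helper are illustrative, and the real package may structure this differently.

```python
from typing import Dict, Optional

# Hypothetical registry: model name -> optional metadata (None when absent).
MODEL_METADATA: Dict[str, Optional[dict]] = {
    "revertrisk-language-agnostic": {"version": "3"},
    "example-model": None,  # placeholder entry without metadata
}


def format_model_list(registry: Dict[str, Optional[dict]]) -> str:
    """Render one line per model, sorted by name, with metadata when present."""
    lines = []
    for name in sorted(registry):
        metadata = registry[name]
        if metadata:
            details = ", ".join(f"{key}={value}" for key, value in metadata.items())
            lines.append(f"{name} ({details})")
        else:
            lines.append(name)
    return "\n".join(lines)


if __name__ == "__main__":
    # In a package this would live in __main__.py so `python -m <package>` runs it.
    print(format_model_list(MODEL_METADATA))
```

With a registry like this, adding support for a new model amounts to adding one entry, which fits the goal of making new models simple to add.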

We are focusing on adding all the models available through the API Gateway. The package can also work with the internal endpoints, but first we will include the models that are publicly available.

All revscoring models have been added in the attached Pull Request and will be included in v0.1 of the package.

liftwing package version 0.1.0 has been released on PyPI - https://pypi.org/project/liftwing/

Just released version 0.1.0 of the liftwing python package to PyPI. You can try it out:

pip install liftwing

to view the available models:

python -m liftwing

and then use it in a script like this:

from liftwing import RevertRiskAPIModel
client = RevertRiskAPIModel()
print(client.request(payload={"rev_id": 123456, "lang": "en"}))

At the moment it contains the following models:

Remaining models to add:

  • revertrisk multilingual
  • language identification
  • readability
  • outlink articletopic