[Session] Self-hosting ML models on Cloud Services
Closed, Resolved (Public)

Assigned To

Authored By

	Isaac
	Apr 3 2023, 4:19 PM

Description

Title of session: Self-hosting ML models on Cloud Services
Session description: We will discuss the current state of (open) AI/ML models, discuss approaches to self-host available models via Cloud Services (PAWS, Toolforge, Cloud VPS; with likely a focus on Cloud VPS), and showcase outputs and limitations.
Username for contact: @Isaac
Session duration (25 or 50 min): 50 min
Session type (presentation, workshop, discussion, etc.): Start with presentation but we'll aim to have models hosted so that much of the session can be set aside for questions and so that interested participants can experiment with the models.
Language of session (English, Arabic, etc.): English
Prerequisites (some Python, etc.): Python
Any other details to share?:
- Started discussion on T333127 to see if it makes sense to merge these two sessions.
- Approved Cloud VPS project: T332218
- Etherpad Link
Interested? Add your username below:
- @MGerlach
- @Slst2020
- @DimitriosRingas
- @MnLsVt
- @LabDom
- @kostajh
- @TBurmeister
- @Htriedman
- @roti_WMDE

Related Objects

Mentioned In: T331275: [Session] Cool new things in PHP
T333127: [Session] LLMs, ChatGPT, machine learning tools, etc
Mentioned Here: T332218: Request creation of hackathon-2023-ml VPS project
T333127: [Session] LLMs, ChatGPT, machine learning tools, etc

Event Timeline

Isaac created this task. Apr 3 2023, 4:19 PM

Restricted Application added a subscriber: Aklapper. Apr 3 2023, 4:19 PM

srishakatux moved this task from Backlog to Proposed sessions on the Wikimedia-Hackathon-2023 board. Apr 4 2023, 5:41 AM

MGerlach updated the task description. Apr 4 2023, 8:42 AM

MGerlach subscribed.

Slst2020 updated the task description. Apr 5 2023, 8:51 AM

DimitriosRingas updated the task description. Apr 5 2023, 8:44 PM

DimitriosRingas subscribed.

MnLsVt updated the task description. Apr 9 2023, 6:35 PM

MnLsVt subscribed.

srishakatux moved this task from Proposed sessions to Accepted sessions on the Wikimedia-Hackathon-2023 board. Apr 17 2023, 7:06 PM

LabDom updated the task description. Apr 18 2023, 10:59 AM

LabDom subscribed.

kostajh mentioned this in T333127: [Session] LLMs, ChatGPT, machine learning tools, etc. Apr 26 2023, 7:03 PM

Michael subscribed. Apr 27 2023, 8:39 AM

The Etherpad for this session is at: https://etherpad.wikimedia.org/p/wmh2023-Self-hosting_ML_models_on_Cloud_Services

SpyridonKokotos updated the task description. May 4 2023, 8:51 PM

kostajh updated the task description. May 5 2023, 3:41 PM

kostajh subscribed.

Just collecting some of our thoughts / intentions here for those who are interested:

The goal will be to demo an ML-backed tool for doing natural-language search of Wikitech documentation. You can see a simple demo of the process start-to-finish on PAWS here, though the goal will be to host it as a webapp so folks can actually use it: https://public-paws.wmcloud.org/User:Isaac_(WMF)/hackathon-2023/wikitech-natural-language-search.ipynb
We'll share some of our learnings along the way about choosing models, adhering to open-source, challenges with working with some common libraries, etc.
Based on what the group of assembled folks is interested in, we can primarily do Q&A or some live coding / experimenting etc.
If folks have requests prior to the session, feel free to let us know though no promises that we'll be able to address them.

TBurmeister updated the task description. May 10 2023, 4:22 PM

TBurmeister subscribed.

Htriedman updated the task description. May 10 2023, 5:47 PM

roti_WMDE updated the task description. May 16 2023, 3:32 PM

roti_WMDE subscribed.

Lucas_Werkmeister_WMDE mentioned this in T331275: [Session] Cool new things in PHP. May 19 2023, 7:49 AM

Alexey_Skripnik subscribed. May 19 2023, 2:53 PM

Fuzheado subscribed. May 21 2023, 8:22 AM

Session Notes:

Self-hosting ML models on Cloud Services

Date & time: Sunday, May 21st at 11:30 am EEST / 8:30 am UTC

Relevant links

Phabricator task: https://phabricator.wikimedia.org/T333853

Presenters

Isaac Johnson - machine learning (Research Team)
Slavina Stefanova - cloud services

Participants (15)

Novem Linguae
Husky
Martin Gerlach
Fuzheado
Arturo
Virgina Poundstone

Notes

Presentation

There are multiple types of AI models, such as machine learning, large language models, etc. This session will focus on machine learning.
Motivating example
- Wikitech search example: "How do I connect to my instance?" The current search is not very good at finding the pages most relevant to a specific question. (See https://public-paws.wmcloud.org/User:Isaac_(WMF)/hackathon-2023/wikitech-natural-language-search.ipynb and https://search-wikitech.wmcloud.org/docs)
  - the solution: a machine learning API/tool
    - https://search-wikitech.wmcloud.org/docs
  - models used in this tool:
    - deepset
    - (one other)
  - the result is given as JSON with a title, score, and text
  - pipeline: Wikitech -> embedding model -> embedding search index -> most relevant passages
  - tech:
    - NLP framework: Haystack (Python)
    - ML: Transformers (PyTorch)
    - Database: FAISS
    - API: FastAPI
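
To make the pipeline above concrete, here is a minimal sketch of the "embed, index, return most relevant passages" flow. It is not the tool's code: the real demo used Haystack, a transformer embedding model and FAISS, whereas this sketch swaps in a toy word-count embedding and a plain Python list as the index so it runs with only the standard library. The passages, titles and function names are all hypothetical.

```python
import json
import math
import re
from collections import Counter

# Hypothetical passages standing in for Wikitech content (not real page text).
PASSAGES = [
    {"title": "Example: Connecting to an instance",
     "text": "Connect to your instance over SSH through the bastion host."},
    {"title": "Example: Web services",
     "text": "Start a web service for your tool on Toolforge."},
    {"title": "Example: Notebooks",
     "text": "Run Jupyter notebooks in the browser with PAWS."},
]


def embed(text):
    """Toy embedding: lowercase token counts (a stand-in for a transformer model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# "Embedding search index": here just a list of (vector, passage) pairs.
INDEX = [(embed(p["text"]), p) for p in PASSAGES]


def search(query, top_k=2):
    """Return the top_k passages as dicts with title, score and text."""
    query_vector = embed(query)
    results = [
        {"title": p["title"], "score": round(cosine(query_vector, v), 3), "text": p["text"]}
        for v, p in INDEX
    ]
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top_k]


if __name__ == "__main__":
    print(json.dumps(search("How do I connect to my instance?"), indent=2))
```

A word-count embedding only matches exact words; the point of a learned embedding model is that it can also match passages that answer the question in different words.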

Learnings
- challenges with using GPUs: some PyTorch dependencies use NVIDIA packages, which are proprietary. You can pass a different --index-url argument to pip to download builds that don't include the proprietary NVIDIA packages.
- look carefully at what models you are using
  - Not all models that declare themselves 'open' are actually open; usage restrictions might apply. E.g. the BLOOM model has this issue even though it was ethically trained, used a diverse training set, etc.
  - There is a growing number of openly licensed models, but some have use restrictions (for example the BLOOM models). There is debate around RAIL licenses; the Alpaca model is a tricky example.
- caches
  - large amounts of data are downloaded (e.g. sometimes 20 GB)
  - data may land in an unexpected folder, possibly with the wrong permissions
- threading: unexpected multi-threading
- choosing a model
  - lots of different options (beyond open-source) and considerations:
    - objective, such as summarization
    - how many languages are supported?
    - size (will it fit into RAM?). Larger (and better) models often require lots of memory.
    - performance
  - Hugging Face provides an interface for accessing different models, with model cards that give more detailed information about each model.
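
The pip, cache and threading points above can be sketched as a small setup helper. This is an illustration under assumptions, not what the presenters ran: the CPU wheel index URL is the one PyTorch publishes for CPU-only builds, HF_HOME is the Hugging Face cache location variable, OMP_NUM_THREADS caps OpenMP threads, and the function names and the cache path in the usage note are made up.

```python
import os
import sys

# Index of CPU-only PyTorch wheels; assumed here as the alternative --index-url.
CPU_INDEX_URL = "https://download.pytorch.org/whl/cpu"


def pip_install_command(package="torch", index_url=CPU_INDEX_URL):
    """Build (but do not run) a pip command that installs from the given index."""
    return [sys.executable, "-m", "pip", "install", package, "--index-url", index_url]


def configure_runtime(cache_dir, num_threads=1):
    """Pin the model cache folder and cap CPU threads via environment variables.

    Call this before importing torch or transformers, since they read
    these variables when they start up.
    """
    os.environ["HF_HOME"] = cache_dir  # where Hugging Face downloads are cached
    os.environ["OMP_NUM_THREADS"] = str(num_threads)  # limit OpenMP threading


if __name__ == "__main__":
    print(" ".join(pip_install_command()))
```

For example, configure_runtime("/srv/model-cache", num_threads=2) (a hypothetical path) keeps multi-gigabyte downloads in a folder you chose and have write access to, rather than wherever the library defaults to.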

Discussion

training a model
- keep an eye on your RAM. You will probably need more during training
- keep an eye on the number of seconds per example. Since you need to process examples millions of times during training, 0.5 seconds per example, say, might be way too slow
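
As a quick worked example of the seconds-per-example point, the arithmetic below shows why 0.5 seconds is slow. The figure of one million examples is an assumption picked for illustration (the notes only say "millions"), and the function name is made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def training_time_days(seconds_per_example, num_examples, epochs=1):
    """Total wall-clock training time in days, ignoring any parallelism."""
    return seconds_per_example * num_examples * epochs / SECONDS_PER_DAY


if __name__ == "__main__":
    # 0.5 s x 1,000,000 examples = 500,000 s, i.e. a bit under 6 days per pass.
    print(f"{training_time_days(0.5, 1_000_000):.1f} days")
```

Several passes over the data, or a few million examples instead of one, quickly turn that into weeks.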

is there guidance or policy page on what we can or cannot do with ML on cloud services?
- licensing has to be complied with, though ML is sometimes a grey area, e.g. datasets are often not open.
- only deploying the models on cloud services, not training.
- suggestion to create a wikipage with a draft of some guidelines

would it make sense to host all of the available models on cloud services? (e.g. to avoid hosting duplicates of the same model in multiple instances/use cases)
- (?)

large language models are too resource intensive to deploy at Wikimedia, maybe in the range of $50-$100,000. Search models, on the other hand, are much smaller
someone suggested that WMF + Hugging Face might be a good partnership
could you host models that require a GPU on cloud services?
- currently no. but there are ongoing discussions whether that is something we should provide to the community

do you need GPUs for hosting the model (when doing the inference)? I thought you only need them for training?
- that used to be true. but new models would take a very long time for inference when not using a GPU.

should I fine-tune a model or start training from scratch?
- would almost always suggest to start with fine-tuning. you can be picky about which model you use for fine-tuning. but starting from scratch takes a really big effort to reach an acceptable performance.

data licensing - use of the data that models are trained on, including the open internet (aka Common Crawl), is considered fair use in the United States
gpu vs cpu - you can have models use either, but GPUs are much faster/better. fundamentally the data is big arrays of floats.
in general, models are getting better/smarter not because of improvements in the technology/algorithms, but rather improvements in the hardware and size of resources given to it
language models are too complex to train just on Wikipedia content.
maybe cloud services is not the right place to host these types of models. instead we need a separate and dedicated effort for hosting these models?
- maybe. for example, LiftWing provides infrastructure to host ML models https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing but that needs to be coordinated with the ML Team at WMF.
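
Since a model is fundamentally big arrays of floats (see the GPU vs CPU note above), a rough lower bound on the memory it needs is the parameter count times the bytes per float. The sketch below covers the weights only; running the model needs additional memory on top of that. The parameter counts in the example are hypothetical round numbers, not figures from the session.

```python
# Bytes used by one parameter for two common floating-point formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weights_size_gib(num_params, dtype="float32"):
    """Approximate size of the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Hypothetical sizes: a small search model vs. a multi-billion-parameter LLM.
    for name, params in [("100M-parameter model", 100_000_000),
                         ("7B-parameter model", 7_000_000_000)]:
        print(f"{name}: {weights_size_gib(params):.2f} GiB in float32, "
              f"{weights_size_gib(params, 'float16'):.2f} GiB in float16")
```

Comparing the result with the RAM of the instance you plan to use is a quick first check on whether a model is a realistic candidate.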

How do we discover existing ML models in the community?
- Ideally, should be in ToolHub and tagged with AI or ML

Thanks @NicoleLBee for adding the notes!

Closing this task but the big takeaways for myself:

Putting some of these learnings / documentation on Wikitech so they're more accessible. Slide deck in the meantime: https://docs.google.com/presentation/d/1um31nHhXcH8Xssk8QbNnNkeg9R5nFj3Wub5xtDAAcIA/edit?usp=sharing
Continued conversation in Wiki-AI telegram channel
As a community, considering what models-as-a-service it might be useful to offer, though ultimately this is something that should probably happen on LiftWing while Cloud Services remains more of a place for perhaps prototyping ML services or incorporating ML via API calls to other services (when dealing with larger, more complex models).
