
[Session] Self-hosting ML models on Cloud Services
Closed, ResolvedPublic


  • Title of session: Self-hosting ML models on Cloud Services
  • Session description: We will discuss the current state of (open) AI/ML models, discuss approaches to self-host available models via Cloud Services (PAWS, Toolforge, Cloud VPS; with likely a focus on Cloud VPS), and showcase outputs and limitations.
  • Username for contact: @Isaac
  • Session duration (25 or 50 min): 50 min
  • Session type (presentation, workshop, discussion, etc.): Start with presentation but we'll aim to have models hosted so that much of the session can be set aside for questions and so that interested participants can experiment with the models.
  • Language of session (English, Arabic, etc.): English
  • Prerequisites (some Python, etc.): Python
  • Any other details to share?:
    • Started discussion on T333127 to see if it makes sense to merge these two sessions.
    • Approved Cloud VPS project: T332218
    • Etherpad Link
  • Interested? Add your username below:

Event Timeline

MnLsVt added a subscriber: MnLsVt.

Just collecting some of our thoughts / intentions here for those who are interested:

  • Goal will be to demo an ML-backed tool for doing natural-language search of Wikitech documentation. You can see a simple demo here of the process start-to-finish on PAWS, though the goal will be to host it as a webapp so folks can actually use it:
  • We'll share some of our learnings along the way about choosing models, adhering to open-source, challenges with working with some common libraries, etc.
  • Based on what the group of assembled folks is interested in, we can primarily do Q&A or some live coding / experimenting etc.
  • If folks have requests prior to the session, feel free to let us know though no promises that we'll be able to address them.
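The natural-language search described above is typically done as embedding-based retrieval: embed the documentation pages and the query with the same model, then rank pages by cosine similarity. A minimal sketch of the ranking step, with tiny hand-made vectors standing in for real model embeddings (the page names and vectors here are hypothetical):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec, doc_vecs):
    """Rank documents by similarity to the query embedding."""
    scores = {doc: cosine_similarity(query_vec, vec)
              for doc, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy 3-dimensional "embeddings"; a real tool would get these
# from a sentence-embedding model run over the wiki pages.
docs = {
    "Help:Toolforge": [0.9, 0.1, 0.0],
    "Help:Cloud_VPS": [0.2, 0.8, 0.1],
}
print(search([0.85, 0.2, 0.0], docs))  # "Help:Toolforge" ranks first
```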

Session Notes:

Self-hosting ML models on Cloud Services

Date & time: Sunday, May 21st at 11:30 am EEST / 8:30 am UTC

Relevant links


  • Isaac Johnson - machine learning (Research Team)
  • Slavina Stefanova - cloud services

Participants (15)

  • Novem Linguae
  • Husky
  • Martin Gerlach
  • Fuzheado
  • Arturo
  • Virginia Poundstone
  • (your name here??)



  • Learnings
      • challenges with using GPUs: some PyTorch dependencies use NVIDIA packages which are proprietary. You can pass a different --index-url argument to pip to download builds that don't include the proprietary NVIDIA packages.
      • look carefully at what models you are using
      • Not all models that declare themselves 'open' are actually open; usage restrictions might apply. E.g. the BLOOM model has this issue even though it was ethically trained on a diverse training set, etc.
      • growing number of openly-licensed models, but some come with use restrictions (for example the BLOOM models). ongoing debate around RAIL licenses; the Alpaca model is a tricky example.
      • caches
        • large amounts of data get downloaded (e.g. sometimes 20 GB)
        • caches can end up in an unexpected folder, possibly with wrong permissions
        • threading: libraries may do unexpected multi-threading
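The cache and threading surprises above can often be tamed with environment variables set before the ML libraries are imported, since Hugging Face and PyTorch read them at import time; a sketch (the /srv paths are hypothetical, and "4" is just an example thread budget):

```python
import os

# Point library caches at a volume with enough space *before* importing
# the ML libraries; these variables are read at import time.
os.environ["HF_HOME"] = "/srv/ml-cache/huggingface"   # hypothetical path
os.environ["TORCH_HOME"] = "/srv/ml-cache/torch"      # hypothetical path

# Rein in unexpected multi-threading from numerical backends
# (OpenMP/MKL) on shared Cloud Services instances.
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["MKL_NUM_THREADS"] = "4"
```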
      • choosing a model
        • lots of different options (beyond open source) and considerations:
        • objective, such as summarization
        • how many languages are supported?
        • size (will it fit into RAM?): larger (and often better) models require lots of memory
        • performance
        • Hugging Face provides an interface for accessing different models; their model cards give more detailed information about the individual models.
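The "will it fit into RAM" question above has a quick back-of-the-envelope answer from the parameter count and precision alone (an approximation; real usage is higher once activations and framework overhead are included):

```python
def model_ram_gb(n_params, bytes_per_param=2):
    """Rough RAM needed just to hold the weights (fp16 = 2 bytes/param).

    Real usage is higher: activations, caches, framework overhead.
    """
    return n_params * bytes_per_param / 1024**3

# A hypothetical 7-billion-parameter model:
print(round(model_ram_gb(7e9), 1))     # fp16 weights
print(round(model_ram_gb(7e9, 4), 1))  # fp32 weights roughly double that
```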


  • training a model
    • keep an eye on your RAM; you will probably need more during training
    • keep an eye on the number of seconds per example: since you need to process millions of examples during training, 0.5 seconds per example might be way too slow
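The "seconds per example" warning above is easy to quantify with simple arithmetic; a sketch (the 0.5 s and one-million-example figures are just the illustrative numbers from the note):

```python
def training_time_hours(seconds_per_example, n_examples, epochs=1):
    """Total wall-clock training time, assuming a constant per-example cost."""
    return seconds_per_example * n_examples * epochs / 3600

# At 0.5 s/example, a single pass over one million examples already
# takes ~139 hours, which is why per-example timing matters so much.
print(round(training_time_hours(0.5, 1_000_000), 1))
```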
  • is there guidance or policy page on what we can or cannot do with ML on cloud services?
    • licensing - has to be complied with, though ML is sometimes a grey area, e.g. datasets are often not open.
    • only deploying the models on cloud services, not training.
    • suggestion to create a wikipage with a draft of some guidelines
  • would it make sense to host all of the available models on cloud services? (e.g. to avoid hosting duplicate copies of the same model across multiple instances/use cases)
    • (?)
  • large language models are too resource-intensive to deploy at Wikimedia, maybe in the range of $50,000 to $100,000. search models, on the other hand, are much smaller
  • someone suggested that WMF + Hugging Face might be a good partnership
  • could you host models that require a GPU on cloud services?
    • currently no. but there are ongoing discussions whether that is something we should provide to the community
  • do you need GPUs for hosting the model (when doing the inference)? I thought you only need them for training?
    • that used to be true. but new models would take a very long time for inference when not using a GPU.
  • should I fine-tune a model or start training from scratch?
    • would almost always suggest starting with fine-tuning. you can be picky about which model you use for fine-tuning, but starting from scratch takes a really big effort to reach acceptable performance.
  • data licensing - the data that models are trained on, including the open internet (aka common crawl), is considered fair use in the united states
  • gpu vs cpu - you can have models use either, but GPUs are much faster/better. fundamentally the data is big arrays of floats.
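To make "big arrays of floats" concrete: model inference is dominated by dense float operations like this toy matrix-vector product; real models just do this at a scale where GPUs' parallelism wins (the weights here are arbitrary illustration values):

```python
def matvec(matrix, vec):
    """One dense layer's core operation: a matrix-vector multiply over floats."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

# A tiny 2x2 "weight matrix"; real models have billions of such floats.
weights = [[0.5, -1.0],
           [2.0,  0.5]]
print(matvec(weights, [1.0, 2.0]))  # [-1.5, 3.0]
```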
  • in general, models are getting better/smarter not because of improvements in the technology/algorithms, but rather because of improvements in hardware and the scale of resources given to them
  • language models are too complex to train just on wikipedia content.
  • maybe cloud services is not the right place to host these types of models. instead we need a separate and dedicated effort for hosting these models?
  • How do we discover existing ML models in the community?
    • Ideally, should be in ToolHub and tagged with AI or ML
Isaac closed this task as Resolved. Edited May 23 2023, 7:29 PM

Thanks @NicoleLBee for adding the notes!

Closing this task but the big takeaways for myself:

  • Putting some of these learnings / documentation on Wikitech so they're more accessible. Slide deck in the meantime:
  • Continued conversation in Wiki-AI telegram channel
  • As a community, considering what models-as-a-service it might be useful to offer. Ultimately this should probably happen on LiftWing, while Cloud Services remains more of a place for prototyping ML services or incorporating ML via API calls to other services (when dealing with larger, more complex models).