
Request creation of machine-learning VPS project
Closed, Resolved, Public

Description

Project Name: machine-learning

Wikitech Usernames of requestors: Elukey,Klausman,Accraze,Kevin Bazira,Chris Albon

Purpose: The ML team needs a place to build Docker images and experiment with models before they reach the staging cluster in production. Being able to run Minikube would also be a plus.

Brief description: The ML team needs to be able to test the KServe stack, running the following:

  1. Minikube with a basic version of the stack (KServe, Knative, Istio).
  2. Docker, to experiment with image building when testing new models, new code provided by upstream, etc.

We cannot do this in Analytics-land in production due to security concerns (above all, running Docker), so we were wondering whether Cloud VPS would be a good place. We wouldn't use the VPS instances to pull random Docker images from the Internet; we'd like to build our own and test them on Minikube when experimenting. The alternative would be to use the k8s staging cluster in production, which is not really viable for multiple reasons (security, flexibility, etc.).
Ideally we'd need a few beefy VMs (probably 16/32G of RAM and a few cores each). We could start with one or two if there are capacity concerns; in theory we wouldn't need more than that in the future, a couple are more than enough.

How soon you are hoping this can be fulfilled: As soon as possible :) We are not blocked by this task, but it would be nice to have a place to experiment in this quarter.

Thanks in advance!

Event Timeline

This project would be different from the ores* ones, which we hope to delete once we have decommissioned ORES in production (that will take some months). We need to keep the ORES project for now, as some testing steps before deploying to production depend on it.

Hi @elukey, can you determine how many beefy VMs you will need? (If you can put an exact number on the RAM/vCPUs needed, all the better; otherwise we might choose something that does not work for you xd.)

Thanks!

@dcaro 2 VMs, with 32G of RAM and 8 vCPUs each, would be a good start. Let me know if it is too much or not :)

This project is now created, with rights given to Elukey,Klausman,Accraze. Please feel free to add the others. Initial limits are 32 cores, 128G RAM.
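As a quick sanity check on the numbers in this thread, the requested VMs (2 VMs, 32G of RAM and 8 vCPUs each) can be compared with the initial project limits (32 cores, 128G RAM). The sketch below is only illustrative arithmetic: the `Flavor` class and `remaining_quota` function are hypothetical names, not part of any Cloud VPS or OpenStack tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    """Hypothetical description of one VM's size."""
    vcpus: int
    ram_gb: int


def remaining_quota(vms, core_limit, ram_limit_gb):
    """Return (cores_left, ram_gb_left) after allocating all VMs in `vms`."""
    used_cores = sum(vm.vcpus for vm in vms)
    used_ram = sum(vm.ram_gb for vm in vms)
    return core_limit - used_cores, ram_limit_gb - used_ram


# Figures taken from the thread: 2 VMs with 8 vCPUs and 32G of RAM each,
# against initial limits of 32 cores and 128G of RAM.
requested = [Flavor(vcpus=8, ram_gb=32), Flavor(vcpus=8, ram_gb=32)]
cores_left, ram_left = remaining_quota(requested, core_limit=32, ram_limit_gb=128)

print(f"cores left: {cores_left}, RAM left: {ram_left}G")
```

Under these figures the initial request uses half of the quota, leaving room for two more VMs of the same size.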

Mentioned in SAL (#wikimedia-cloud) [2021-11-09T16:46:15Z] <balloons> Created project with 32 cores, 128GB RAM T294964