
Investigate ModelMesh architecture
Open, Needs Triage, Public

Description

We want to investigate how ModelMesh could be implemented on our infrastructure to support multi-model serving.
In the current KServe deployment on LiftWing, each model has its own pod, so although scaling is possible, resource needs (CPU, memory) grow linearly with the number of model servers.
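To make the linear-scaling point concrete, here is a back-of-the-envelope comparison of per-model pods versus multi-model pods. All numbers (model size, per-pod server overhead) are illustrative assumptions, not measured LiftWing figures:

```python
# Rough comparison of memory footprint for one-model-per-pod vs.
# packing several models into one pod. Figures are hypothetical.

def pods_needed(num_models: int, models_per_pod: int) -> int:
    """Number of pods required to host num_models."""
    return -(-num_models // models_per_pod)  # ceiling division

def total_memory_gb(num_models: int, models_per_pod: int,
                    model_mem_gb: float, pod_overhead_gb: float) -> float:
    """Total memory: model weights plus a fixed per-pod server overhead."""
    pods = pods_needed(num_models, models_per_pod)
    return num_models * model_mem_gb + pods * pod_overhead_gb

# 30 models, 2 GB each, 1 GB of server overhead per pod (illustrative)
current = total_memory_gb(30, 1, 2.0, 1.0)   # one model per pod, as today
meshed = total_memory_gb(30, 10, 2.0, 1.0)   # ten models per pod
print(current, meshed)  # 90.0 63.0
```

The model weights themselves are a fixed cost either way; the savings come from amortizing the per-pod server overhead across many models.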
The key benefits of such an approach for us would be:

  • multiple models per pod => fewer pods => less memory needed
  • it can potentially make it easier to use GPUs for inference. The scenario we want to explore is placing the models that require a GPU in the same pod. Feasibility depends on the size of each model (whether it fits in GPU memory) and on how long loading a model of that size takes relative to the latency requirements we may have.
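For reference, KServe selects ModelMesh per InferenceService via a deployment-mode annotation. A minimal sketch is below; the model name, model format, and storage URI are placeholders, not our actual configuration:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model                       # placeholder name
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn                       # placeholder model format
      storageUri: s3://models/example       # placeholder storage location
```

With this mode, the ModelMesh controller decides which serving-runtime pod loads the model, rather than each InferenceService getting its own dedicated pod.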

The main purpose of this task is for the team to get acquainted with the architecture and decide whether this could be a future implementation, rather than implementing it right away, as the feature is still in alpha.

image.png (704×1 px, 112 KB)

Note: The documentation states that at this point only gRPC calls are supported for this version of ModelMesh, so we should also follow up on REST support.
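Since only gRPC is supported, clients would speak the KServe v2 inference protocol (ModelInferRequest/ModelInferResponse). A rough sketch of the request shape as a plain Python dict, assuming the v2 field names; in a real client this would be a protobuf message built from stubs generated from the v2 proto, and the model name and tensor values here are made up:

```python
# Shape of a KServe v2 inference request, expressed as a plain dict.
# Field names follow the v2 protocol; names and values are illustrative.

def build_infer_request(model_name: str, values: list) -> dict:
    """Assemble a v2-protocol-style request for one FP32 input tensor."""
    return {
        "model_name": model_name,
        "inputs": [
            {
                "name": "input-0",            # tensor name (placeholder)
                "datatype": "FP32",           # v2 protocol datatype string
                "shape": [1, len(values)],    # batch of one
                "contents": {"fp32_contents": values},
            }
        ],
    }

req = build_infer_request("example-model", [0.1, 0.2, 0.3])
print(req["inputs"][0]["shape"])  # [1, 3]
```

Evaluating whether this protocol fits our existing REST-based callers is part of the REST follow-up mentioned above.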