
Debug GPU deployments on ml-staging
Open, Needs Triage, Public

Description

As an engineer,

I want to establish a way to work with the GPU on the experimental ml-staging setup, to enable easy experimentation without going through our CI/CD pipelines. Now that we have permissions to edit/delete/attach to pods on ml-staging, this is possible.

Examples of such work include:

  • Attach to a pod, change some code and then measure whether it improves our response times
  • Change resources (memory/CPU) on the pod (see the example commands below)
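
To make this concrete, a rough sketch of the kind of commands this enables (pod, deployment and namespace names are placeholders; kserve's predictor container is usually named kserve-container):

# open a shell in the model server container of a running pod
kubectl exec -it <pod-name> -n <namespace> -c kserve-container -- /bin/bash

# bump the CPU/memory limits on the corresponding deployment (values are placeholders)
kubectl set resources deployment <deployment-name> -n <namespace> -c kserve-container --limits=cpu=2,memory=4Gi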

Event Timeline

For the moment I have come up with the following two options to implement the above functionality:

  • Enable hot reload for the inference service: since kserve is based on FastAPI, we could enable the reload option (the same thing we do when passing the --reload option to a FastAPI app; see the uvicorn sketch after this list). This would allow us to change the code and have the changes automatically reflected in the running process. Unfortunately there doesn't seem to be an easy way to enable this in kserve without changing the kserve code. To proceed with this we would need to submit a feature request and perhaps contribute the feature upstream. It shouldn't be much work, but it doesn't seem like a viable solution at the moment, because even if it were implemented today we would have to wait for the new release.
  • Kill the running process, change the code and rerun it: although this seems like a nice idea, it goes against the fundamental way in which containers run. The entrypoint (the kserve app) always runs as PID 1, meaning it is the main process from which any other processes fork, and when it exits (is killed, in our case) the lifecycle of the container is considered complete. As a result Kubernetes follows its restart policy, which uses the defined docker image, so the new code changes are of course wiped.
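
For reference, the first option essentially asks kserve to expose the reload behaviour that uvicorn already provides for plain FastAPI apps; a minimal illustration of that behaviour outside kserve (module/app names are hypothetical, and this is not an existing kserve flag):

# uvicorn watches the source files and restarts the worker whenever they change
uvicorn model:app --reload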

After doing some research and asking around, I found that we could use dumb-init, which seems to be exactly what we want. It runs as a minimal init process (PID 1) and our service runs as a child of it, which should let the container stay alive after we kill the running python process. I plan to explore its use for staging/development purposes, and afterwards we can look further into whether it can be used in production with a process supervisor (dumb-init is actually a proper supervisor that can be used in production). In any case, our focus for now is just debugging/development purposes.
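
With that in place, the hoped-for debugging workflow inside the container would look roughly like this (a sketch, assuming the image starts dumb-init as PID 1 with model.py as its child; file names are the ones from this task):

ps aux | grep model.py   # find the PID of the running model server
kill <pid>               # the idea is that this should not take the container down, since PID 1 is dumb-init
vi model.py              # edit the code in place
python model.py &        # restart the server with the new code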

Using dumb-init with blubber seems to be a challenge, as I can't find the equivalent of a Docker CMD instruction. We'd like to do the following:

ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["python", "model.py"]

where model.py is the main file for our model server.
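
One workaround worth trying (untested, and assuming blubber lets us express the whole command as a single entrypoint list) is to fold both halves into the entrypoint, so that the container effectively starts with:

/usr/bin/dumb-init -- python model.py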

I have had no luck using dumb-init at the moment.

Instead, I opened an issue on kserve to add the ability to hot-reload the model server when new changes are made:
https://github.com/kserve/kserve/issues/3420