Page MenuHomePhabricator

Experiment with GPUs in the Machine Learning infrastructure
Open, Needs TriagePublic

Description

In this epic we should work on GPUs in the context of the ML infrastructure.

These are the main subjects to work on:

  • Update AMD drivers to the latest release, and verify with Research that they work as expected.
  • Import the AMD k8s device plugin in our repos/configs (will need a follow up with ServiceOps for the security bits).
  • Decide what GPU should be deployed on Lift Wing, and order some of them.

Event Timeline

elukey renamed this task from Add GPUs to the Machine Learning infrastructure to Experiment with GPUs in the Machine Learning infrastructure.May 16 2023, 2:32 PM

We have successfully deployed bloom-560m with and without GPU on LiftWing 🎉
Preliminary results show an out-of-the-box (without additional inference optimization) of 7x-10x reduced latencies. For example for a requested result length of 200 tokens the cpu version takes ~45seconds while th gpu one 4s.
A more detailed comparison will follow.

bloom-3b works as well! Tried to generate 100 tokens: with GPU ~7s, without 1:30 mins :D