Event Timeline
Infra
- Setting up the puppet roles
- Can't commit puppet roles until the machines are there
- Reached out to vendor
Software side
- Spike with FlashAttention (possibly FlashAttention-2)
- Hoping to have results this week
Comment Actions
- Continuing work on vllm as an inference optimization engine. Recent updates in kserve allow the use of versions later than 0.4.2, which will solve the version discrepancies between kserve, torch, and vllm.
- GPU hosts are up and running in production!
Comment Actions
- Slow progress on vllm. @isarantopoulos will discuss the issue with @klausman and @achou during the week.