[LLM] Lift Wing load testing
Open, Medium, Public

Description

We want to establish a load testing process for LLMs deployed on Lift Wing ml-staging. This should include:

  • Standard load testing to measure request latency percentiles, consistent with our other models
  • Visualization of how latency changes across varying input/output token sizes, similar to our ML-lab benchmark (T382343)
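
To make the two measurements above concrete, here is a minimal sketch of how latency samples could be grouped by input/output token size and summarized as percentiles. It is only an illustration: the record layout `(input_tokens, output_tokens, latency_s)`, the helper names `percentile` and `summarize`, and the sample numbers are all hypothetical and not taken from our existing load tests or from the T382343 benchmark.

```python
import math
from collections import defaultdict


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records, quantiles=(50, 95, 99)):
    """Group latencies by (input_tokens, output_tokens) and compute percentiles per group."""
    buckets = defaultdict(list)
    for input_tokens, output_tokens, latency_s in records:
        buckets[(input_tokens, output_tokens)].append(latency_s)
    return {
        key: {f"p{q}": percentile(values, q) for q in quantiles}
        for key, values in sorted(buckets.items())
    }


# Hypothetical samples: (input_tokens, output_tokens, latency in seconds).
records = [
    (128, 64, 0.8), (128, 64, 1.0), (128, 64, 1.2), (128, 64, 3.0),
    (1024, 256, 4.0), (1024, 256, 5.0),
]

summary = summarize(records)
for (input_tokens, output_tokens), stats in summary.items():
    print(input_tokens, output_tokens, stats)
```

The per-bucket dictionary could then be fed to whatever plotting tool we settle on (for example a heatmap of p95 over input and output token sizes). Nearest-rank is used here only because it is simple; other percentile definitions interpolate between samples and will give slightly different values on small sample sets.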

Before implementation, we should:

  • Investigate what other folks are doing when load testing LLMs
  • Explore how these approaches can be adapted to our infrastructure (Locust run through the statboxes, etc.)

Event Timeline

isarantopoulos moved this task from Unsorted to Ready To Go on the Machine-Learning-Team board.
isarantopoulos updated Other Assignee, added: kevinbazira.

Now that ml-labs is available, we can test performance there first. The ml-labs work overlaps heavily with T377496, and we should use the same benchmarks for consistency. More information on the benchmarks can be found in the repo: https://gitlab.wikimedia.org/repos/research/llm_evaluation/-/blob/mnz/llmperf/llmperf/results/huggingface.ipynb?ref_type=heads

achou renamed this task from Load test LLMs to [LLM] Lift Wing load testing. Dec 17 2024, 2:47 PM
achou updated the task description.
isarantopoulos updated Other Assignee, removed: kevinbazira.
isarantopoulos added a subscriber: achou.