
Establish a standard load testing procedure
Open, Medium, Public

Description

As an engineer,
I would like to set up a streamlined procedure for load testing existing model servers, so that we can easily test new changes before deployment and avoid regressions.
Manually running load tests for all servers after each change is almost impossible, and this is something we need for regular software upgrades in order to avoid regressions, e.g. increased latency after a kserve/numpy/pytorch upgrade.
The process could consist of a script that runs a standard set of tests and outputs a table and a plot/chart.
Later we can build on it incrementally if we want to automate.
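As a first sketch of what that script could look like, a thin Python wrapper could invoke Locust in headless mode for each model server and collect the CSV output for the report; the model names, hosts, and file paths below are purely illustrative:

```
# run_load_tests.py - sketch of a wrapper, not an actual implementation.
# Model names, hosts, and locustfile paths are illustrative placeholders.
import subprocess

MODELS = {
    "articlequality": "https://inference.example.org",
    "revertrisk": "https://inference.example.org",
}

for name, host in MODELS.items():
    # --headless, --csv, --run-time etc. are standard Locust CLI flags.
    subprocess.run(
        [
            "locust", "--headless",
            "-f", f"locust/{name}.py",   # assumed per-model locustfile
            "--host", host,
            "--users", "10",
            "--spawn-rate", "2",
            "--run-time", "2m",
            "--csv", f"results/{name}",  # writes results/<name>_stats.csv etc.
        ],
        check=True,
    )
```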

Event Timeline

My initial motivation came from wanting more interpretable results when running comparisons, rather than just pasting raw numbers like I did in these articlequality load test results

Locust seems like a nice tool for this kind of work (Leaving this here for future reference)
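For reference, a minimal locustfile could look something like this; the endpoint path assumes KServe's v1 predict convention, and the model name and payload are only placeholders:

```
# locustfile.py - minimal sketch; model name and payload are illustrative.
from locust import HttpUser, task, between

class ArticleQualityUser(HttpUser):
    wait_time = between(1, 2)  # pause 1-2s between requests per user

    @task
    def predict(self):
        self.client.post(
            "/v1/models/articlequality:predict",
            json={"rev_id": 123456},
        )
```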

Change 989732 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] locust: first example

https://gerrit.wikimedia.org/r/989732

Indeed, Locust can support our needs. We can define all load tests in one file and get one final report, like in the images below where I have just 2 models; but imagine that we could get a summary report for all LiftWing models, or just the ones we want to run.
Just leaving these as examples for when we start to tackle this.

[Attachment: Screenshot 2024-01-11 at 12.20.48 PM.png]

[Attachment: Screenshot 2024-01-11 at 12.20.58 PM.png]
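For the single-file setup, one locustfile could simply define one user class per model server (hosts, endpoints, and payloads below are again placeholders):

```
# locustfile.py - sketch of one file covering several models.
from locust import HttpUser, task, between

class ArticleQualityUser(HttpUser):
    host = "https://inference.example.org"  # placeholder
    wait_time = between(1, 2)

    @task
    def predict(self):
        self.client.post("/v1/models/articlequality:predict",
                         json={"rev_id": 123456})

class RevertRiskUser(HttpUser):
    host = "https://inference.example.org"  # placeholder
    wait_time = between(1, 2)

    @task
    def predict(self):
        self.client.post("/v1/models/revertrisk:predict",
                         json={"rev_id": 123456})
```

Locust runs all user classes in the file by default, and a subset can be selected by passing the class names on the command line.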

We could run the load tests once and export the results to a CSV, as described in the documentation.
Using the saved data, we can then create a procedure that runs a new load test and automatically checks for significant increases in latency.
An example could be:

  • Load the CSV into a pandas DataFrame
  • Run the new load tests and join the new results with the old ones
  • Calculate the differences in latencies (or, even better, run a t-test) and report the results, as in the sketch below
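A rough sketch of that comparison, assuming both runs were exported with Locust's --csv option and use the default <prefix>_stats.csv layout (column names may vary across Locust versions):

```
# compare_runs.py - sketch only; file names and column names are assumptions.
import pandas as pd

BASELINE = "results/articlequality_stats.csv"      # committed baseline run
CURRENT = "results_new/articlequality_stats.csv"   # latest run

old = pd.read_csv(BASELINE).set_index("Name")
new = pd.read_csv(CURRENT).set_index("Name")

# Join old and new results on the request name and compute latency deltas.
merged = old.join(new, lsuffix="_old", rsuffix="_new")
for col in ("Median Response Time", "95%"):
    merged[f"{col} delta"] = merged[f"{col}_new"] - merged[f"{col}_old"]

print(merged.filter(like="delta"))

# Flag requests whose p95 latency grew by more than 10% (arbitrary threshold).
regressions = merged[merged["95% delta"] > 0.10 * merged["95%_old"]]
if not regressions.empty:
    print("Possible latency regressions:", regressions.index.tolist())
```

Note that the aggregated stats CSV only carries percentiles per request name, so a proper t-test would need per-request samples (e.g. collected via a Locust request event listener); comparing percentile deltas is a simpler first step.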

We can start tackling this step by step: first create the load tests with Locust, run them, and commit the CSV with the results to our repo.