
Establish a standard load testing procedure
Open, Medium, Public

Description

As an engineer,
I would like to set up a streamlined procedure for load testing existing model servers, so that we can easily test new changes before deployment and avoid regressions.
Manually running load tests for all servers after each change is almost impossible, and this is something we need for regular software upgrades in order to avoid regressions, e.g. increased latency after a kserve/numpy/pytorch upgrade.
The process could consist of a script that runs a standard set of tests and outputs a table and a plot/chart.
Later we can build on it incrementally if we want to automate.
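As a first sketch of what that script could look like, a thin Python wrapper could invoke Locust in headless mode for each model server and collect the CSV output for the report; the model names, hosts, and file paths below are purely illustrative:

```
# run_load_tests.py - sketch of a wrapper, not an actual implementation.
# Model names, hosts, and locustfile paths are illustrative placeholders.
import subprocess

MODELS = {
    "articlequality": "https://inference.example.org",
    "revertrisk": "https://inference.example.org",
}

for name, host in MODELS.items():
    # --headless, --csv, --run-time etc. are standard Locust CLI flags.
    subprocess.run(
        [
            "locust", "--headless",
            "-f", f"locust/{name}.py",   # assumed per-model locustfile
            "--host", host,
            "--users", "10",
            "--spawn-rate", "2",
            "--run-time", "2m",
            "--csv", f"results/{name}",  # writes results/<name>_stats.csv etc.
        ],
        check=True,
    )
```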

Event Timeline

My initial motivation came from wanting more interpretable results when running comparisons, rather than just pasting raw numbers like I did in these articlequality load test results

Locust seems like a nice tool for this kind of work (Leaving this here for future reference)
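For reference, a minimal locustfile could look something like this; the endpoint path assumes KServe's v1 predict convention, and the model name and payload are only placeholders:

```
# locustfile.py - minimal sketch; model name and payload are illustrative.
from locust import HttpUser, task, between

class ArticleQualityUser(HttpUser):
    wait_time = between(1, 2)  # pause 1-2s between requests per user

    @task
    def predict(self):
        self.client.post(
            "/v1/models/articlequality:predict",
            json={"rev_id": 123456},
        )
```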

Change 989732 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] locust: first example

https://gerrit.wikimedia.org/r/989732

Indeed, Locust can support our needs. We can define all load tests in one file and get one final report, like in the images below where I have just 2 models; but imagine that we could get a summary report for all LiftWing models, or just the ones we want to run.
Just leaving these as examples for when we start to tackle this.

[Attachment: Screenshot 2024-01-11 at 12.20.48 PM.png]

[Attachment: Screenshot 2024-01-11 at 12.20.58 PM.png]
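For the single-file setup, one locustfile could simply define one user class per model server (hosts, endpoints, and payloads below are again placeholders):

```
# locustfile.py - sketch of one file covering several models.
from locust import HttpUser, task, between

class ArticleQualityUser(HttpUser):
    host = "https://inference.example.org"  # placeholder
    wait_time = between(1, 2)

    @task
    def predict(self):
        self.client.post("/v1/models/articlequality:predict",
                         json={"rev_id": 123456})

class RevertRiskUser(HttpUser):
    host = "https://inference.example.org"  # placeholder
    wait_time = between(1, 2)

    @task
    def predict(self):
        self.client.post("/v1/models/revertrisk:predict",
                         json={"rev_id": 123456})
```

Locust runs all user classes in the file by default, and a subset can be selected by passing the class names on the command line.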

We could run the load tests once and export the results to a CSV, as described in the documentation.
Using the saved data, we can then create a procedure that runs a new load test and automatically checks for significant increases in latency.
An example could be:

  • Load the CSV into a pandas DataFrame
  • Run the new load tests and join the new results with the old ones
  • Calculate the differences in latencies (or, even better, run a t-test) and report the results, as in the sketch below
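A rough sketch of that comparison, assuming both runs were exported with Locust's --csv option and use the default <prefix>_stats.csv layout (column names may vary across Locust versions):

```
# compare_runs.py - sketch only; file names and column names are assumptions.
import pandas as pd

BASELINE = "results/articlequality_stats.csv"      # committed baseline run
CURRENT = "results_new/articlequality_stats.csv"   # latest run

old = pd.read_csv(BASELINE).set_index("Name")
new = pd.read_csv(CURRENT).set_index("Name")

# Join old and new results on the request name and compute latency deltas.
merged = old.join(new, lsuffix="_old", rsuffix="_new")
for col in ("Median Response Time", "95%"):
    merged[f"{col} delta"] = merged[f"{col}_new"] - merged[f"{col}_old"]

print(merged.filter(like="delta"))

# Flag requests whose p95 latency grew by more than 10% (arbitrary threshold).
regressions = merged[merged["95% delta"] > 0.10 * merged["95%_old"]]
if not regressions.empty:
    print("Possible latency regressions:", regressions.index.tolist())
```

Note that the aggregated stats CSV only carries percentiles per request name, so a proper t-test would need per-request samples (e.g. collected via a Locust request event listener); comparing percentile deltas is a simpler first step.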

We can start tackling this step by step: first create the load tests with Locust, run them, and commit the CSV with the results to our repo.