
Investigate way of comparing load test results
Open, Medium, Public, 2 Estimated Story Points

Description

As an engineer I want to have a way of comparing load test results so that I know whether a new code change is making a model server slower or faster.
As part of this task I want a proof of concept (POC) of such a comparison that could then be applied to all model servers.

An example could be (as written in the parent task; see the sketch after the list):

  • Load the CSV into a pandas DataFrame
  • Run the new load tests and join the new results with the old ones
  • Calculate the differences in latencies (or, even better, run a t-test) and report the results.
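
A minimal sketch of such a comparison, assuming two Locust-style stats CSVs; the file and column names below are assumptions rather than the final layout, and a proper t-test would need the raw per-request latencies instead of the aggregated stats:

```
import pandas as pd

# Sketch only: compare a new stats CSV against a previously saved baseline.
# Column names are assumed to follow Locust's stats CSV output.
old = pd.read_csv("results_stats.csv")
new = pd.read_csv("new_results_stats.csv")

merged = old.merge(new, on="Name", suffixes=("_old", "_new"))

# Percentage change per endpoint for a few key metrics.
for metric in ["Average Response Time", "Median Response Time", "Requests/s"]:
    merged[f"{metric} % change"] = (
        (merged[f"{metric}_new"] - merged[f"{metric}_old"])
        / merged[f"{metric}_old"]
        * 100
    )

print(merged[["Name"] + [c for c in merged.columns if c.endswith("% change")]])
```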

Event Timeline

isarantopoulos triaged this task as Medium priority.
isarantopoulos set the point value for this task to 2.
isarantopoulos moved this task from Unsorted to In Progress on the Machine-Learning-Team board.

Change 989732 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] locust: first example

https://gerrit.wikimedia.org/r/989732

The first example has the following functionality:

  • Run either a single load test or the whole test suite for all model servers
  • A specific model or set of models can be selected by setting an environment variable at run time, e.g. MODEL=revertrisk locust (see the sketch after this list)
  • Results are compared against a previous run in the following way:
    • the first time we run the tests we pass the --csv results command-line argument and the results are saved in a file named results_stats.csv
    • in each subsequent run we omit the --csv argument and the new results are joined with the old ones.
    • we end up with a table that holds the % change of average and median latency as well as the % change in requests per second.
    • we print the model servers (along with their stats) for which the percentage change is above a predefined threshold (e.g. 15-20%) compared to the initial results. This most likely means that the model server has become slower, or that there is an issue with the inputs we provide in the load tests and we should update them (for example, we may be getting too many 400 responses)
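
A rough sketch of how selection via the MODEL environment variable could look inside a locustfile; the host, endpoint, and payload here are illustrative assumptions, not the actual Lift Wing configuration:

```
import os

from locust import HttpUser, task, between

# Models requested via the MODEL env var, e.g. MODEL=revertrisk locust
# (comma-separated for several); an empty value means "run everything".
SELECTED = set(filter(None, os.environ.get("MODEL", "").split(",")))


def is_selected(model_name: str) -> bool:
    return not SELECTED or model_name in SELECTED


class RevertRiskUser(HttpUser):
    # Hypothetical host and endpoint, for illustration only.
    host = "https://inference.example.org"
    wait_time = between(1, 2)

    @task
    def predict(self):
        # Simplification: skip the request if this model was not selected.
        if not is_selected("revertrisk"):
            return
        self.client.post(
            "/v1/models/revertrisk:predict",
            json={"rev_id": 12345, "lang": "en"},
        )
```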

After a discussion with @kevinbazira we'll add functionality to break down the CSV files per model server, so that we can update the results of a single model server in isolation from the rest.
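
One way this per-model breakdown could work, sketched under the assumption that the combined results_stats.csv has a "Name" column encoding the model server (e.g. a KServe-style predict path); the output file naming is hypothetical:

```
import pandas as pd

stats = pd.read_csv("results_stats.csv")


def model_from_name(name: str) -> str:
    # Assumption: the model can be extracted from the request name,
    # e.g. "/v1/models/revertrisk:predict" -> "revertrisk".
    return name.split("/models/")[-1].split(":")[0] if "/models/" in name else name


# Write one stats file per model server so each baseline can be
# refreshed independently of the others.
for model, group in stats.groupby(stats["Name"].map(model_from_name)):
    group.to_csv(f"results_{model}_stats.csv", index=False)
```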

Change 989732 merged by Ilias Sarantopoulos:

[machinelearning/liftwing/inference-services@main] locust: first example

https://gerrit.wikimedia.org/r/989732

Change 993078 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] WIP - locust: save separate results file per model

https://gerrit.wikimedia.org/r/993078

Change 993078 merged by Ilias Sarantopoulos:

[machinelearning/liftwing/inference-services@main] locust: save separate results file per model

https://gerrit.wikimedia.org/r/993078