
Investigate way of comparing load test results
Open, Medium, Public, 2 Estimated Story Points

Description

As an engineer I want to have a way of comparing load test results so that I know whether a new code change is making a model server slower or faster.
As part of this task I want a proof of concept (POC) of such a comparison that could then be applied to all model servers.

An example could be (as written in the parent task; see the sketch after the list):

  • Load the CSV into a pandas DataFrame
  • Run the new load tests and join the new results with the old ones
  • Calculate the differences in latencies (or, even better, run a t-test) and report the results.
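
A minimal sketch of such a comparison, assuming two Locust-style stats CSVs; the file and column names below are assumptions rather than the final layout, and a proper t-test would need the raw per-request latencies instead of the aggregated stats:

```
import pandas as pd

# Sketch only: compare a new stats CSV against a previously saved baseline.
# Column names are assumed to follow Locust's stats CSV output.
old = pd.read_csv("results_stats.csv")
new = pd.read_csv("new_results_stats.csv")

merged = old.merge(new, on="Name", suffixes=("_old", "_new"))

# Percentage change per endpoint for a few key metrics.
for metric in ["Average Response Time", "Median Response Time", "Requests/s"]:
    merged[f"{metric} % change"] = (
        (merged[f"{metric}_new"] - merged[f"{metric}_old"])
        / merged[f"{metric}_old"]
        * 100
    )

print(merged[["Name"] + [c for c in merged.columns if c.endswith("% change")]])
```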

Event Timeline

isarantopoulos triaged this task as Medium priority.
isarantopoulos set the point value for this task to 2.
isarantopoulos moved this task from Unsorted to In Progress on the Machine-Learning-Team board.

Change 989732 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] locust: first example

https://gerrit.wikimedia.org/r/989732

The first example has the following functionality:

  • Run either a single load test or the whole test suite for all model servers
  • A specific model or set of models can be selected by setting an environment variable at run time, e.g. MODEL=revertrisk locust (see the sketch after this list)
  • Results are compared against a previous run in the following way:
    • the first time we run the tests we pass the --csv results command-line argument and the results are saved in a file named results_stats.csv
    • in each subsequent run we omit the --csv argument and the new results are joined with the old ones.
    • we end up with a table that holds the % change of average and median latency as well as the % change in requests per second.
    • we print the model servers (along with their stats) for which the percentage change is above a predefined threshold (e.g. 15-20%) compared to the initial results. This most likely means that the model server has become slower, or that there is an issue with the inputs we provide in the load tests and we should update them (for example, we may be getting too many 400 responses)
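
A rough sketch of how selection via the MODEL environment variable could look inside a locustfile; the host, endpoint, and payload here are illustrative assumptions, not the actual Lift Wing configuration:

```
import os

from locust import HttpUser, task, between

# Models requested via the MODEL env var, e.g. MODEL=revertrisk locust
# (comma-separated for several); an empty value means "run everything".
SELECTED = set(filter(None, os.environ.get("MODEL", "").split(",")))


def is_selected(model_name: str) -> bool:
    return not SELECTED or model_name in SELECTED


class RevertRiskUser(HttpUser):
    # Hypothetical host and endpoint, for illustration only.
    host = "https://inference.example.org"
    wait_time = between(1, 2)

    @task
    def predict(self):
        # Simplification: skip the request if this model was not selected.
        if not is_selected("revertrisk"):
            return
        self.client.post(
            "/v1/models/revertrisk:predict",
            json={"rev_id": 12345, "lang": "en"},
        )
```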

After a discussion with @kevinbazira we'll add functionality to break down the CSV files per model server, so that we can update the results of a single model server in isolation from the rest.
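
One way this per-model breakdown could work, sketched under the assumption that the combined results_stats.csv has a "Name" column encoding the model server (e.g. a KServe-style predict path); the output file naming is hypothetical:

```
import pandas as pd

stats = pd.read_csv("results_stats.csv")


def model_from_name(name: str) -> str:
    # Assumption: the model can be extracted from the request name,
    # e.g. "/v1/models/revertrisk:predict" -> "revertrisk".
    return name.split("/models/")[-1].split(":")[0] if "/models/" in name else name


# Write one stats file per model server so each baseline can be
# refreshed independently of the others.
for model, group in stats.groupby(stats["Name"].map(model_from_name)):
    group.to_csv(f"results_{model}_stats.csv", index=False)
```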

Change 989732 merged by Ilias Sarantopoulos:

[machinelearning/liftwing/inference-services@main] locust: first example

https://gerrit.wikimedia.org/r/989732

Change 993078 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] WIP - locust: save separate results file per model

https://gerrit.wikimedia.org/r/993078

Change 993078 merged by Ilias Sarantopoulos:

[machinelearning/liftwing/inference-services@main] locust: save separate results file per model

https://gerrit.wikimedia.org/r/993078