
Run load tests for the article-descriptions isvc
Open, Needs Triage, Public

Description

In T343123 we created the article-descriptions model-server, which is currently hosted in the experimental namespace on LiftWing.

We worked on optimizing response time for a single request in T353127.

Now we would like to run load tests and measure how many parallel requests the article-descriptions isvc can handle effectively.

Event Timeline

I ran load tests using most of the languages supported by the model, with the number of beams set to 3 based on T343123#9380779. All the inputs used for the request payloads can be found in P54507. Below are the load test results:

  1. requests < 50 in 30s:
kevinbazira@deploy2002:~$ wrk -t 2 -c 6 -d 30s -s article-descriptions.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict -H  "Host: article-descriptions.experimental.wikimedia.org" -H "Content-Type: application/json" --latency -- article-descriptions.input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
Running 30s test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict
  2 threads and 6 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   996.93ms  681.67ms   1.89s    56.25%
    Req/Sec     2.67      4.16    20.00     87.18%
  Latency Distribution
     50%    1.29s 
     75%    1.73s 
     90%    1.85s 
     99%    1.89s 
  43 requests in 30.05s, 11.16KB read
  Socket errors: connect 0, read 0, write 0, timeout 27
  Non-2xx or 3xx responses: 40
Requests/sec:      1.43
Transfer/sec:     380.22B
thread 1 made 28 requests and got 24 responses
thread 2 made 22 requests and got 19 responses
  2. requests > 50 in 30s:
kevinbazira@deploy2002:~$ wrk -t 8 -c 24 -d 30s -s article-descriptions.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict -H  "Host: article-descriptions.experimental.wikimedia.org" -H "Content-Type: application/json" --latency -- article-descriptions.input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
thread 3 created logfile wrk_3.log created
thread 4 created logfile wrk_4.log created
thread 5 created logfile wrk_5.log created
thread 6 created logfile wrk_6.log created
thread 7 created logfile wrk_7.log created
thread 8 created logfile wrk_8.log created
Running 30s test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict
  8 threads and 24 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    26.92ms    7.43ms  34.51ms   66.67%
    Req/Sec     1.68      2.83    10.00     84.62%
  Latency Distribution
     50%   28.40ms
     75%   32.67ms
     90%   34.51ms
     99%   34.51ms
  82 requests in 30.05s, 20.01KB read
  Socket errors: connect 0, read 0, write 0, timeout 76
  Non-2xx or 3xx responses: 77
Requests/sec:      2.73
Transfer/sec:     682.07B
thread 1 made 13 requests and got 9 responses
thread 2 made 13 requests and got 10 responses
thread 3 made 13 requests and got 10 responses
thread 4 made 13 requests and got 10 responses
thread 5 made 14 requests and got 11 responses
thread 6 made 13 requests and got 10 responses
thread 7 made 14 requests and got 11 responses
thread 8 made 14 requests and got 11 responses
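
For context, a single request to this endpoint looks roughly like the following Python sketch. The payload fields (lang, title, num_beams) are assumed from the model-server work in T343123, and the example title is illustrative only; this is not an exact payload from P54507.

import requests

# Illustrative single request to the article-descriptions isvc. The payload
# schema (lang/title/num_beams) is assumed from T343123; num_beams=3 matches
# the load tests above. The Host header routes to the experimental namespace.
url = "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict"
headers = {
    "Host": "article-descriptions.experimental.wikimedia.org",
    "Content-Type": "application/json",
}
payload = {"lang": "en", "title": "Clandonald", "num_beams": 3}  # example title
resp = requests.post(url, headers=headers, json=payload, timeout=60)
print(resp.status_code, resp.text)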

Based on the above reports, the isvc currently running on 1 pod in the experimental namespace peaked at 20 requests per second on a single thread when fewer than 50 requests were made within a 30-second window. However, when the total exceeded 50 requests within the same duration, the peak per-thread rate dropped to a maximum of around 10 requests per second.
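
As a quick sanity check, wrk's aggregate Requests/sec is simply completed requests divided by test duration, which is far below the per-thread peaks quoted above:

# Reproduce wrk's aggregate Requests/sec figures from the two runs above.
duration_s = 30.05
print(round(43 / duration_s, 2))  # run 1: ~1.43 req/s
print(round(82 / duration_s, 2))  # run 2: ~2.73 req/s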

We shall compare these numbers with the anticipated load that the Android team will share in response to T343123#9420718.

Change 985127 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] test: add load test script and input for article-descriptions

https://gerrit.wikimedia.org/r/985127

Change 985127 merged by Kevin Bazira:

[machinelearning/liftwing/inference-services@main] test: add load test script and input for article-descriptions

https://gerrit.wikimedia.org/r/985127

As discussed, we need to rerun the above tests: most of the requests failed, so the statistics are not really useful at the moment (in the first run it seems that 27 out of 43 requests timed out, and in the second 76 out of 82).
I suggest we set aside the wrk/Lua tests and write a test that we can run with Locust, as sketched below.
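
For reference, here is a minimal sketch of what such a Locust test might look like. The class name, wait times, and payload fields (lang, title, num_beams) are assumptions for illustration; the actual test is the one uploaded in the change below.

from locust import HttpUser, task, between

class ArticleDescriptionsUser(HttpUser):
    # Assumed pacing between requests; tune to match the target load profile.
    wait_time = between(1, 3)

    @task
    def predict(self):
        # Payload schema assumed from T343123; the Host header routes to the
        # experimental namespace on LiftWing. The title is an example only.
        self.client.post(
            "/v1/models/article-descriptions:predict",
            json={"lang": "en", "title": "Clandonald", "num_beams": 3},
            headers={"Host": "article-descriptions.experimental.wikimedia.org"},
        )

Such a file could be run with, e.g., locust -f article_descriptions.py --host https://inference-staging.svc.codfw.wmnet:30443 --headless -u 10 -r 2 -t 30s, which drives 10 concurrent users for 30 seconds.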

Change 995039 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] locust: add article_descriptions load tests

https://gerrit.wikimedia.org/r/995039

Change 995039 merged by Ilias Sarantopoulos:

[machinelearning/liftwing/inference-services@main] locust: add article_descriptions load tests

https://gerrit.wikimedia.org/r/995039