Gather performance info for the Related Articles type in Recommendation API
Closed, Declined · Public

Description

As the Recommendations service exists now, what are the following values?

  • Throughput
  • Request volume
  • Resource usage per request
    • Memory
    • CPU

Results summary

For details, see the comments.

  • Throughput: 1000 requests / 1882.4 seconds
  • Request volume: 9955 requests / month
  • Resource usage:
    • Memory: 20 KB per request, 2.5 GB for the embedding
    • CPU: ~0.3 seconds per item query, excluding locking and I/O
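The headline throughput figure is easier to compare against the ab results later in this task when expressed per second and per request. A quick sanity check of the arithmetic, using only the numbers from the summary above:

```python
# Back-of-envelope conversion of the summary figures (1000 requests in
# 1882.4 seconds) into requests/second and seconds/request.
requests = 1000
seconds = 1882.4

throughput = requests / seconds   # mean requests per second
latency = seconds / requests      # mean seconds per request

print(f"{throughput:.2f} req/s, {latency:.2f} s/req")  # → 0.53 req/s, 1.88 s/req
```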

Event Timeline

@schana thanks for creating this… I was about to do this today - sorry for the delay

I'm going to dupe this ticket (1 for Translation and 1 for Recommendations) so we can see each separately

To clarify, you mean the Translation type and Related Articles type (as defined in https://meta.wikimedia.org/wiki/Recommendation_API)?

Fjalapeno renamed this task from Gather performance info for Recommendation API to Gather performance info for the Related Articles type in Recommendation API. Mar 3 2017, 8:36 PM

Change 341341 had a related patch set uploaded (by nschaaf):
[research/recommendation-api] Add script to profile resource usage

https://gerrit.wikimedia.org/r/341341

Change 341341 merged by jenkins-bot:
[research/recommendation-api] Add script to profile resource usage

https://gerrit.wikimedia.org/r/341341

For February 2017, the related_articles type was queried 9955 times from within the translation type, as logged in https://meta.wikimedia.org/wiki/Schema:TranslationRecommendationAPIRequests.

I'm unaware of any other significant uses of the endpoint at this time. We'll have to get proper logging in place to measure that.

Change 341578 had a related patch set uploaded (by nschaaf):
[research/recommendation-api] Add script to measure throughput

https://gerrit.wikimedia.org/r/341578

On a 15" mid 2015 MacBook Pro, the output from the throughput script:
related_articles: Processed 1000 requests in 1882.4070160388947 seconds

Change 341578 merged by jenkins-bot:
[research/recommendation-api] Add script to measure throughput

https://gerrit.wikimedia.org/r/341578

The proper way would be to use ab against the service and measure the client-perceived time. Also, could you provide some info on memory consumption during extended workloads?

~$ ab -l -n 100 "https://recommend-test.wmflabs.org/types/related_articles/v1/articles?source=en&seed=Apple"
This is ApacheBench, Version 2.3 <$Revision: 1748469 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking recommend-test.wmflabs.org (be patient).....done


Server Software:        nginx/1.11.3
Server Hostname:        recommend-test.wmflabs.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256

Document Path:          /types/related_articles/v1/articles?source=en&seed=Apple
Document Length:        Variable

Concurrency Level:      1
Time taken for tests:   273.731 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      180500 bytes
HTML transferred:       148900 bytes
Requests per second:    0.37 [#/sec] (mean)
Time per request:       2737.305 [ms] (mean)
Time per request:       2737.305 [ms] (mean, across all concurrent requests)
Transfer rate:          0.64 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      335 1007 193.3   1058    1399
Processing:  1344 1730 407.4   1634    3691
Waiting:     1343 1730 407.2   1634    3691
Total:       1846 2737 429.4   2690    4452

Percentage of the requests served within a certain time (ms)
  50%   2690
  66%   2752
  75%   2799
  80%   2846
  90%   3029
  95%   3900
  98%   4359
  99%   4452
 100%   4452 (longest request)
~$ ab -l -n 100 "https://recommend-test.wmflabs.org/types/related_articles/v1/items?seed=Q89"
This is ApacheBench, Version 2.3 <$Revision: 1748469 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking recommend-test.wmflabs.org (be patient).....done


Server Software:        nginx/1.11.3
Server Hostname:        recommend-test.wmflabs.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256

Document Path:          /types/related_articles/v1/items?seed=Q89
Document Length:        Variable

Concurrency Level:      1
Time taken for tests:   168.108 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      99000 bytes
HTML transferred:       67500 bytes
Requests per second:    0.59 [#/sec] (mean)
Time per request:       1681.075 [ms] (mean)
Time per request:       1681.075 [ms] (mean, across all concurrent requests)
Transfer rate:          0.58 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      339  862 243.8   1049    1102
Processing:   706  819  87.9    809    1342
Waiting:      705  817  87.8    807    1341
Total:       1048 1681 247.8   1798    2031

Percentage of the requests served within a certain time (ms)
  50%   1798
  66%   1855
  75%   1882
  80%   1891
  90%   1934
  95%   1958
  98%   2007
  99%   2031
 100%   2031 (longest request)
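ab's reported "Requests per second" line is simply complete requests divided by total test time, so the two runs above can be cross-checked directly (figures copied from the ab output):

```python
# Sanity check: ab's reported rate is complete requests / total test time.
# (requests, seconds) pairs taken from the two ab runs above.
runs = {
    "articles (seed=Apple)": (100, 273.731),
    "items (seed=Q89)": (100, 168.108),
}

for name, (n, t) in runs.items():
    print(f"{name}: {n / t:.2f} req/s")
```

Both match ab's reported means of 0.37 and 0.59 req/s, respectively.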

Memory by time

[graph: sampled Vm* memory values over time]
Sampled with:
while :; do echo -n "time:" && date +"%s.%3N" && grep Vm /proc/17573/status | tr -d '[:blank:]' | sed 's/kB$//'; sleep .5; done
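For reference, a minimal Python equivalent of that shell sampler (assuming a Linux /proc filesystem; the PID 17573 above was the service process, and `parse_vm`/`sample` are illustrative names, not part of the service):

```python
# Sketch: sample the Vm* memory fields of a process from /proc/<pid>/status.
# Linux-only, since it depends on the /proc layout.
import re
import time


def parse_vm(status_text):
    """Return {field: kB} for every Vm* line in a /proc/<pid>/status dump."""
    fields = {}
    for line in status_text.splitlines():
        m = re.match(r"^(Vm\w+):\s+(\d+)\s+kB$", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def sample(pid, interval=0.5, count=3):
    """Yield (timestamp, vm_fields) snapshots for a running process."""
    for _ in range(count):
        with open(f"/proc/{pid}/status") as f:
            yield time.time(), parse_vm(f.read())
        time.sleep(interval)
```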

@schana was that sampled while the service was under load? Ideally, we want to know how the service behaves while there are a lot of concurrent requests hitting it.

Yes, ab was run to create the load.

If a longer duration or different sampling method is preferable, please let me know.

I'd be interested in higher concurrency, something like ab -n10000 -c50

@mobrovac I think we should discuss what @leila mentioned regarding the results of evaluating this service and whether we should continue this effort right now, or just focus on the translation type.

All, per @Fjalapeno, we know that we will not productionize the related-articles API. As a result, I'll decline this task; feel free to reopen it when/if relevant in the future.