
Gather performance info for Translation type in Recommendation API
Closed, ResolvedPublic

Description

As the Translation service exists now, what are the following values?

  • Throughput
  • Request volume
  • Resource usage per request
    • Memory
    • CPU

Results summary

For details, see the comments below.

  • Throughput: 1000 requests / 2789.6 seconds
  • Request volume: 22049 requests / month
  • Resource usage:
    • Memory: up to 12 MB during a request, but results were inconsistent
    • CPU: <0.01 seconds per query, excluding locking and I/O
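
For a rough sense of scale, the summary numbers can be converted into rates (back-of-envelope arithmetic of mine, not from the task; February 2017 had 28 days):

```python
# Rates derived from the summary numbers above
throughput = 1000 / 2789.6            # requests/second from the throughput script
monthly = 22049                       # logged requests in February 2017
seconds_in_feb = 28 * 24 * 3600       # 28 days in February 2017
avg_load = monthly / seconds_in_feb   # average production request rate

print(round(throughput, 3))  # ~0.358 requests/second (≈2.8 s per request)
print(round(avg_load, 4))    # ~0.0091 requests/second average in production
```

So the single-threaded benchmark throughput is well above the average production load, though not above its likely peaks.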

Details

Related Gerrit Patches:
research/recommendation-api (master): Add script to measure throughput
research/recommendation-api (master): Add script to profile resource usage

Event Timeline

Fjalapeno created this task. Mar 3 2017, 4:13 PM

See related ticket: T159528

Fjalapeno renamed this task from "Gather performance info for Translation API" to "Gather performance info for Translation type in Recommendation API". Mar 3 2017, 8:37 PM

Change 341341 had a related patch set uploaded (by nschaaf):
[research/recommendation-api] Add script to profile resource usage

https://gerrit.wikimedia.org/r/341341

Output from profile script: P5017

Change 341341 merged by jenkins-bot:
[research/recommendation-api] Add script to profile resource usage

https://gerrit.wikimedia.org/r/341341

For February 2017, the translation type was queried 22049 times, as logged in https://meta.wikimedia.org/wiki/Schema:TranslationRecommendationAPIRequests

Change 341578 had a related patch set uploaded (by nschaaf):
[research/recommendation-api] Add script to measure throughput

https://gerrit.wikimedia.org/r/341578

On a 15" mid-2015 MacBook Pro, the output from the throughput script was:
Processed 1000 requests in 2789.583893060684 seconds

schana updated the task description. Mar 7 2017, 6:35 PM
schana moved this task from Backlog to For Review on the Recommendation-API board.

Change 341578 merged by jenkins-bot:
[research/recommendation-api] Add script to measure throughput

https://gerrit.wikimedia.org/r/341578

~$ ab -l -n 100 "https://recommend-test.wmflabs.org/types/translation/v1/articles?source=en&target=de"
This is ApacheBench, Version 2.3 <$Revision: 1748469 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking recommend-test.wmflabs.org (be patient).....done


Server Software:        nginx/1.11.3
Server Hostname:        recommend-test.wmflabs.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256

Document Path:          /types/translation/v1/articles?source=en&target=de
Document Length:        Variable

Concurrency Level:      1
Time taken for tests:   249.440 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      144558 bytes
HTML transferred:       112958 bytes
Requests per second:    0.40 [#/sec] (mean)
Time per request:       2494.403 [ms] (mean)
Time per request:       2494.403 [ms] (mean, across all concurrent requests)
Transfer rate:          0.57 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      338 1014 166.9   1054    1386
Processing:  1166 1480 233.8   1445    2281
Waiting:     1164 1479 233.7   1445    2280
Total:       1827 2494 294.6   2471    3331

Percentage of the requests served within a certain time (ms)
  50%   2471
  66%   2571
  75%   2665
  80%   2715
  90%   2872
  95%   3058
  98%   3305
  99%   3331
 100%   3331 (longest request)
~$ ab -l -n 100 "https://recommend-test.wmflabs.org/types/translation/v1/articles?source=en&target=de&seed=Apple&search=morelike"
This is ApacheBench, Version 2.3 <$Revision: 1748469 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking recommend-test.wmflabs.org (be patient).....done


Server Software:        nginx/1.11.3
Server Hostname:        recommend-test.wmflabs.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256

Document Path:          /types/translation/v1/articles?source=en&target=de&seed=Apple&search=morelike
Document Length:        Variable

Concurrency Level:      1
Time taken for tests:   255.318 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      144500 bytes
HTML transferred:       112900 bytes
Requests per second:    0.39 [#/sec] (mean)
Time per request:       2553.180 [ms] (mean)
Time per request:       2553.180 [ms] (mean, across all concurrent requests)
Transfer rate:          0.55 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      547 1030 137.1   1061    1315
Processing:  1210 1523 260.2   1479    3374
Waiting:     1209 1522 260.2   1478    3372
Total:       2033 2553 282.8   2532    4435

Percentage of the requests served within a certain time (ms)
  50%   2532
  66%   2609
  75%   2641
  80%   2670
  90%   2798
  95%   2865
  98%   3368
  99%   4435
 100%   4435 (longest request)
schana added a comment (edited). Mar 8 2017, 4:01 PM

Memory over time (chart attached)


Sampled with:
while :; do echo -n "time:" && date +"%s.%3N" && grep Vm /proc/17573/status | tr -d '[:blank:]' | sed 's/kB$//' && sleep .5; done
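
The sampler emits `time:<epoch>` lines followed by the stripped `Vm*` counters from /proc/<pid>/status. A hypothetical helper to turn that output into numeric samples might look like this (`parse_samples` is an illustrative name, not part of the repo):

```python
def parse_samples(text):
    """Group sampler output into one dict per 'time:' stanza.

    Counter values are in kB: the sampler already stripped the 'kB'
    suffix and blanks, so lines look like 'VmRSS:123456'.
    """
    samples = []
    current = None
    for line in text.splitlines():
        if line.startswith('time:'):
            current = {'time': float(line[len('time:'):])}
            samples.append(current)
        elif ':' in line and current is not None:
            key, _, value = line.partition(':')
            current[key] = int(value)
    return samples
```

For example, `parse_samples("time:1488988800.123\nVmRSS:120000")` yields one sample with a 120000 kB resident set.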

Benchmark with higher concurrency:

~$ ab -l -c 50 -n 1000 "https://recommend-related-articles.wmflabs.org/types/translation/v1/articles?source=en&target=de"
This is ApacheBench, Version 2.3 <$Revision: 1748469 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking recommend-related-articles.wmflabs.org (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        nginx/1.11.3
Server Hostname:        recommend-related-articles.wmflabs.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256

Document Path:          /types/translation/v1/articles?source=en&target=de
Document Length:        Variable

Concurrency Level:      50
Time taken for tests:   151.543 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      1300225 bytes
HTML transferred:       984899 bytes
Requests per second:    6.60 [#/sec] (mean)
Time per request:       7577.153 [ms] (mean)
Time per request:       151.543 [ms] (mean, across all concurrent requests)
Transfer rate:          8.38 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      310 1173 1270.2   1191   22170
Processing:  1683 6204 1171.0   6263    9325
Waiting:     1682 6203 1171.0   6257    9324
Total:       2269 7377 1667.1   7345   28988

Percentage of the requests served within a certain time (ms)
  50%   7345
  66%   7808
  75%   8081
  80%   8279
  90%   8760
  95%   9149
  98%   9783
  99%  10226
 100%  28988 (longest request)

@mobrovac I noticed you weren't subscribed to this task. I ran the benchmark with a higher level of concurrency. Is there anything else you'd like to see?

Oh, I see a drastic change in response times when concurrency is increased. @schana, would you be able to investigate where the extra latency comes from when running multiple requests in parallel?

schana added a comment (edited). Apr 7 2017, 7:04 PM

Oh, I see a drastic change in response times when concurrency is increased. @schana, would you be able to investigate where the extra latency comes from when running multiple requests in parallel?

I'm guessing it comes from requests queuing between nginx accepting them and one of the 16 uwsgi workers picking them up, but I can investigate next week.
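
A rough sanity check of that guess (my arithmetic, not from the task; it uses the ~2.49 s single-request mean from the earlier ab run and assumes the 16 workers mentioned above): with more concurrent clients than workers, each request effectively waits for concurrency/workers service slots.

```python
service_time = 2.494   # mean per-request time at concurrency 1, in seconds (from ab above)
workers = 16           # uwsgi worker count (assumed from the comment above)
concurrency = 50

# Each worker serves ~concurrency/workers requests back to back,
# so the expected mean latency under saturation is roughly:
expected_latency = concurrency / workers * service_time
print(round(expected_latency, 2))  # ~7.79 s, close to the observed 7.58 s mean
```

The closeness to the observed 7577 ms mean supports the queuing explanation rather than a slowdown in the service itself.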

I've been able to simulate the same results by adding a route that uses a busy wait before returning the response.

# Assumes the service's existing flask `app` and `import time` at module level
@app.route('/ab_test')
def ab_test():
    # Busy-wait for 2 seconds to pin a worker, simulating a slow request
    t = time.time()
    while time.time() < t + 2.0:
        pass
    return 'good'

Results:

~$ ab -l -c 80 -n 500 "http://recommend-related-articles.wmflabs.org/ab_test"
This is ApacheBench, Version 2.3 <$Revision: 1757674 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking recommend-related-articles.wmflabs.org (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        nginx/1.11.3
Server Hostname:        recommend-related-articles.wmflabs.org
Server Port:            80

Document Path:          /ab_test
Document Length:        Variable

Concurrency Level:      80
Time taken for tests:   64.946 seconds
Complete requests:      500
Failed requests:        0
Total transferred:      100000 bytes
HTML transferred:       2000 bytes
Requests per second:    7.70 [#/sec] (mean)
Time per request:       10391.288 [ms] (mean)
Time per request:       129.891 [ms] (mean, across all concurrent requests)
Transfer rate:          1.50 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      106 1482 4859.4    999   50111
Processing:  2120 5752 978.7   5932    8139
Waiting:     2119 5751 978.8   5930    8139
Total:       2239 7234 5039.2   6769   58049

Percentage of the requests served within a certain time (ms)
  50%   6769
  66%   7050
  75%   7244
  80%   7399
  90%   7874
  95%   8326
  98%  15153
  99%  40832
 100%  58049 (longest request)
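
These busy-wait numbers line up with a simple Little's-law estimate (my arithmetic, assuming the 16 workers mentioned above, each pinned for the full 2 s busy-wait):

```python
workers = 16
busy_wait = 2.0    # seconds each request spins, pinning a worker
concurrency = 80

# 16 workers each finishing a request every 2 s caps throughput at 8 req/s;
# by Little's law, 80 in-flight requests at 8 req/s means ~10 s mean latency.
max_throughput = workers / busy_wait          # 8.0 req/s vs. observed 7.70 req/s
mean_latency = concurrency / max_throughput   # 10.0 s vs. observed 10.39 s mean
```

The observed 7.70 req/s and 10391 ms mean are within a few percent of these bounds, consistent with queuing in front of a saturated worker pool.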

@mobrovac Is there anything further that needs to be tested?

leila added a comment. Apr 20 2017, 5:05 PM

@mobrovac can you help us move forward with this task? It seems what's missing on our end is to understand what level of performance we should meet for the API to be eligible for productization.

I'm concerned about the use of a blocking (synchronous) model, since it directly affects request latency, as seen in the performance results above. An asynchronous approach would be preferable. The service is essentially stateless, so IMHO it should not need the master/worker model. Something like Tornado seems more suitable for this scenario; the EventBus HTTP Proxy service uses it, for example.
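
To illustrate the difference, here is a minimal sketch (plain asyncio rather than Tornado, purely to show the event-loop model): 50 simulated I/O-bound requests complete concurrently on a single process, instead of each one occupying a worker for its full duration.

```python
import asyncio
import time

async def handle_request(i):
    # Stand-in for a non-blocking backend call (e.g. a MediaWiki API query)
    await asyncio.sleep(2.0)
    return i

async def main():
    start = time.time()
    results = await asyncio.gather(*(handle_request(i) for i in range(50)))
    return time.time() - start, len(results)

elapsed, completed = asyncio.run(main())
# A blocking pool of 16 workers would need ~(50/16) * 2 s for these;
# the event loop finishes all 50 in roughly 2 s total.
```

The busy-wait route above is deliberately CPU-bound and would not benefit, but this service's latency is dominated by waiting on backend HTTP calls, which is exactly the case an event loop handles well.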

phuedx removed a subscriber: phuedx. Apr 20 2017, 5:39 PM

If we want to move to an async approach, I think using a more prevalent stack would help long-term maintainability. Either way, a majority of the code will need to be rewritten.

Alternatively, the request latency could be addressed by increasing the number of workers and the size of the listen queue in uwsgi.
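
For reference, both knobs live in the uwsgi ini configuration; a hypothetical example (the values are illustrative, not the service's actual settings):

```ini
[uwsgi]
; more workers = more requests served in parallel (CPU/memory permitting)
processes = 32
; a deeper listen queue absorbs bursts instead of refusing connections
listen = 1024
```

Note that `listen` values above 128 also require raising the kernel's somaxconn limit.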

@mobrovac I think moving to an async platform is a decision that should be made with more context than just meeting an acceptable performance benchmark (to include maintainability, familiarity, etc.). I'm happy to chat about this if you think that would help in coordinating the next steps.

I suggested Tornado because we have in-house experience with it, but of course am open to other stacks/technologies.

@mobrovac I think moving to an async platform is a decision that should be made with more context than just meeting an acceptable performance benchmark (to include maintainability, familiarity, etc.). I'm happy to chat about this if you think that would help in coordinating the next steps.

The main reason why I think we should move it to an async platform is because the functionality performed by the service is essentially request-based (and hence event-based) and depends only on the current request being handled. In other words, architecturally the service should be a stateless one where every process can perform the exact same task/function. This is not the case currently because of the master/worker distinction. Turning the service into a stateless one would also simplify its deployment and maintenance, which are important parts of a service's life cycle, as is performance, which would increase as a corollary of the switch.

Oops, I forgot to address the rewriting aspect. I have not looked at the code intensively, but I wouldn't say that's true. The logic that handles one request is already there; it would just need to be ported to the new architecture, most likely with small modifications (which should be fairly simple and straightforward).

@mobrovac I have no problems with the async aspect. I was more referring to the choice between Tornado and other stacks. I spent some time reading the Tornado documentation last week, and it looked to me like the majority of the code would need to be adapted from multiprocessing/requests/flask to their counterparts in Tornado. The work seemed substantial enough that moving to a tech stack not necessarily in Python wouldn't be much different. I sent a calendar invite to discuss further.

After meeting with @mobrovac, I'm going to move forward with using https://github.com/wikimedia/service-template-node as a stack to port the code to.

This is waiting for T164282

Update: the code has been ported to service-template-node and has been deployed.

@mobrovac Do we need to redo the performance testing with the new stack?

mobrovac closed this task as Resolved. Jul 17 2017, 7:19 PM
mobrovac edited projects: added Services (done); removed Services (designing).

I think this task can be closed. We have gotten out of it what we wanted (an informed decision about the service at the time).