
Establish baseline performance of Python/WSGI frameworks
Closed, ResolvedPublic

Authored By: Eevans, Nov 8 2018, 8:05 PM

Description

Prior to committing to a framework and a WSGI server, we decided to test the following servers with no framework: CherryPy, Gunicorn, Meinheld, and uWSGI. This short list was chosen based on current usage, documentation, Stack Overflow resources, and community size. Suggestions for other servers are welcome.
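For context, the no-framework baseline is simply a bare WSGI callable. A minimal sketch of that kind of handler (the response body here is an assumption; the actual handler is in the repository):

def application(environ, start_response):
    # Trivial fixed response; the real baseline handler lives in the repo.
    body = b"Hello, World!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]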

The servers ran on a Debian virtual machine with 2 CPU cores and 4 GB of RAM, and were tested from a second, identical virtual machine using wrk, an HTTP benchmarking tool, at an increasing number of simultaneous connections ranging from 10 to 500. Each test lasted 3 minutes and was repeated 3 times; the graphs below show the averaged results. We focused on requests/second, latency, and errors. uWSGI errors were excluded because wrk misidentifies uWSGI responses as read errors.
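The sweep amounts to repeated wrk runs at each concurrency level; a rough driver for illustration (the intermediate connection counts, wrk thread count, and target URL are assumptions; see the repository for the exact parameters):

import subprocess

for conns in [10, 50, 100, 250, 500]:  # 10 to 500 simultaneous connections (intermediate steps assumed)
    for run in range(3):               # each test repeated 3 times
        out = subprocess.run(
            ["wrk", "-t", "2",         # wrk worker threads (assumed)
             "-c", str(conns),         # simultaneous connections
             "-d", "3m",               # each test lasted 3 minutes
             "http://test-vm:8080/"],  # hypothetical target VM and port
            capture_output=True, text=True, check=True,
        )
        print(out.stdout)              # requests/sec, latency, and errors are parsed from here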

More information on the methodology and code base can be found in the repository.

Results

WSGI container/server performance

Graphs: Requests/sec; Requests/sec (without Meinheld); Latency; Errors.

Python framework performance

Graphs: Meinheld requests/sec; Meinheld latency; Meinheld errors.

To better understand the performance cost of frameworks, the tests above were rerun in the same environment with two popular frameworks, Flask and CherryPy. Suggestions for other frameworks are welcome.

The graphs above compare Meinheld's requests/second, latency, and errors with Flask, with CherryPy, and without a framework. Results for the other servers can be found in the spreadsheet.
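For reference, the framework variants wrap the same trivial handler as the bare-WSGI baseline; a minimal Flask version might look like this (route and response body are assumptions; the actual test apps are in the repository):

from flask import Flask

app = Flask(__name__)  # `app` is the WSGI callable the servers under test load

@app.route("/")
def index():
    # Same trivial response as the bare-WSGI baseline (assumed).
    return "Hello, World!"

Each container then serves this callable in place of the bare handler, so any difference in the graphs is attributable to the framework layer.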


See also: T221292: Establish performance of the session storage service

Event Timeline

Eevans triaged this task as Medium priority.Nov 8 2018, 8:05 PM
Eevans created this task.

Kask performance testing is ongoing, but I wanted to share some early results:

Methodology

Kask running on sessionstore1001.eqiad.wmnet (w/ open files bumped to 4096)

[[ https://github.com/wg/wrk | wrk ]] run from sessionstore1002.eqiad.wmnet (threads 8, concurrency 4096, duration 5 minutes). All requests were GETs of a single key with a value trivial in size.
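(For reference, that corresponds to an invocation along the lines of `wrk -t8 -c4096 -d5m <url>`, where the URL points at the single test key; the exact service port and key path are not recorded here.)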

Results

52k reqs/sec throughput
19.54ms average latency
50% 36.67ms
75% 37.15ms
90% 37.37ms
99% 39.36ms

Takeaways

The throughput is quite good; I see no problems in this regard.

The latency numbers are...suspicious. The numbers seen here are eerily close to what I see locally on my notebook, despite very different throughput. Additionally, the Prometheus metrics from Kask paint an even stranger picture:

...
http_request_duration_seconds_bucket{code="200",method="GET",le="0.001"} 2.756884e+06
http_request_duration_seconds_bucket{code="200",method="GET",le="0.0025"} 7.932338e+06
http_request_duration_seconds_bucket{code="200",method="GET",le="0.005"} 8.487417e+06
http_request_duration_seconds_bucket{code="200",method="GET",le="0.01"} 8.54812e+06
http_request_duration_seconds_bucket{code="200",method="GET",le="0.025"} 8.552673e+06
http_request_duration_seconds_bucket{code="200",method="GET",le="0.05"} 1.7099048e+07
http_request_duration_seconds_bucket{code="200",method="GET",le="0.1"} 1.7108523e+07
http_request_duration_seconds_bucket{code="200",method="GET",le="0.25"} 1.7108744e+07
http_request_duration_seconds_bucket{code="200",method="GET",le="0.5"} 1.7108744e+07
http_request_duration_seconds_bucket{code="200",method="GET",le="1"} 1.7108744e+07
http_request_duration_seconds_bucket{code="200",method="GET",le="+Inf"} 1.7108744e+07
http_request_duration_seconds_sum{code="200",method="GET"} 328809.99557739915
http_request_duration_seconds_count{code="200",method="GET"} 1.7108744e+07
...

The distribution here indicates that about half of the requests fall between 25 and 50ms, while the other half (49.6%) are less than or equal to 5ms; 46.4% are less than or equal to 2.5ms! Taken together with the numbers from wrk, it would seem that roughly half of the requests take 37ms (+/- 2ms), and roughly half take 2.5ms or less.
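Those percentages fall straight out of the cumulative bucket counts; for example (values copied from the metrics dump above):

buckets = {
    0.0025: 7_932_338,   # le="0.0025"
    0.005:  8_487_417,   # le="0.005"
    0.025:  8_552_673,   # le="0.025"
    0.05:  17_099_048,   # le="0.05"
}
total = 17_108_744       # http_request_duration_seconds_count

print(buckets[0.0025] / total)                    # ~0.464: <= 2.5ms
print(buckets[0.005] / total)                     # ~0.496: <= 5ms
print((buckets[0.05] - buckets[0.025]) / total)   # ~0.500: in the 25-50ms bucket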

Next steps

  • Determine source of bizarre latency distribution
  • Get Cassandra dashboards set up (https://gerrit.wikimedia.org/r/497848)
  • Get numbers from a more representative request load (GET, POST & DELETE)
Eevans renamed this task from Establish baseline performance of the session storage service to Establish baseline performance of Python/WSGI frameworks.Apr 17 2019, 8:58 PM
Eevans closed this task as Resolved.
Eevans updated the task description. (Show Details)