
[Spike] Benchmark built-in HTTP server options for scap3 fanout
Closed, Resolved · Public · 2 Estimated Story Points

Description

Goals

  • Ideally we want minimal dependencies and zero configuration for ease of maintenance.
  • Something that's integrated into Python would be good, but a standalone binary would also work.
  • Lightweight is desirable; performance should not be a major issue with only ~40 simultaneous connections.

Our options thus far:

| name | language | depends on | comments |
|------|----------|------------|----------|
| gpack | Python | git, gevent | Git "smart" protocol (HTTP) server implemented in Python as a WSGI app |
| aiohttp | Python 3 | Python >= 3.4.1, chardet or cchardet | Simple asyncio HTTP implementation. Requires just a bit of boilerplate Python code to manage the server. |
| twistd | Python | A bunch of them | twisted does include a standalone web server as an example, and packages are in Debian/Ubuntu. It fails the 'lightweight' and 'simple' goals though. |
| SimpleHTTPServer | Python | Python | No extra dependencies and no configuration required. Unacceptable performance and scalability. Renamed to http.server in Python 3.x |
| Gunicorn | Python | Ubuntu >= 12.04 or Debian Jessie | WSGI server front-end. This or uWSGI could be used as a front-end for gpack or sina |
| uWSGI | Python | ? | WSGI server front-end. Apparently this is already used in production as a front-end for other WMF services. |

Performance/dependency testing/benchmark things

Event Timeline

dduvall moved this task from Needs triage to Experiments on the Scap board.

Played around with an idea this evening. Wrote a python script that spun up 11 DigitalOcean Droplets.

On droplet 1 I ran:

git clone https://github.com/wikimedia/mediawiki.git
cd mediawiki
git update-server-info
cd .git
python -m SimpleHTTPServer

On the remaining 10 droplets, I used clustershell to run git clone http://[droplet1-ip]:8000/ cats in parallel.

The droplets were named fuck-do-it[0-9]. Their output from clustershell is below:

fuck-do-it[1-3,5,8-9]: Cloning into 'cats'...
fatal: unable to access 'http://10.132.28.178:8000/': Recv failure: Connection reset by peer
fuck-do-it[0,6]: Cloning into 'cats'...
error: Unable to get pack file http://10.132.28.178:8000/objects/pack/pack-9a9078d4d54b056d12b20dc730e55891b265f453.pack
Recv failure: Connection reset by peer
error: Unable to find d4d086b0cee47b8110eeb12bf870b7a75bd0e05a under http://10.132.28.178:8000
Cannot obtain needed object d4d086b0cee47b8110eeb12bf870b7a75bd0e05a
error: fetch failed.
fuck-do-it[4,7]: Cloning into 'cats'...
Checking out files: 100% (6065/6065), done.
Checking out files: 100% (6065/6065)

Only 2/10 droplets managed to get 100% of the clone :(

Running git -C cats status across the cluster shows:

fuck-do-it[0-3,5-6,8-9]: fatal: Cannot change to 'cats': No such file or directory
fuck-do-it[4,7]: On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean

I'm confident in saying that SimpleHTTPServer is out of the race.
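For the record, part of the failure mode is that SimpleHTTPServer handles one request at a time, so simultaneous clones queue up and get their connections reset. A hedged sketch of the minimal fix, mixing in threading (this addresses concurrency only; it still wouldn't make the dumb protocol a good fit for deploys):

```python
# Sketch: SimpleHTTPServer (http.server in Python 3) is single-threaded by
# default. socketserver.ThreadingMixIn gives each request its own thread;
# here we demonstrate with ten concurrent local fetches of the listing.
import threading
import urllib.request
from http.server import HTTPServer, SimpleHTTPRequestHandler
from socketserver import ThreadingMixIn

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    daemon_threads = True

# Bind to an ephemeral port and serve the current directory in a thread.
server = ThreadedHTTPServer(('127.0.0.1', 0), SimpleHTTPRequestHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

statuses = []

def fetch():
    # Fetch the directory listing and record the HTTP status.
    with urllib.request.urlopen('http://127.0.0.1:%d/' % port, timeout=10) as resp:
        statuses.append(resp.status)

workers = [threading.Thread(target=fetch) for _ in range(10)]
for w in workers:
    w.start()
for w in workers:
    w.join()
server.shutdown()
print(statuses)
```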

aiohttp seems to have decent performance, judging from someone else's testing:
https://groups.google.com/forum/#!topic/aio-libs/3TaOnHqryn0

It should comfortably handle 40 simultaneous connections.

I'll put together a test or two.
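For context, the "bit of boilerplate" the table mentions could look roughly like this: a sketch assuming aiohttp's web module, serving a bare repo as static files (the repo path is hypothetical):

```python
# Hypothetical aiohttp boilerplate for this use case: expose a bare repo
# directory as static files, which is enough for a dumb-protocol clone
# after `git update-server-info`. Assumes aiohttp is installed.
from aiohttp import web

def make_app(repo_dir):
    app = web.Application()
    # Serve everything under repo_dir at the URL root.
    app.router.add_static('/', repo_dir, show_index=True)
    return app

# To run it: web.run_app(make_app('/srv/mediawiki/.git'), port=8000)
```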

@mmodell that gpack thing is pretty cool, seems to work well.

I forked it, made some tweaks: https://github.com/thcipriani/gpack

I ran the same DigitalOcean test, this time with 30 instances, using my quick make-lots-of-instances-and-run-lots-of-commands-in-parallel script: https://gist.github.com/thcipriani/6c2edf3fa8161aa79676

Worked great: all boxes were able to pull down the repo fairly quickly and reliably.

@thcipriani: I tested it also, and had similarly good results.

mmodell set the point value for this task to 2.

gpack seems to meet all of our needs.