Evaluate scalability and performance of PHP7 compared to HHVM
Closed, ResolvedPublic

Description

Moving to PHP 7 is kind-of a forced move for us; nonetheless, we need to evaluate how this will affect our users and our infrastructure.

Specifically:

  • Does PHP7 add latency to a single request? If so, how much?
  • Does PHP7 support the same throughput HHVM can support? If not, how big is the penalty we expect?

Since there is abundant evidence that PHP 7.2 performs better than PHP 7.0, we might want to repeat the experiments once we have php 7.2 available, if the results are particularly bad.

Event Timeline

jijiki triaged this task as Medium priority. Oct 26 2018, 10:19 AM
jijiki added a subscriber: Performance-Team.

What about our use of register_postsend_function? Is there anything equivalent?

Is there anything specific being asked of the Performance Team, or is this something that @Joe (or others) were planning to do?

Mentioned in SAL (#wikimedia-operations) [2018-11-15T12:57:37Z] <_joe_> upping pm.maxworkers to 40 on mw1261 on php7.2-fpm, benchmarking T206341

Sorry I was so absorbed by the other tasks that I forgot to answer:

I don't think I need help from the performance team for these preliminary tests, but once php7 is available publicly (so once T206339 is done) I would love to involve your team in more accurate evaluations of the effect of the switch in terms of actual browsing.

Sounds good, thanks!

Here are the first results that I feel comfortable sharing!

I tested the english wikipedia pages for Australia (a mid-sized, not overly complex page) and Barack Obama (a notoriously complex and large page) at different levels of concurrency with both HHVM and PHP7, running on the same host.

For the sake of the test, I depooled one canary appserver (mw1261), raised the maximum number of workers for php-fpm to 40 (the number of cores of the server), and ran the tests sequentially, first on HHVM, then on PHP7.

I did 5 passes of tests at different concurrencies: 5 (for a total of 2k requests), then 10, 15, 30 and 45 (for a total of 10k requests each). The run at concurrency 45 was only performed for the Obama page in this test.

Here are the results for the Australia page

australia_latencies.png (600×800 px, 8 KB)

As you can see, php-fpm outperforms HHVM at every concurrency level. While the difference in mean response time is minimal, the advantage becomes larger at higher percentiles.
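To illustrate why a comparison can look close on the mean and still differ at higher percentiles, here is a minimal Python sketch. The sample latencies and the names `engine_a` and `engine_b` are invented for illustration and are not measurements from this task; the percentile uses the simple nearest-rank method, which may differ from what the benchmarking tool reports.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile (0 < pct <= 100) of a list of numbers."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # 1-based rank of the smallest value covering pct percent of the samples
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(latencies):
    """Mean plus median and tail percentiles, in the same unit as the input."""
    return {
        "mean": statistics.mean(latencies),
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
    }


# Hypothetical data (seconds): identical bulk, different tails.
engine_a = [0.20] * 95 + [0.60] * 5
engine_b = [0.20] * 95 + [0.30] * 5

if __name__ == "__main__":
    for name, data in (("engine_a", engine_a), ("engine_b", engine_b)):
        print(name, summarize(data))
```

With this made-up data the two means differ by only 15 ms, while the 99th percentiles differ by 300 ms, which is the kind of pattern the charts describe.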

Now for the Obama page:

obama_latencies.png (600×800 px, 9 KB)

As you can see, in this case too PHP7 thoroughly outperforms HHVM across the board.

I haven't yet done a very detailed analysis of resource utilization, but it seems comparable.

I will keep doing more tests for other endpoints as well, but these results show that there should be no real risk in the transition in terms of scalability, and that on the other hand we'll get a small advantage in terms of latencies.

Have you diffed the output coming from HHVM and PHP7, to ensure that they're generating the same HTML for these pages?

Our HTML is never exactly the same between two different parsings (wgBackendResponseTime, various comments, the request id creeping into javascript), which makes checking that harder. But yes, I verified both pages and only saw differences within <script> tags.
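One way to make such a comparison repeatable is to strip the parts that are expected to vary before diffing. The following is a rough Python sketch, not the procedure actually used here: the function names and the sample HTML are hypothetical, and regex-based stripping is only good enough for a quick sanity check, not for general HTML processing.

```python
import difflib
import re

# Content expected to differ between two parses of the same page.
SCRIPT_RE = re.compile(r"<script\b.*?</script>", re.DOTALL | re.IGNORECASE)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def normalize(html):
    """Drop <script> blocks and HTML comments, then return non-empty lines."""
    html = SCRIPT_RE.sub("", html)
    html = COMMENT_RE.sub("", html)
    return [line.strip() for line in html.splitlines() if line.strip()]


def html_diff(hhvm_html, php7_html):
    """Unified diff of the normalized pages; an empty list means no differences."""
    return list(
        difflib.unified_diff(
            normalize(hhvm_html),
            normalize(php7_html),
            fromfile="hhvm",
            tofile="php7",
            lineterm="",
        )
    )


# Hypothetical responses differing only in a script block and a comment.
hhvm_page = "<p>Hello</p>\n<script>var wgBackendResponseTime=120;</script>\n<!-- run A -->\n"
php7_page = "<p>Hello</p>\n<script>var wgBackendResponseTime=95;</script>\n<!-- run B -->\n"

if __name__ == "__main__":
    print(html_diff(hhvm_page, php7_page))
```

Any line left in the diff after this normalization would point at a difference outside scripts and comments and would deserve a closer look.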

Forcing a reparse of the Obama page by requesting
curl -g -b "PHP_ENGINE=php7" -H 'Host: en.wikipedia.org' 'http://mw1261.eqiad.wmnet/w/api.php?action=parse&text={{:Barack%20Obama}}'

(served by php 7.2) and then

curl -g -H 'Host: en.wikipedia.org' 'http://mw1261.eqiad.wmnet/w/api.php?action=parse&text={{:Barack%20Obama}}'

(served by HHVM) I got the following parser timings:

metric      HHVM     PHP 7.2 (php-fpm)
CPU time    9.336    7.004
Real time   10.078   7.795
Lua time    3.9      3.0

Please note that this is only roughly indicative; I will need to do some more thorough benchmarking of proper parsing performance.
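For reference, the relative difference implied by these single-run timings can be worked out directly. This small Python sketch only does the arithmetic on the numbers quoted above; the names `timings` and `reduction_pct` are illustrative.

```python
# (HHVM, PHP 7.2) parser timings as quoted in the table above.
timings = {
    "CPU time": (9.336, 7.004),
    "Real time": (10.078, 7.795),
    "Lua time": (3.9, 3.0),
}


def reduction_pct(hhvm, php7):
    """Percentage by which the PHP 7.2 value is lower than the HHVM value."""
    return (hhvm - php7) / hhvm * 100


if __name__ == "__main__":
    for metric, (hhvm, php7) in timings.items():
        print(f"{metric}: {reduction_pct(hhvm, php7):.1f}% lower on PHP 7.2")
```

All three metrics come out roughly 23 to 25 percent lower on PHP 7.2 for this one uncontended parse, which is consistent with the caveat that the numbers are indicative only.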

After a more thorough analysis of parsing the Obama page:

  • At low concurrency, PHP 7.2 thoroughly outperforms HHVM on parsing-heavy jobs
  • When concurrency is higher (up until about 15 concurrent threads doing parsing) the two stay on par
  • At higher concurrencies, PHP 7.2 degrades much faster than HHVM does.

This underlines a potential risk: if we get a lot of parse requests and/or lose a good chunk of the parsercache, we will be less able to recover once fully migrated. In average usage, though, PHP 7.2 will likely produce lower latencies for users.

Results for more endpoints:

  • PHP7 outperforms HHVM significantly for requests that involve /w/static.php (so most static files we serve), but while the relative difference is clear, the absolute latency gain is in the order of a few milliseconds, with large variance.
# HHVM 40k requests, 200 concurrency
Response time histogram:
  0.000 [1]	|
  0.069 [39875]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.137 [47]	|
  0.206 [17]	|
  0.274 [1]	|
  0.342 [25]	|
  0.411 [27]	|
  0.479 [4]	|
  0.548 [0]	|
  0.616 [1]	|
  0.685 [2]	|

# PHP7  40k requests, 200 concurrency
Response time histogram:
  0.000 [1]	|
  0.053 [39928]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.106 [3]	|
  0.159 [1]	|
  0.212 [7]	|
  0.265 [21]	|
  0.318 [27]	|
  0.371 [0]	|
  0.424 [3]	|
  0.477 [0]	|
  0.529 [9]	|
  • HHVM severely outperforms PHP7 for requests that involve /w/load.php: I could get a sustained work rate of 770 req/s from HHVM, compared to the 580 req/s I got from PHP7. The median latency per request is around 20 ms higher for PHP7, with a longer tail (up to 90 ms at the 95th percentile).
# HHVM 10k requests, max concurrency 200
Response time histogram:
  0.030 [1]	|
  0.066 [6602]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.101 [3203]	|■■■■■■■■■■■■■■■■■■■
  0.136 [126]	|■
  0.171 [27]	|
  0.206 [22]	|
  0.241 [10]	|
  0.276 [6]	|
  0.311 [1]	|
  0.346 [1]	|
  0.381 [1]	|

# PHP7 10k requests, max concurrency 200
Response time histogram:
  0.029 [1]	|
  0.094 [6892]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.159 [3007]	|■■■■■■■■■■■■■■■■■
  0.224 [74]	|
  0.289 [1]	|
  0.354 [0]	|
  0.419 [0]	|
  0.484 [0]	|
  0.549 [0]	|
  0.614 [8]	|
  0.680 [17]	|
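To compare histograms like these numerically rather than by eye, the bucket lines can be parsed and summed. This is a minimal Python sketch under two assumptions: each line has the form "label [count]", and each label is the upper bound (in seconds) of its bucket. The function names are hypothetical, and the sample text is the HHVM load.php histogram from above with the bars omitted.

```python
import re

# Matches lines such as "  0.066 [6602]  |■■■■".
BUCKET_RE = re.compile(r"^\s*(\d+\.\d+)\s+\[(\d+)\]")


def parse_histogram(text):
    """Return a list of (bucket_label_seconds, request_count) tuples."""
    buckets = []
    for line in text.splitlines():
        match = BUCKET_RE.match(line)
        if match:
            buckets.append((float(match.group(1)), int(match.group(2))))
    return buckets


def share_at_or_below(buckets, threshold):
    """Fraction of requests in buckets whose label is <= threshold."""
    total = sum(count for _, count in buckets)
    within = sum(count for label, count in buckets if label <= threshold)
    return within / total


HHVM_LOAD_PHP = """
  0.030 [1] |
  0.066 [6602] |
  0.101 [3203] |
  0.136 [126] |
  0.171 [27] |
  0.206 [22] |
  0.241 [10] |
  0.276 [6] |
  0.311 [1] |
  0.346 [1] |
  0.381 [1] |
"""

if __name__ == "__main__":
    buckets = parse_histogram(HHVM_LOAD_PHP)
    print(sum(count for _, count in buckets), "requests")
    print(share_at_or_below(buckets, 0.101))
```

Because the bucket boundaries differ between runs, shares computed this way are only comparable at thresholds that line up with a bucket edge in both histograms.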

Last thing to note:

  • pm = static vs pm = dynamic didn't really change much for long-lasting requests; it made smaller requests faster though, so it's a net win
  • we need some tool to inspect php-fpm's inner workings in order to find what is going on there. I might need to look at perf recordings to get an idea. phpspy might help too.

@Joe Might be interesting to look at specific calls that appear to perform less well, to see if we can identify specific calls that are slower. xhprof/tideways might be an approach...

This is surely the approach we should take for things that are consistently, noticeably slower on PHP7 than on HHVM even on a single call. But the performance loss here only happens at high concurrency, which I think depends more on php-fpm's architecture and its current sub-par configuration than on slowness in specific parts of the code.

I started testing with a very plain configuration of the daemon, specifically to avoid adding configuration knobs of dubious benefit. I suspect one of the reasons for the worse performance at high concurrency is the opcache configuration, which I intend to optimize next. Before changing parameters based on general recommendations, I want to have some observability of php-fpm (hence T209573); optimizing the code is surely important (in particular for low-overhead calls like load.php), but I think it should be the next step after gathering metrics and making incremental improvements to our configuration.

Of course, if you want to start looking into profiling information, you're free to do it. Would it make sense to wait a couple of weeks to also have excimer installed? That would allow us to gather sampling information at high concurrency.

Mentioned in SAL (#wikimedia-operations) [2018-11-27T07:35:47Z] <_joe_> depooling mw1261 for benchmarking, T206341

Change 476499 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::mediawiki::php: add opcache tuning for php-fpm

https://gerrit.wikimedia.org/r/476499

Change 476500 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::mediawiki::php: tune php-fpm parameters

https://gerrit.wikimedia.org/r/476500

Over the last week, I did a thorough benchmarking effort, with the setup being the following:

  • I tested real-life pages in our production environment
  • I tested those pages at different levels of concurrency
  • I depooled one appserver (mw1261) for the test, and ran tests on only one of HHVM or PHP7 at any given time.
  • The server has two 10-core Intel Xeon E5-2650 v3 CPUs @ 2.30GHz and 64 GB of RAM

The full results (including raw and cleaned data) can be accessed in my own space on people.wm.o: https://people.wikimedia.org/~oblivian/T206341/. See the README for a thorough explanation.

In general, I found HHVM's performance to degrade less severely than PHP7's at higher concurrencies, but still, PHP7 outperforms HHVM significantly in rendering tests for the two articles I tested, is on par for load.php calls, but is severely outperformed (esp. at higher concurrencies) in re-parsing a page and in rendering the main page. I would call this a tie between HHVM and decently optimized php7.

The gist of my results is that once you allow php-fpm to have enough workers, all other optimizations are quite clearly second-order gains. Having said that, some things I found out are in stark contrast with what logic and some benchmarking literature seem to suggest. I suspect part of it is due to my doing tests at high concurrencies, where some effects are more visible.

Specifically:

  1. pm = static doesn't give any performance advantage; if we remove the first few seconds from every test, it's even more evident that it hurts performance compared to pm = dynamic (see for instance https://people.wikimedia.org/~oblivian/T206341/images/workers_heavy_page_c40.png or https://people.wikimedia.org/~oblivian/T206341/images/workers_re-parse_c40.png).
  2. Opcache optimizations work, although they have relatively little effect. Unsurprisingly, completely removing revalidation didn't give any advantage over re-checking every 60 seconds; quite surprisingly, it caused a small performance hit. I'm less confident in this counter-intuitive result than I am in the others, as I never re-ran the test to double-check.
  3. Connecting the httpd daemon with the FastCGI application server via a unix socket gives us significant performance advantages, especially at higher concurrencies.

I have put up a series of patches to follow up on this benchmarking effort in our actual production configuration.

Change 476499 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::mediawiki::php: add opcache tuning for php-fpm

https://gerrit.wikimedia.org/r/476499

Change 476500 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::mediawiki::php: tune php-fpm parameters

https://gerrit.wikimedia.org/r/476500