Investigate performance degradation at high concurrencies in php-fpm
Open, High, Public

Description

It is known, and has been demonstrated, that at high concurrency php-fpm's performance degrades exponentially even when we have spare workers and host resources. We suspect that APCu locking is the culprit.

If so, and the issue persists in later versions, our goal is to make better use of on-host memcached by moving to it the objects we have so far been storing in APCu.

Event Timeline

I have some scripts in my home dir on mwdebug1001.eqiad.wmnet ([..] apcu_rw_test.php).

aaron raised the priority of this task from Low to High. Jan 7 2022, 1:43 AM

Using https://gist.github.com/AaronSchulz/28a2cc7701a33adca1479b5ff6530b2c and ab, APCu performance degradation was tested in a number of scenarios on a depooled host. Under a high write rate to a set of keys with random value sizes (128 bytes to 1MB), the global write locks slow down even simple read-only requests (e.g. apcu_fetch). Inducing memory fragmentation (as reported by apc.php) only makes it worse. Another antipattern is quickly filling up the cache with an overly large working set and causing resets, which creates an endless cycle of sets and cache flushes, with reads being slow.

Similarly, I can produce 10x to 100x slowdowns locally (36 fpm workers, ab concurrency of 36, CPU with 36 logical processors).
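
To illustrate the fill-and-reset antipattern described above, here is a toy model in Python. It is not APCu and does not model locking or fragmentation; it only assumes a fixed-capacity cache that is wiped entirely when an insert does not fit (the "resets" mentioned above). The class name, function name and all parameters are hypothetical.

```python
import random


class FlushOnFullCache:
    """Toy cache: emptied entirely when an insert does not fit (assumed behaviour)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = {}    # key -> value size in bytes
        self.flushes = 0   # number of full resets so far

    def get(self, key):
        return key in self.items

    def set(self, key, size):
        # Overwriting a key first releases the space of its old value.
        old = self.items.pop(key, None)
        if old is not None:
            self.used -= old
        if self.used + size > self.capacity:
            # Out of space: drop everything, then store the new value.
            self.items.clear()
            self.used = 0
            self.flushes += 1
        self.items[key] = size
        self.used += size


def simulate(num_keys, value_size, capacity, requests, seed=0):
    """Read-through workload: fetch a random key and store it on a miss.

    Returns (hit_ratio, number_of_flushes).
    """
    rng = random.Random(seed)
    cache = FlushOnFullCache(capacity)
    hits = 0
    for _ in range(requests):
        key = rng.randrange(num_keys)
        if cache.get(key):
            hits += 1
        else:
            cache.set(key, value_size)
    return hits / requests, cache.flushes
```

Calling simulate(num_keys=50, value_size=1024, capacity=100 * 1024, requests=10000) models a working set that fits: no flushes occur and nearly every read is a hit. Raising num_keys to 400 with the same capacity makes the model flush repeatedly and the hit ratio drops sharply, so most requests turn into writes. In the real system those extra writes would be the ones contending for the global lock.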

The only mitigations I see are:

  • Avoid the use of key classes with high cardinality and large values
  • Minimize the use of key classes with large values and high write rates
  • Tweak the write rate of loadbalancer lag state keys
  • Use a deferred update with a mutex to prune expired entries
  • Experiment with low values of apc.ttl (this can make it worse in some cases)
  • Split the worker count across machines to lower global lock contention (hence why the k8s pod setup performed better)
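
Two of the mitigations above, lowering the write rate of a hot key and letting only one worker prune at a time, can be sketched in Python as follows. This is a minimal in-process sketch under stated assumptions, not MediaWiki or APCu code: a plain dict stands in for the cache, all names are hypothetical, and the deferral itself (running the prune after the response has been sent) is not modelled.

```python
import threading
import time


class ThrottledWriter:
    """Skips writes to a key that was written less than min_interval seconds ago."""

    def __init__(self, store, min_interval, clock=time.monotonic):
        self.store = store              # dict standing in for the shared cache
        self.min_interval = min_interval
        self.clock = clock              # injectable so the behaviour can be tested
        self.last_write = {}            # key -> time of the last accepted write

    def set(self, key, value):
        now = self.clock()
        last = self.last_write.get(key)
        if last is not None and now - last < self.min_interval:
            # Too soon: keep the slightly stale value and avoid a write.
            return False
        self.store[key] = value
        self.last_write[key] = now
        return True


class SinglePruner:
    """Lets at most one caller prune at a time; other callers skip instead of waiting."""

    def __init__(self):
        self._mutex = threading.Lock()

    def prune_expired(self, entries, now):
        """entries maps key -> (value, expires_at).

        Returns the list of removed keys, or None if another caller holds the mutex.
        """
        if not self._mutex.acquire(blocking=False):
            return None
        try:
            expired = [k for k, (_, expires_at) in entries.items() if expires_at <= now]
            for k in expired:
                del entries[k]
            return expired
        finally:
            self._mutex.release()
```

The point of both helpers is that a caller never queues behind another writer: a throttled write is dropped and a contended prune is skipped. The trade-off is that readers may see a value up to min_interval seconds old, which seems acceptable for state such as replication lag but would need checking per key class.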