Investigate performance degradation at high concurrencies in php-fpm
Closed, ResolvedPublic

Assigned To

Authored By

	jijiki
	Oct 18 2021, 2:08 PM

Description

It is known, and has been demonstrated, that at high concurrency php-fpm's performance degrades exponentially even when more workers and host resources are available. We suspect that APCu locking could be the culprit.

If so, and the issue persists in later versions, our goal is to make better use of on-host memcached by storing there the objects we have so far been storing in APCu.

Related Objects

Status	Assigned	Task
Stalled	None	T255792 Quibble runs core:unit tests twice!
Open	None	T328919 Upgrade to PHPUnit 10
Open	None	T338103 Micro-optimize ApiResult::isMetadataKey with str_starts_with once we support PHP8+
Open	None	T328921 Drop PHP 7.4 support from MediaWiki
Stalled	None	T334726 Use return type `never` in Wikibase
Open	None	T328922 Drop PHP 8.0 support from MediaWiki
Stalled	None	T319055 Upgrade to psr/container 2.x
Stalled	Krinkle	T319432 Migrate WMF production from PHP 7.4 to PHP 8.1
Open	None	T291916 Tracking task for Bullseye migrations in production
Stalled	None	T356293 Migrate MW appservers' base images to bullseye
Open	None	T290536 Serve production traffic via Kubernetes
Resolved	jijiki	T280497 Benchmark performance of MediaWiki on k8s
Resolved	aaron	T293630 Investigate performance degradation at high concurrencies in php-fpm

Event Timeline

jijiki created this task. Oct 18 2021, 2:08 PM

Restricted Application added a subscriber: Aklapper. Oct 18 2021, 2:08 PM

jijiki added a parent task: T280497: Benchmark performance of MediaWiki on k8s. Oct 18 2021, 2:09 PM

dpifke assigned this task to aaron. Oct 18 2021, 6:17 PM

dpifke moved this task from Inbox, needs triage to Doing: Prio Interrupt on the Performance-Team board.

jijiki updated the task description. Oct 27 2021, 9:53 AM

In T280497#7460370, @aaron wrote:

I have some scripts in my home dir on mwdebug1001.eqiad.wmnet ([..] apcu_rw_test.php).

TK-999 subscribed. Nov 4 2021, 10:32 AM

aaron triaged this task as Low priority. Jan 7 2022, 1:41 AM

aaron added a subtask: T225968: Profile and visualise time spent per component/extension in MW entry points.

aaron removed a subtask: T225968: Profile and visualise time spent per component/extension in MW entry points.

aaron raised the priority of this task from Low to High. Jan 7 2022, 1:43 AM

Paladox subscribed. Jan 7 2022, 11:45 PM

Using https://gist.github.com/AaronSchulz/28a2cc7701a33adca1479b5ff6530b2c and ab, APCu performance degradation was tested in a number of scenarios on a depooled host. With a high rate of writes to a set of keys with random value sizes (128 bytes to 1 MB), the global write locks slow down even simple read-only requests (e.g. apcu_fetch). Inducing memory fragmentation (as reported by apc.php) only makes it worse. Another antipattern is quickly filling the cache with an overly large working set, which causes resets and creates an endless cycle of sets and cache flushes, with reads being slow.

Similarly, I can produce 10x to 100x slowdowns locally (36 fpm workers, ab concurrency of 36, CPU with 36 logical processors).
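To make the mechanism concrete, here is a rough Python model of the contention described above. It is not the gist's PHP test and does not touch APCu: a single threading.Lock stands in for the cache-wide lock, threads stand in for fpm workers, and the hold time per write (up to about 2 ms, scaled by value size) is an assumed cost model chosen only for illustration. The function name mean_read_latency and all parameters are hypothetical.

```python
import random
import threading
import time


def mean_read_latency(num_readers, num_writers, reads_per_reader=20, seed=0):
    """Return the mean time (seconds) a reader spends acquiring the lock and reading."""
    lock = threading.Lock()  # stand-in for the cache-wide lock (simplified to a mutex)
    cache = {"key0": b""}
    stop = threading.Event()
    latencies = []
    latencies_guard = threading.Lock()
    rng = random.Random(seed)
    # Random value sizes between 128 bytes and 1 MB, as in the test described above.
    sizes = [rng.randint(128, 1024 * 1024) for _ in range(64)]

    def writer(offset):
        i = offset
        while not stop.is_set():
            size = sizes[i % len(sizes)]
            value = b"x" * size
            with lock:
                cache[f"key{i % len(sizes)}"] = value
                # Assumed cost model: hold time grows with value size, up to ~2 ms.
                time.sleep(0.002 * size / (1024 * 1024))
            i += 1
            time.sleep(0.0002)  # brief pause so readers get a chance at the lock

    def reader():
        local = []
        for _ in range(reads_per_reader):
            start = time.perf_counter()
            with lock:
                cache.get("key0")  # the read itself is trivial
            local.append(time.perf_counter() - start)
            time.sleep(0.0002)
        with latencies_guard:
            latencies.extend(local)

    writers = [threading.Thread(target=writer, args=(n,)) for n in range(num_writers)]
    readers = [threading.Thread(target=reader) for _ in range(num_readers)]
    for t in writers + readers:
        t.start()
    for t in readers:
        t.join()
    stop.set()
    for t in writers:
        t.join()
    return sum(latencies) / len(latencies)


if __name__ == "__main__":
    quiet = mean_read_latency(num_readers=4, num_writers=0)
    busy = mean_read_latency(num_readers=4, num_writers=4)
    print(f"mean read latency without writers: {quiet * 1e6:.0f} us")
    print(f"mean read latency with writers:    {busy * 1e6:.0f} us")
```

The read is trivial in both runs; any difference in the reported latency comes from time spent waiting for the lock, which is the effect seen with apcu_fetch under heavy writes. Absolute numbers from this model say nothing about real APCu timings.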

The only mitigations I see are:

Avoid the use of key classes with high cardinality and large values
Minimize the use of key classes with large values and high write rates
Tweak the write rate of loadbalancer lag state keys
Use a deferred update with a mutex to prune expired entries
Experiment with low values of apc.ttl (this can make it worse in some cases)
Split the worker count across machines, which lowers global lock contention (hence why the k8s pod setup performed better)
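Two of the mitigations above, limiting the write rate of a frequently updated key and pruning under a mutex, can be sketched generically as follows. This is a Python illustration under stated assumptions, not MediaWiki or APCu code: the cache is a plain dict, and ThrottledKey, prune_expired, and their parameters are hypothetical names.

```python
import threading
import time


class ThrottledKey:
    """Write a value to the cache at most once per min_interval seconds."""

    def __init__(self, cache, key, min_interval, clock=time.monotonic):
        self._cache = cache
        self._key = key
        self._min_interval = min_interval
        self._clock = clock
        self._last_write = None

    def maybe_set(self, value):
        """Store value unless the key was written too recently; return True if written."""
        now = self._clock()
        if self._last_write is not None and now - self._last_write < self._min_interval:
            return False  # skip the write, so no write lock is taken
        self._cache[self._key] = value
        self._last_write = now
        return True


def prune_expired(cache, expiries, prune_mutex, now):
    """Remove expired entries unless another worker is already pruning.

    Returns the number of entries removed (0 if the mutex was busy).
    """
    if not prune_mutex.acquire(blocking=False):
        return 0  # someone else is pruning; do not queue up behind them
    try:
        expired = [key for key, expires_at in expiries.items() if expires_at <= now]
        for key in expired:
            cache.pop(key, None)
            del expiries[key]
        return len(expired)
    finally:
        prune_mutex.release()
```

The non-blocking acquire is the point of the second helper: at most one worker does the pruning at a time, and the others carry on with their request instead of waiting.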

Tim, Timo, and I looked at the APCu graphs and do not see a need for fragmentation avoidance (e.g. via pruning) or for limiting space. The periodic flushes are from deployments and are tolerable.

Krinkle mentioned this in T315392: Optimize APCUBagOStuff::lock and/or replace some callers with FSLockManager. Aug 16 2022, 11:28 PM

aaron mentioned this in T315472: Reduce request latency impact from APCUBagOStuff write lock contention. Aug 17 2022, 6:41 PM
