
Track down and resolve cause for high system CPU% on HHVM API servers
Closed, Resolved (Public)

Description

The HHVM servers in the API pool show high system CPU% usage, with peak values that are consistent with one or more cores being maxed (12.5% and 25% are common). Zend app servers handling the same load on similar hardware specs rarely go above 5%.
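The "one or more cores" reading follows from how aggregate CPU% is reported: on a host with $N$ logical cores, $k$ cores spending all their time in kernel mode show up as the share below. The task does not state $N$. Assuming $N = 8$ purely for illustration, the two common peaks match one and two pegged cores; with $N = 16$ the same figures would mean two and four.

```latex
\text{sys\%} = 100 \cdot \frac{k}{N},
\qquad N = 8:\quad k = 1 \Rightarrow 12.5\%, \quad k = 2 \Rightarrow 25\%
```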

Event Timeline

ori set the priority of this task to Needs Triage.
ori updated the task description.
ori added a project: MediaWiki-Core-Team.
ori moved this task to Backlog on the MediaWiki-Core-Team board.
ori changed Security from none to None.
ori added subscribers: ori, tstarling.
ori triaged this task as High priority. (Nov 24 2014, 1:24 AM)

I collected system call counts with perf on Zend and HHVM servers and diffed them. I was not sure how to get time-per-call stats from perf, but the counts are still interesting: readlink, lseek and newlstat are the standouts.
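The task does not say which perf invocation produced the counts. Assuming per-syscall tracepoint counts such as those printed by `perf stat -e 'syscalls:sys_enter_*'` (lines of the form `<count> syscalls:sys_enter_<name>`), a minimal sketch of the diff step might look like this. Both function names are hypothetical, and the parser only handles that assumed line shape.

```python
def parse_perf_stat(text: str) -> dict[str, int]:
    """Parse assumed `perf stat` tracepoint lines into {syscall_name: count}."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip headers, blank lines and anything that is not a sys_enter tracepoint.
        if len(parts) < 2 or not parts[1].startswith("syscalls:sys_enter_"):
            continue
        try:
            n = int(parts[0].replace(",", ""))  # perf prints thousands separators
        except ValueError:
            continue  # e.g. "<not counted>"
        counts[parts[1].removeprefix("syscalls:sys_enter_")] = n
    return counts


def diff_syscall_counts(zend: dict[str, int], hhvm: dict[str, int],
                        top: int = 5) -> list[tuple[str, int, int, float]]:
    """Rank syscalls by how many more times HHVM made them than Zend.

    Returns (name, zend_count, hhvm_count, hhvm/zend ratio), sorted by the
    absolute excess hhvm_count - zend_count. Ratio is inf if Zend made none.
    """
    rows = []
    for name in set(zend) | set(hhvm):
        z = zend.get(name, 0)
        h = hhvm.get(name, 0)
        ratio = h / z if z else float("inf")
        rows.append((name, z, h, ratio))
    rows.sort(key=lambda r: r[2] - r[1], reverse=True)
    return rows[:top]
```

Sorting by absolute excess rather than by ratio keeps rare calls with large ratios from crowding out the calls that actually dominate kernel time.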

While poking around the HHVM source tree, I came across the hhvm.server.stat_cache runtime option. It enables StatCache, which caches the results of stat-family calls and uses inotify to invalidate entries when files change. I enabled it on mw1081 and system CPU% was cut nearly in half.
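HHVM's StatCache is C++ and its internals are not described in this task. As a rough, hypothetical Python sketch of the general idea only: `lstat()` and `readlink()` results are memoized per path and kept until something invalidates them. Here invalidation is modeled as an explicit `invalidate()` call that an inotify watcher (not shown) would drive; the class, its methods and the `misses` counter are illustrative names.

```python
import os
import threading


class StatCache:
    """Hypothetical memoizing wrapper for lstat()/readlink().

    Entries stay valid until invalidate() is called; in a real deployment an
    inotify watcher would call it when a watched file or directory changes.
    Failed lookups (e.g. FileNotFoundError) are not cached.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._lstat: dict[str, os.stat_result] = {}
        self._readlink: dict[str, str] = {}
        self.misses = 0  # number of real syscalls issued

    def lstat(self, path: str) -> os.stat_result:
        with self._lock:
            hit = self._lstat.get(path)
        if hit is not None:
            return hit
        result = os.lstat(path)  # real syscall on a miss
        with self._lock:
            self.misses += 1
            self._lstat[path] = result
        return result

    def readlink(self, path: str) -> str:
        with self._lock:
            hit = self._readlink.get(path)
        if hit is not None:
            return hit
        target = os.readlink(path)  # real syscall on a miss
        with self._lock:
            self.misses += 1
            self._readlink[path] = target
        return target

    def invalidate(self, path: str) -> None:
        """Drop cached entries for `path` and everything beneath it.

        Note: a miss that races with invalidate() can still store a stale
        result; a production cache would need e.g. generation counters.
        """
        prefix = path.rstrip(os.sep) + os.sep
        with self._lock:
            for table in (self._lstat, self._readlink):
                for key in [k for k in table if k == path or k.startswith(prefix)]:
                    del table[key]
```

The saving comes from repeated lookups of the same paths hitting the dictionary instead of the kernel; correctness then depends entirely on invalidation arriving promptly.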

The next step is to test it more thoroughly to verify that cache invalidation happens promptly and reliably when the files are touched.
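One way to check "promptly" is to change a file's mtime and time how long the cached view takes to catch up. A minimal sketch follows; the `probe` callable is hypothetical and would have to report the mtime as seen through HHVM's cache (for example via a small test endpoint), which this task does not describe.

```python
import os
import time
from typing import Callable


def measure_invalidation_latency(path: str,
                                 probe: Callable[[str], float],
                                 timeout: float = 5.0,
                                 interval: float = 0.01) -> float:
    """Bump `path`'s mtime, then poll `probe(path)` until it reports it.

    Returns elapsed seconds; raises TimeoutError if the cached view of the
    file never catches up within `timeout`.
    """
    before = os.stat(path).st_mtime
    target = before + 1.0  # guarantee a distinct mtime even on coarse filesystems
    os.utime(path, (target, target))
    expected = os.stat(path).st_mtime  # what the filesystem now reports
    start = time.monotonic()
    deadline = start + timeout
    while time.monotonic() < deadline:
        if probe(path) == expected:
            return time.monotonic() - start
        time.sleep(interval)
    raise TimeoutError(f"{path}: cached mtime still stale after {timeout}s")
```

For "reliably", the same measurement would be repeated many times and the worst case examined rather than the average.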

ori claimed this task.

Set hhvm.server.stat_cache to true everywhere.

(Attached image: pasted_file)