We've been hit by a new bug in HHVM, again happening mostly on the API cluster. It is a different beast from the bug we reported a few weeks ago (https://phabricator.wikimedia.org/T182568):
- some threads get stuck, each one saturating a single CPU
- strace shows no syscalls being made
- inspecting a thread with perf shows most of the time is spent in HPHP::Class::getDeclPropIndex
I didn't go much further with my debugging session, as that part of the codebase is full of abstractions that are quite hard to follow.
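
For anyone trying to reproduce the first observation, here is a rough sketch of how the stuck threads could be located. It is not the tooling used above. It assumes a Linux host with /proc mounted. The function names, the 1-second sampling interval, and the 95% threshold are all made up for illustration.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (clock ticks) from a /proc/<pid>/task/<tid>/stat line."""
    # The comm field is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before tokenizing the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, i.e. indexes 11 and 12 here.
    return int(fields[11]) + int(fields[12])


def sample(pid):
    """Map each thread id of `pid` to its accumulated CPU ticks."""
    ticks = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                ticks[int(tid)] = parse_cpu_ticks(f.read())
        except OSError:
            continue  # the thread exited between listdir() and open()
    return ticks


def cpu_fraction(before, after, interval, ticks_per_sec):
    """Fraction of one CPU used by each thread present in both samples."""
    return {
        tid: (after[tid] - before[tid]) / (interval * ticks_per_sec)
        for tid in after
        if tid in before
    }


def spinning_threads(pid, interval=1.0, threshold=0.95):
    """Return thread ids that used at least `threshold` of a CPU over `interval` seconds."""
    hz = os.sysconf("SC_CLK_TCK")
    before = sample(pid)
    time.sleep(interval)
    after = sample(pid)
    usage = cpu_fraction(before, after, interval, hz)
    return sorted(tid for tid, frac in usage.items() if frac >= threshold)
```

Calling spinning_threads(<hhvm pid>) should return the ids of threads pegging a core. Each id can then be inspected with something like `strace -p <tid>` and `perf top -t <tid>`. These are only plausible commands; the exact invocations used for the observations above are not recorded here.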