
Investigate using jvmquake to limit the time a JVM is unusable due to GC overhead
Open, Needs Triage, Public

Description

As a maintainer of a service running on top of the JVM, I want the JVM to quit rapidly if it enters a GC death spiral so that the service's availability increases.

The default heuristics the JVM uses to kill itself (-XX:+ExitOnOutOfMemoryError) are too conservative to be useful in real production use cases.
jvmquake seems to circumvent these problems by allowing more flexible heuristics to detect when the JVM is stuck in a death spiral; see the article: https://netflixtechblog.medium.com/introducing-jvmquake-ec944c60ba70
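
To make the idea of a more flexible heuristic concrete, here is a minimal sketch of a GC "debt counter" of the kind the linked article describes, as I understand it: GC pause time adds to a debt, application run time pays it back at some weight, and the process is considered to be in a death spiral once the debt passes a threshold. This is a toy simulation, not jvmquake's actual code. The names (GcDebtMonitor, first_trip) are hypothetical, and the default values (30 s threshold, weight of 5) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class GcDebtMonitor:
    """Toy model of a GC debt counter (hypothetical, not jvmquake's code)."""

    threshold_s: float = 30.0    # assumed: debt above which we would kill the JVM
    runtime_weight: float = 5.0  # assumed: debt repaid per second of run time
    debt_s: float = 0.0          # current accumulated GC debt, in seconds

    def record(self, run_s: float, gc_pause_s: float) -> bool:
        """Account for run_s seconds of application time followed by one
        GC pause of gc_pause_s seconds. Returns True once the debt exceeds
        the threshold."""
        # Run time pays the debt back, but the debt never goes below zero.
        self.debt_s = max(0.0, self.debt_s - run_s * self.runtime_weight)
        # Time spent paused in GC adds to the debt.
        self.debt_s += gc_pause_s
        return self.debt_s > self.threshold_s


def first_trip(monitor, intervals):
    """Return the index of the first (run_s, gc_pause_s) interval that trips
    the monitor, or None if it never trips."""
    for i, (run_s, gc_pause_s) in enumerate(intervals):
        if monitor.record(run_s, gc_pause_s):
            return i
    return None


if __name__ == "__main__":
    healthy = [(10.0, 0.5)] * 100  # long run periods, short pauses
    spiral = [(0.1, 4.0)] * 20     # almost all time spent in GC
    print(first_trip(GcDebtMonitor(), healthy))
    print(first_trip(GcDebtMonitor(), spiral))
```

With these assumed values, a workload of short pauses between long run periods never accumulates debt, while a workload that spends almost all of its time in GC trips the monitor after a handful of pauses. In a reporting-only deployment, the equivalent of a trip would be logged rather than acted on.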

This approach might be useful for several services:

  • Blazegraph sometimes gets stuck in a death spiral, certainly triggered by a bad query
  • cloudelastic sometimes misbehaves because of GC

AC:

  • a Debian package exists for jvmquake
  • jvmquake is deployed on Blazegraph with Puppet
  • jvmquake is configured in reporting-only mode