
consider moving Cassandra to G1GC in production
Closed, Resolved, Public

Description

More and more people are weighing in with their experiences using the G1 garbage collector with Cassandra. Rumor has it that it enables enormous heap sizes with little to no tuning, while still outperforming CMS (Concurrent Mark Sweep). If true, it could have a significant impact on our node-density story, and it seems worth looking into.

G1GC will be the default in Cassandra 3.0.

See also: https://issues.apache.org/jira/browse/CASSANDRA-7486

Event Timeline

Eevans set the priority of this task to Needs Triage.
Eevans updated the task description.
Eevans added a project: RESTBase-Cassandra.

I have done some testing with G1GC in staging and now on restbase1004. It looks promising so far; at the very least, it seems to let us survive the current overload a bit better than CMS.

Settings:

  • commented out "GC tuning options" section
  • instead, added JVM_OPTS="$JVM_OPTS -XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=500"
  • MAX_HEAP_SIZE="14g"
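  • commented out JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}", since the young generation size should not be set explicitly with G1GC (a fixed size overrides G1's adaptive sizing and its pause-time target)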
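The settings above could look roughly like this as a cassandra-env.sh fragment. This is a minimal sketch, assuming the stock Cassandra 2.x layout in which -Xms/-Xmx are derived from MAX_HEAP_SIZE; only the values for MAX_HEAP_SIZE and the three -XX flags come from the actual change, and the surrounding lines in the real file may differ.

```shell
#!/bin/sh
# Sketch only: assumes the stock Cassandra 2.x cassandra-env.sh layout,
# where the heap flags are built from MAX_HEAP_SIZE.
MAX_HEAP_SIZE="14g"

JVM_OPTS="${JVM_OPTS:-}"
# Fixed heap size (min = max) from MAX_HEAP_SIZE.
JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"

# Young generation size deliberately left unset under G1:
# JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}"

# Replaces the stock CMS "GC tuning options" section.
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
```

For the later restbase1004 experiment (16 GB heap, 250 ms pause target), only MAX_HEAP_SIZE and the MaxGCPauseMillis value would change.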

Change 221993 had a related patch set uploaded (by GWicke):
Move Cassandra to g1gc collector and increase heap size

https://gerrit.wikimedia.org/r/221993

Change 221993 merged by Filippo Giunchedi:
Move Cassandra to g1gc collector and increase heap size

https://gerrit.wikimedia.org/r/221993

restbase1004 has been running with a 16 GB heap and MaxGCPauseMillis=250 for two days now, and from what I can tell, there is no significant impact on I/O wait times. Moreover, it's the only node that hasn't been restarted during this period, despite being the node handling the largest amount of storage.

fgiunchedi triaged this task as Medium priority. (Jul 20 2015, 2:52 PM)

We're running G1GC everywhere.