
upgrade to latest openjdk 8 8u66-b01-1
Closed, Resolved (Public)

Description

JDK8 brings significant performance and stability improvements, in particular to the G1 garbage collector (G1GC), which we use with Cassandra because it copes better with larger heaps. For this reason, we have started to evaluate JDK8 on restbase1004 and, since today, on restbase1005.

This is using an older backport package from sid which is still in our repository. For long-term production use this would need to be updated, which is tracked in T104887.

Event Timeline

GWicke raised the priority of this task to Needs Triage.
GWicke updated the task description.
GWicke added projects: RESTBase, acl*sre-team.
GWicke subscribed.

It turns out that restbase1004 was actually downgraded to jdk7 on Friday (please log such changes!). This seems to have negatively affected its stability, with lots of restarts over the weekend.

@mobrovac mentioned that the reason for the downgrade was a change in JVM metrics. While that is important, I think that not being down or failing a high percentage of requests matters more for now than some metrics not working. We are currently deleting data in the hope of improving the situation, but we need the cluster to compact away those deletions before we can catch our breath.

As latencies and timeout rates seem to have improved since switching 1004 and 1005, I went ahead and switched 1001 to jdk8 as well. This means that half the cluster, including all the largest instances (by storage size and thus load), is now running jdk8.

I did see 1005 OOM twice in quick succession earlier. It has been running fine since.

restbase1004, which is also running jdk8, OOMed at 11:32 UTC.

I think overall it looks like jdk8 might be helping a little bit, but it's not making a huge difference to OOMs and memory pressure situations. Those seem to be primarily driven by mutations backing up, so not necessarily something a GC can do much about directly.

ok, I think it makes sense to reduce the variables at play and run openjdk 7 everywhere

Since we are using G1GC, I'd actually vote for using JDK8 everywhere, as G1GC is considered less mature in JDK7.

That might be true. Is there any evidence to suggest that the jdk8 instances are doing better than the jdk7 ones?

I don't have any conclusive evidence either way. All benchmarks I have seen with cassandra and G1GC show better throughput using jdk8.

I have now switched the remaining three nodes to JDK8 in order to see if this reduces timeouts further.

See also: https://issues.apache.org/jira/browse/CASSANDRA-7486

"I don't recommend anyone try G1 on JDK 7 < u75 or JDK 8 < u40 (although it's probably OK down to u20 according to the docs I've read). I did some testing on JDK7u75 and it was stable but didn't spend much time on it since JDK8u40 gave a nice bump in performance (5-10% on a customer cluster) by just switching JDKs and nothing else."

Jessie has jdk7u79.
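The version thresholds in the quoted CASSANDRA-7486 comment can be sketched as a small check. This is only an illustration: the helper names `parse_jdk` and `g1_recommended` are hypothetical, the accepted version-string shapes are an assumption based on the strings that appear in this task, and the thresholds encode one commenter's recommendation rather than any official support policy.

```python
import re

# Minimum update level per JDK major version, taken from the quoted
# CASSANDRA-7486 comment (an individual's recommendation, not a spec).
MIN_G1_UPDATE = {7: 75, 8: 40}


def parse_jdk(version):
    """Parse strings such as 'jdk7u79' or '8u66-b01-1~bpo8+1' into (major, update).

    Assumption: the string starts with an optional 'jdk' prefix followed by
    '<major>u<update>'; anything after the update number is ignored.
    """
    match = re.match(r"(?:jdk)?(\d+)u(\d+)", version.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised JDK version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def g1_recommended(version):
    """Return True if the version meets the quoted minimum for running G1."""
    major, update = parse_jdk(version)
    minimum = MIN_G1_UPDATE.get(major)
    if minimum is None:
        raise ValueError(f"no threshold known for JDK {major}")
    return update >= minimum


if __name__ == "__main__":
    for v in ("jdk7u79", "8u66-b01-1~bpo8+1", "8u20"):
        print(v, g1_recommended(v))
```

By this rule both Jessie's jdk7u79 and the 8u66 backport clear the quoted minimums, so the choice between them in this thread rests on maintenance status and observed behaviour rather than on the thresholds alone.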

As we discussed in the Ops meeting yesterday, please revert all nodes back to stable/maintained JDK7 so we can get a good baseline while things are stable, and can do limited testing with OpenJDK 8 in production and on separate nodes where appropriate.

See T104887 for a discussion of what happened after downgrading to JDK7.

the cassandra test cluster has been upgraded to latest openjdk-8 8u66-b01-1~bpo8+1

MoritzMuehlenhoff subscribed.

Filippo, you'd been doing that, so I'm assigning you the ticket?

@MoritzMuehlenhoff sure! I'll be doing a rolling upgrade of the cassandra machines to the latest openjdk this week.

fgiunchedi renamed this task from Test JDK8 with Cassandra to upgrade to latest openjdk 8 8u66-b01-1. Aug 18 2015, 8:16 AM

production cluster has been upgraded to 8u66-b01-1~bpo8+1