upgrade to latest openjdk 8 8u66-b01-1
Closed, Resolved · Public

Authored By

	• GWicke
	Jul 6 2015, 6:04 PM

Description

JDK8 has significant performance and stability improvements, in particular around the g1gc collector we are using with Cassandra for its capability to deal with larger heaps. For this reason, we have started to evaluate jdk8 on restbase1004 and, since today, on restbase1005.

This is using an older backport package from sid which is still in our repository. For long-term production use this would need to be updated, which is tracked in T104887.

Related Objects

Mentioned In: T107949: upgrade RESTBase cluster to Cassandra 2.1.8
T104887: Update JDK 8 package in backports repo
T103161: consider moving Cassandra to G1GC in production
Mentioned Here: T104887: Update JDK 8 package in backports repo

Event Timeline

• GWicke created this task. Jul 6 2015, 6:04 PM

• GWicke raised the priority of this task from to Needs Triage.

• GWicke updated the task description.

• GWicke added projects: RESTBase, acl*sre-team.

• GWicke subscribed.

Restricted Application added subscribers: Matanya, Aklapper. Jul 6 2015, 6:04 PM

• GWicke set Security to None. Jul 6 2015, 6:05 PM

• GWicke added subscribers: fgiunchedi, faidon, Joe and 2 others.

It turns out that restbase1004 was actually downgraded to jdk7 on Friday (please log such changes!). This seems to have negatively affected its stability, with lots of restarts over the weekend.

@mobrovac mentioned that the reason for the downgrade was a change in JVM metrics. While that is important, I think that not being down or failing a high percentage of requests is more important for now than some metrics not working. We are currently deleting data in the hope of improving the situation, but the cluster needs to compact away those deletions first before we can catch our breath.

As latencies and timeout rates seem to have improved since switching 1004 and 1005, I went ahead and switched 1001 to jdk8 as well. This means that half the cluster, including all the largest (by storage size and thus load) instances, is running jdk8.

• GWicke mentioned this in T103161: consider moving Cassandra to G1GC in production. Jul 6 2015, 10:27 PM

I did see 1005 OOM twice in quick succession earlier. It has been running fine since.

restbase1004 also OOMed at 11:32 UTC; it was running jdk8 too.

I think overall it looks like jdk8 might be helping a little bit, but it's not making a huge difference to OOMs and memory pressure situations. Those seem to be primarily driven by mutations backing up, so not necessarily something a GC can do much about directly.

OK, I think it makes sense to reduce the variables at play and run OpenJDK 7 everywhere.

Since we are using G1GC, I'd actually vote for using JDK8 everywhere, as G1GC is considered less mature in JDK7.

That might be true. Is there any evidence on whether jdk7 or jdk8 instances are doing better?

I don't have any conclusive evidence either way. All benchmarks I have seen with cassandra and G1GC show better throughput using jdk8.

I have now switched the remaining three nodes to JDK8 in order to see if this reduces timeouts further.

fgiunchedi mentioned this in T104887: Update JDK 8 package in backports repo. Jul 13 2015, 2:05 PM

"I don't recommend anyone try G1 on JDK 7 < u75 or JDK 8 < u40 (although it's probably OK down to u20 according to the docs I've read). I did some testing on JDK7u75 and it was stable but didn't spend much time on it since JDK8u40 gave a nice bump in performance (5-10% on a customer cluster) by just switching JDKs and nothing else."

Jessie has jdk7u79.
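
For reference, the thresholds in the quote above can be checked mechanically. This is only a minimal sketch: the function names and the version-string parsing are assumptions made for illustration, and the only facts it encodes are the quoted minimums (u75 for JDK 7, u40 for JDK 8).

```python
import re

# Minimum update levels taken from the quoted recommendation:
# G1 is not recommended on JDK 7 before u75 or on JDK 8 before u40.
G1_MIN_UPDATE = {7: 75, 8: 40}


def parse_jdk_version(version):
    """Parse strings such as '7u79' or '8u66-b01-1' into (major, update).

    The accepted format is an assumption for this sketch; anything after
    the update number (build suffix, package revision) is ignored.
    """
    match = re.match(r"^(\d+)u(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized JDK version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_g1_recommendation(version):
    """Return True if the version is at or above the quoted G1 minimum."""
    major, update = parse_jdk_version(version)
    if major not in G1_MIN_UPDATE:
        # The quote only covers JDK 7 and JDK 8, so refuse to guess.
        raise ValueError(f"no quoted threshold for JDK {major}")
    return update >= G1_MIN_UPDATE[major]


if __name__ == "__main__":
    for candidate in ("7u79", "8u66-b01-1"):
        print(candidate, meets_g1_recommendation(candidate))
```

By this check both the Jessie jdk7u79 package and 8u66-b01-1 clear the quoted minimums, so the rule by itself does not decide between them.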

As we discussed in the Ops meeting yesterday, please revert all nodes back to stable/maintained JDK7 so we can get a good baseline while things are stable, and can do limited testing with OpenJDK 8 in production and on separate nodes where appropriate.

mark added a project: Blocked-on-Services. Jul 14 2015, 3:02 PM

• mobrovac edited projects, added RESTBase-Cassandra; removed RESTBase. Aug 3 2015, 6:09 PM

See T104887 for a discussion of what happened after downgrading to JDK7.

The Cassandra test cluster has been upgraded to the latest openjdk-8, 8u66-b01-1~bpo8+1.

fgiunchedi mentioned this in T107949: upgrade RESTBase cluster to Cassandra 2.1.8. Aug 7 2015, 8:41 AM