
Update JDK 8 package in backports repo
Closed, ResolvedPublic

Authored By
GWicke
Jul 6 2015, 6:03 PM
Referenced Files
F342426: pasted_file
Aug 3 2015, 3:39 PM
F342439: pasted_file
Aug 3 2015, 3:39 PM
F292109: screenshot_NEEiOy.png
Jul 31 2015, 5:28 PM
F291856: pasted_file
Jul 31 2015, 2:45 PM
F291830: pasted_file
Jul 31 2015, 2:36 PM
F288370: pasted_file
Jul 30 2015, 11:50 PM
F288241: pasted_file
Jul 30 2015, 11:29 PM
F288218: pasted_file
Jul 30 2015, 11:27 PM

Description

Our current JDK 8 backport packages lack recent security updates and fixes. They were created for Titan testing and have not been touched since.

We are now considering using JDK 8 for Cassandra in general, but before we can do so we'll need an up-to-date JDK 8 package. The next version of Cassandra will depend on JDK 8, and the general recommendation is to run current versions with it as well.

Event Timeline

GWicke set the priority of this task to Needs Triage.
GWicke updated the task description.
GWicke added a project: RESTBase.

We don't really need this, do we?

mobrovac subscribed.

> We don't really need this, do we?

I think we do. We are currently running on OpenJDK 8, and it seemed to make Cassandra more stable over the previous two weeks. Moreover, JDK 8 has better support for G1GC, which we are using in production. Also, as mentioned in the description, the next Cassandra version will depend on it (tangential, but still relevant, IMHO).

Do we have real evidence that OpenJDK 8 makes a measurable difference to Cassandra? Properly maintaining an OpenJDK 8 package would require additional effort on our side, while with OpenJDK 7 we can stick with the Debian defaults.

For a period of time during the Cassandra semi-outages, we switched half of the nodes to OpenJDK 8, and they appeared to be more stable than the others (even though the busiest nodes were the ones switched). When these nodes did crash, it was usually because of timeouts caused by long GC pauses. The nodes on JDK 7 died more frequently, and usually it was death-by-GC: the collector was unable to free enough memory. This suggests that JDK 8 does indeed have better support for G1GC than JDK 7.

Reporting from IRC: @MoritzMuehlenhoff has updated the OpenJDK 7 packages; we'll downgrade on Thursday.

Interestingly, https://packages.debian.org/search?suite=default&section=all&arch=any&searchon=names&keywords=+openjdk-8-jre lists jessie-backports as having 8u45. Is that what we have been using or planned to use, or would using the backports package reduce the maintenance and update burden?

If we move to openjdk-8 at a later point, we would likely make our own backport and work with the Debian Java maintainers towards providing our build in jessie-backports.

We cannot simply rely on backports.debian.org: it is updated by maintainers on a best-effort basis, and it only accepts a backport once the package has migrated to testing (OpenJDK is often broken on exotic architectures, which blocks testing migration).

We downgraded this morning. So far it looks like G1GC new gen collection times went up significantly:

[image: pasted_file]

[image: pasted_file]

[image: pasted_file]

There is a bootstrap going on, but that was also the case earlier in the week. We'll see if the difference persists when the bootstrap is done.

Any other ideas on what could cause this?

C* read latency went up quite a bit as well:

[image: pasted_file]

The bootstrap has now finished, but GC times and latencies are still a lot higher:

[image: pasted_file]

[image: pasted_file]

However, it does not seem to affect all nodes equally. Three out of eight nodes are showing reasonable GC behavior.
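One way to make this per-node comparison concrete is to compare each node's median GC pause before and after the JDK switch. The sketch below assumes the pause durations (in milliseconds) have already been exported per node, for example from the metrics behind these graphs; the function names, node names, sample numbers and the 2× threshold are all hypothetical and only illustrate the idea.

```python
from statistics import median


def median_ratio(before: dict[str, list[float]],
                 after: dict[str, list[float]]) -> dict[str, float]:
    """Per-node ratio of median GC pause after vs. before a change.

    Only nodes present in both inputs are compared; pauses are assumed
    to be positive durations, so the 'before' median is never zero.
    """
    return {
        node: median(after[node]) / median(before[node])
        for node in sorted(before.keys() & after.keys())
    }


def regressed_nodes(before: dict[str, list[float]],
                    after: dict[str, list[float]],
                    factor: float = 2.0) -> list[str]:
    """Nodes whose median pause grew by more than `factor` (hypothetical threshold)."""
    return [node for node, ratio in median_ratio(before, after).items()
            if ratio > factor]


if __name__ == "__main__":
    # Hypothetical sample data, not measurements from this cluster.
    before = {"node-a": [20, 25, 30], "node-b": [22, 24, 26]}
    after = {"node-a": [80, 95, 110], "node-b": [23, 25, 27]}
    print(regressed_nodes(before, after))  # ['node-a']
```

Comparing medians per node, rather than one cluster-wide average, keeps a few badly behaving nodes from being hidden by the healthy ones.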

Indeed, some nodes are still showing high GC activity. Cassandra was restarted on restbase1005 to try to replicate the behaviour, but I think that will take some hours anyway. p50 latencies have increased on the RESTBase side as well; we're going to switch back to OpenJDK 8 and expect a corresponding recovery.

screenshot_NEEiOy.png (355×838 px, 39 KB)

Since switching back to JDK 8, GC timings and latencies have returned to normal:

[image: pasted_file]

[image: pasted_file]

An openjdk-8 build is running; it will also be uploaded to jessie-backports. Filippo and I will take care of keeping it updated there for Oracle's quarterly security releases.
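Oracle publishes its Critical Patch Updates, which carry the Java SE security fixes, on the Tuesday closest to the 17th of January, April, July and October. A small sketch for planning backport rebuilds around those dates might look like this; it only encodes that published rule, so Oracle's own calendar stays authoritative, and the function names are our own.

```python
import datetime as dt

# Oracle Critical Patch Update months: January, April, July, October.
CPU_MONTHS = (1, 4, 7, 10)


def cpu_date(year: int, month: int) -> dt.date:
    """Return the Tuesday closest to the 17th of the given month."""
    # Days 14..20 form a 7-day window centred on the 17th, so it contains
    # exactly one Tuesday, and that Tuesday is at most 3 days from the 17th.
    for day in range(14, 21):
        candidate = dt.date(year, month, day)
        if candidate.weekday() == 1:  # Monday == 0, Tuesday == 1
            return candidate
    raise AssertionError("unreachable: every 7-day window has a Tuesday")


def next_cpu(after: dt.date) -> dt.date:
    """Return the first Critical Patch Update date strictly after `after`."""
    for year in (after.year, after.year + 1):
        for month in CPU_MONTHS:
            release = cpu_date(year, month)
            if release > after:
                return release
    raise AssertionError("unreachable: next January always qualifies")
```

For example, `next_cpu(dt.date(2015, 8, 3))` returns 20 October 2015, and `cpu_date(2015, 7)` returns 14 July 2015.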

The latest release has been backported as openjdk-8_8u66-b01-1~bpo8+1 and uploaded to jessie-wikimedia (it is also available in jessie-backports on ftp.debian.org).
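The `~bpo8+1` suffix follows Debian's backport versioning convention. Because `~` sorts before everything, even the end of the string, the backport sorts just below the same version from the regular archive, so a later upgrade to that version replaces it. The sketch below reimplements dpkg's version ordering in Python from the rules in Debian Policy; it is not dpkg's own code, and the function names are our own.

```python
import string

DIGITS = set(string.digits)


def _order(c: str) -> int:
    """Sort weight of one character ("" means end of string) outside digit runs."""
    if c == "" or c in DIGITS:
        return 0
    if c == "~":
        return -1                  # tilde sorts before everything, even end of string
    if c.isascii() and c.isalpha():
        return ord(c)              # letters sort before other symbols
    return ord(c) + 256


def _compare_part(a: str, b: str) -> int:
    """Compare two upstream-version or revision strings; return -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # 1. Compare the non-digit prefixes character by character.
        while (i < len(a) and a[i] not in DIGITS) or (j < len(b) and b[j] not in DIGITS):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # 2. Compare the following digit runs as integers (an empty run counts as 0).
        m, n = i, j
        while m < len(a) and a[m] in DIGITS:
            m += 1
        while n < len(b) and b[n] in DIGITS:
            n += 1
        an, bn = int(a[i:m] or 0), int(b[j:n] or 0)
        if an != bn:
            return -1 if an < bn else 1
        i, j = m, n
    return 0


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 as Debian version `a` sorts before, equal to, or after `b`."""
    def split(v: str) -> tuple[int, str, str]:
        epoch, sep, rest = v.partition(":")         # epoch: before the first colon
        if not sep:
            epoch, rest = "0", v
        upstream, sep, revision = rest.rpartition("-")  # revision: after the last hyphen
        if not sep:
            upstream, revision = rest, ""
        return int(epoch), upstream, revision

    ea, ua, ra = split(a)
    eb, ub, rb = split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_part(ua, ub) or _compare_part(ra, rb)
```

For example, `compare_versions("8u66-b01-1~bpo8+1", "8u66-b01-1")` returns -1, because the revision `1~bpo8+1` sorts before plain `1`.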