
Install Debian Buster on Hadoop
Open, Stalled, MediumPublic0 Estimated Story Points

Description

The upgrade to Debian Buster for the Hadoop cluster(s) might be more complicated than we thought, because openjdk-8 is not available on Debian Buster. In T229347 Andrew was able to install it on stat1005, since openjdk-8 was present in Buster before its final release, but it is not there now (so if we reimage, for example, we will not find it).

The above becomes problematic due to the following constraints:

  1. Spark 2.3 (our current version) doesn't support Java 11 (see also T229347#5394326). IIUC this is due to the Scala version used (2.11), which doesn't support Java 11 (https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html)
  2. Java 11 support in Scala 2.12+ is still incomplete - https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html#jdk-11-compatibility-notes
  3. Spark 2.4 comes with Scala 2.12, which offers experimental support for Java 11
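As a rough aid, the constraints above can be encoded in a tiny helper. This is a hedged sketch: the cut-offs are my reading of the scala-lang compatibility page (2.11.12+ and 2.12.x can run on JDK 11, older 2.11.x is JDK 8 only), and treating every 2.12.x as JDK-11-capable glosses over the "experimental/incomplete" caveat.

```shell
# Sketch of the Scala -> max-usable-JDK mapping discussed above.
# Assumption: 2.11.12 and any 2.12.x handle JDK 11 (experimentally);
# earlier 2.11.x releases are JDK 8 only.
max_jdk_for_scala() {
  case "$1" in
    2.11.*) [ "${1##*.}" -ge 12 ] && echo 11 || echo 8 ;;
    2.12.*) echo 11 ;;
    *)      echo unknown ;;
  esac
}
max_jdk_for_scala 2.11.7   # -> 8
max_jdk_for_scala 2.11.12  # -> 11
```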

Also, stretch-backports does have openjdk-11: https://packages.debian.org/stretch-backports/openjdk-11-jdk
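For reference, pulling openjdk-11 from stretch-backports would look roughly like this (a sketch, not a tested recipe; backports are pinned low by default, hence the explicit target release):

```shell
# Config fragment: enable stretch-backports, then install openjdk-11 from it.
echo 'deb http://deb.debian.org/debian stretch-backports main' \
  | sudo tee /etc/apt/sources.list.d/stretch-backports.list
sudo apt-get update
# Backports never win by default, so the target release must be explicit:
sudo apt-get install -t stretch-backports openjdk-11-jdk
java -version
```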
Last but not least, we'd also need to make sure that the HDFS/Yarn daemons work correctly on Buster and Java 11. CDH, of course, supports Java 11 only from 6.3 onward: https://www.cloudera.com/documentation/enterprise/upgrade/topics/ug_jdk8.html

But it is also true that CDH 6.3 ships with Spark 2.4, so either they support Java 11 as an experimental feature or there is a way to make Spark 2.4 work: https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_63_packaging.html

Considerations:

  • I am not a Scala/Spark expert, so what I wrote above might not be accurate; please double-check and correct me if needed :)
  • Backporting openjdk-8 to Buster is possible, but it would require a big effort from the SRE team. The last backport of openjdk-8, done for Cassandra on Debian Jessie, still needs to be maintained (applying patches for Debian Security Advisories, etc.), so it would be preferable not to go down that road again.

Event Timeline

elukey triaged this task as Medium priority. · Aug 23 2019, 9:31 AM
elukey created this task.
Restricted Application added a subscriber: Aklapper. · Aug 23 2019, 9:31 AM

Spark 2.4 comes with Scala 2.12, which offers experimental support for Java 11

I don't see that! https://github.com/apache/spark/blob/v2.4.3/pom.xml#L158 has Scala 2.11.12, which according to https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html is compatible with Java 11!

refinery-source is on Scala 2.11.7, but I'd expect an upgrade to 2.11.12 to be manageable.

I believe Java 8 will also be compatible with 2.11.12, so if I can make Spark 2.4.3 work in T222253, then I think we could upgrade to Spark 2.4.3 before we roll out Buster everywhere.

Confirmed that Spark 2.4.3 works with Java 8. I think we can and should upgrade to Spark 2.4.3 before we switch to Java 11 and Buster.

As far as I can tell, Spark 2.4.3 also works with Java 11. I can't test that in YARN mode since we don't have Java 11 on the cluster.
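For context, the local-mode testing amounts to something like the following (the JAVA_HOME and Spark paths below are illustrative assumptions, not our actual layout; YARN mode would additionally need the same JDK on every NodeManager, which is exactly what we can't do yet):

```shell
# Sketch: smoke-test a Spark build against a specific JDK in local mode.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
/opt/spark-2.4.3/bin/spark-shell --master 'local[2]' <<'EOF'
spark.range(1000).selectExpr("sum(id) AS s").show()
EOF
```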

I suggest we install Java 11 from stretch-backports everywhere before we upgrade to Buster. Then we test the upgrade in the analytics_test cluster, and then switch to Java 11 in the prod analytics cluster. This will make it easier to roll back if we have to.

CDH, of course, supports Java 11 only from 6.3 onward: https://www.cloudera.com/documentation/enterprise/upgrade/topics/ug_jdk8.html

Oh, if that's true, this could be a problem. We'd have to upgrade to CDH 6.3 first?

elukey added a comment. · Sep 9 2019, 5:20 PM

In my mind, there are three major things that we'd need to do for Hadoop:

  • Complete the work on Kerberos, roll out the new config, and handle the fallout of problems that we didn't test or take into account. Even if we do a ton of testing, there will be a lot of people to train and use cases to fix, requiring a lot of time.
  • Test Hadoop 3 in the test cluster, and see if it works with our code (refinery) and, if not, what changes would need to be made. It seems easy, but it will require a lot of time and effort.
  • Migrate to Buster, which also means migrating to Java 11. Another long task that will require extensive testing and resources from the Analytics team.

The last two steps must also take into account that testing will not involve only the Hadoop code, but all the dependent systems (Druid, Notebooks, Spark, etc..) and also our own code (most notably, the Analytics refinery with our jobs etc..).

Given the current size of our team, I see only two of the above goals as doable; three would be overcommitting, in my opinion. Even though I'd love to test Java 11 as soon as possible, a realistic plan could be:

  1. Port, if possible, openjdk-8 to wikimedia-buster and establish how much effort SRE would need to maintain the package(s). Co-ownership with Analytics could also be possible, to split the pain. I volunteer to help maintain the openjdk-8 package(s) if needed (with supervision :).
  2. Unblock the Buster migration of the Hadoop cluster, and at the same time allow testing of Hadoop 3 (CDH 6) on the testing cluster.

Eventually, once the above goals (and Kerberos) are done, we'll be able to decide/plan for Java 11 (likely next FY, in my opinion).

Nuria added a comment. · Sep 9 2019, 5:35 PM

Agreed with @elukey, and priority-wise I think we cannot test any Hadoop upgrades until we have rolled out Kerberos.

@MoritzMuehlenhoff what do you think about our plan? Would backporting openjdk-8 to Buster be worth considering? The added value would not only be for Hadoop, but for all Java-based systems like Kafka/Druid/Zookeeper/etc. that would then be able to migrate to Buster without testing Java 11 first (sadly a non-trivial step).

For Kafka I see https://issues.apache.org/jira/browse/KAFKA-7264, which was fixed in 2.1.0. In theory we could migrate to Buster this year with openjdk-8, and then think about the Kafka 2.x upgrade plus Java 11 for the next fiscal year. Same path for the other systems.

So, let me summarise to make sure I got this correctly. We have the following two options:

  1. Upgrade to CDH 6.3 on Stretch, which provides Hadoop and Scala supporting both Java 8 and 11, and then reimage each server from "CDH 6.3/Stretch" to "CDH 6.3/Buster"
  2. Build Java 8 for Buster, install the current CDH 5 packages on Buster (do we know if they are supported on Buster, though?), and then migrate from "CDH 5 + Java 8/Buster" to "CDH 6.3 + Java 11/Buster" later

Is that correct? With the additional constraint that we want to run the GPU stuff, which is Buster-only, on a stat host with Hadoop access, right?

Then 2. is the only feasible option, so let's do it. We might run into similar migration issues with Elastic as well, so that work may also be useful on a wider scale.

I have to note, though, that these temporary things always tend to stick around; I built Java 8 for Jessie something like four years ago as a Cassandra performance enhancement for Restbase (which used Java 7 at the time), and to this date we still have to keep it updated :-)

A tricky part about Java upgrades is that, from what I can tell, any inter-JVM process communication seems to fail between different Java versions. So Hadoop <-> Hadoop communication will fail if the processes are running different JVM versions, which means we'd have to take a full cluster downtime to upgrade to Java 11. I'm not 100% sure this is always true, just something I've noticed from trying. I'm also not sure if this is true of Kafka. If it is... I'm not sure what we are gonna do! :)
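One cheap pre-flight check for that mixed-version risk is to extract the JVM major version from `java -version` style strings before an upgrade, so a script can flag hosts still on the old runtime. A hedged sketch; the parsing assumes the usual `1.8.0_x` vs `11.0.x` version formats:

```shell
# Sketch: normalise a JVM version string to its major version,
# e.g. "1.8.0_222" -> 8 and "11.0.4" -> 11, for a mixed-cluster audit.
jvm_major() {
  v=${1#\"}; v=${v%\"}          # strip surrounding quotes if present
  case "$v" in
    1.*) v=${v#1.}; echo "${v%%.*}" ;;   # legacy 1.x.y scheme (Java 8 and older)
    *)   echo "${v%%.*}" ;;              # modern scheme (Java 9+)
  esac
}
jvm_major '1.8.0_222'   # -> 8
jvm_major '11.0.4'      # -> 11
```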

So, let me summarise to make sure I got this correctly. We have the following two options:

  1. Upgrade to CDH 6.3 on Stretch, which provides Hadoop and Scala supporting both Java 8 and 11, and then reimage each server from "CDH 6.3/Stretch" to "CDH 6.3/Buster"

Yes, correct, with two caveats: 1) as Andrew mentioned, all hosts will need to be migrated at once; 2) we (as Analytics) will have to test and port a ton of code that uses core Hadoop functions to the new major version.

  2. Build Java 8 for Buster, install the current CDH 5 packages on Buster (do we know if they are supported on Buster, though?), and then migrate from "CDH 5 + Java 8/Buster" to "CDH 6.3 + Java 11/Buster" later

We don't know yet; my plan was to start testing a Buster node in the testing cluster as soon as the Kerberos work is in a good state. In theory it shouldn't be a problem, but in practice we'll need time to test. If Java 8 is available on Buster, we'll also be able to convert a couple of Hadoop Analytics nodes (not testing ones, I mean) and observe them for a couple of weeks to spot anomalies (and in that case attempt to fix them, or just roll everything back; the worst that can happen is some failed jobs).

Is that correct? With the additional constraint that we want to run the GPU stuff, which is Buster-only, on a stat host with Hadoop access, right?

Yep!

Then 2. is the only feasible option, so let's do it. We might run into similar migration issues with Elastic as well, so that work may also be useful on a wider scale.

I also have another early candidate for Java 8 on Buster, namely the new Zookeeper Analytics nodes (T217057). Zookeeper clients use Java libraries to contact the cluster (as opposed to using a more agnostic protocol like HTTP), so running Java 11 on the servers and Java 8 on the clients might result in serialization issues (the same ones Andrew mentioned).

I have to note, though, that these temporary things always tend to stick around; I built Java 8 for Jessie something like four years ago as a Cassandra performance enhancement for Restbase (which used Java 7 at the time), and to this date we still have to keep it updated :-)

I completely agree, and my team is committed to testing Hadoop 3 + Java 11 as soon as possible. To share the pain, I also offered to help maintain Java 8 on Buster if needed :)

T233604 tracks the work to import the openjdk-8 package to a special component for Debian Buster, thanks Moritz!
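Assuming the usual per-purpose component layout of apt.wikimedia.org (the component name below is my guess, not confirmed by T233604), consuming the imported package on a Buster host would look something like:

```shell
# Config fragment: opt in to the openjdk-8 component on a Buster host,
# then install the package from it. The component name is an assumption.
echo 'deb http://apt.wikimedia.org/wikimedia buster-wikimedia component/jdk8' \
  | sudo tee /etc/apt/sources.list.d/component-jdk8.list
sudo apt-get update
sudo apt-get install openjdk-8-jdk
```

In practice this would be rolled out via Puppet (as the profile::java::analytics patch below does) rather than by hand.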

Change 538844 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::java::analytics: deploy openjdk-8 on Buster

https://gerrit.wikimedia.org/r/538844

Change 538844 merged by Elukey:
[operations/puppet@production] profile::java::analytics: deploy openjdk-8 on Buster

https://gerrit.wikimedia.org/r/538844

T214364 has to be taken into consideration, since it lists the missing dependencies that we had to create for CDH on Stretch.

elukey added a comment. · Jan 3 2020, 2:52 PM

Given that the problem of Java 8 vs 11 has been resolved, I'd say we could concentrate on the Hadoop workers for the moment, leaving aside other corner cases like Hue (which can stay on Stretch for longer; there is no real rush).

Change 561869 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set Buster for analytics1031

https://gerrit.wikimedia.org/r/561869

Change 561869 merged by Elukey:
[operations/puppet@production] Set Buster for analytics1031

https://gerrit.wikimedia.org/r/561869

elukey added a comment. · Jan 3 2020, 4:55 PM

I forgot that libssl1.0.0 is also a dependency of the hadoop-* packages; following up in T214364 to see how to solve the problem.

elukey changed the task status from Open to Stalled. · Feb 18 2020, 2:36 PM

The current idea is to move to BigTop first (on Stretch) and then wait for the upcoming 1.5 release, which should natively support Buster.

Marking this task as stalled until T244499 is completed.