Upgrade Analytics Cluster to Java 8
Closed, Resolved | Public | 21 Story Points

Description

The Druid 0.10 upgrade is blocked until we can upgrade the Analytics cluster to Java 8. Doing so will require a LOT of testing, and is a pretty big task :/

Details

Related Gerrit Patches:
[operations/puppet@production] role::archiva: move to java 8
[operations/puppet@production] profile::hive: remove jmx trans and java 7 support
[operations/puppet@production] profile::hadoop::worker: remove jmxtrans support
[operations/puppet@production] statistics::packages: deploy only java 8
[operations/puppet@production] profile::java::analytics: deploy only java 8
[operations/puppet@production] Force JAVA_HOME to openjdk-8's jre for all Hadoop daemons
[operations/puppet@production] profile::hadoop::worker: install spark2 package after Hive config
[operations/puppet@production] Allow to set the JAVA_HOME env variable in hadoop/hive/oozie
[operations/puppet@production] Update the cdh module to the latest sha
[operations/puppet/cdh@master] Allow to explicitly set the JAVA_HOME environment variable
[operations/puppet@production] Use hadoop cluster name variable in camus templates
[operations/puppet@production] profile::analytics::database::meta: add ferm rules hiera parameter
[operations/puppet@production] profile::hive/oozie: add a hiera parameter for the ferm srange
[operations/puppet@production] Parametersize kafka_cluster_name in refinery job camus
[operations/puppet@production] profile::analytics::database::meta: simplify labs deployment
[operations/puppet@production] profile::hadoop::backup::namenode: improve labs support
[operations/puppet@production] profile::analytics::database::meta::backup_dest: allow labs dir perms
[operations/puppet@production] profile::hadoop:*: add ferm srange defaults to allow labs deployments
[operations/puppet@production] profile::hadoop::firewall::master: fix default ferm srange

Event Timeline

Ottomata created this task. May 24 2017, 6:40 PM
Nuria lowered the priority of this task from High to Medium. Jun 8 2017, 4:00 PM
Nuria moved this task from Operational Excellence Future to Backlog (Later) on the Analytics board.
Nuria added a comment. Jun 26 2017, 4:25 PM

@Paladox: cluster upgrades and jenkins upgrades are really not related, removing subtask

Ottomata reassigned this task from Ottomata to elukey. Nov 22 2017, 3:52 PM
elukey added a comment. Edited Jan 2 2018, 2:18 PM

Some weeks ago we discussed this task during the analytics ops meeting. We shouldn't aim to test all the jobs but only the refine ones in labs after executing the upgrade procedure.

Cloudera offers some guidance: https://www.cloudera.com/documentation/enterprise/5-10-x/topics/cdh_cm_upgrading_to_jdk8.html

We meet all the requirements in the Warning section, so in theory the procedure should be as simple as:

  1. stop all the daemons in the analytics cluster (after announcing the maintenance window to everybody)
  2. upgrade the cluster to openjdk8
  3. restart the cluster
elukey added a comment. Edited Jan 2 2018, 2:47 PM

The java upgrade part should be something like the following executed on all the analytics hadoop hosts:

  1. apt-get update && apt-get install openjdk-8-jdk openjdk-8-jre openjdk-8-jre-headless

1-a) It will probably need a puppet run to modify profile::java::analytics and include openjdk-8

  2. sudo /usr/sbin/update-java-alternatives -s java-1.8.0-openjdk-amd64
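
After switching alternatives it may be worth confirming which major version the default `java` reports. The following is a minimal sketch, not part of the procedure above; `java_major_from_version` and `current_java_major` are hypothetical helper names, and the parsing assumes the usual `version "1.8.0_151"` style line printed by `java -version`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a Java version string into its major version.
# "1.8.0_151" becomes 8, "1.7.0_151" becomes 7, "11.0.2" becomes 11.
java_major_from_version() {
    local version="$1"
    case "${version}" in
        # Legacy scheme: drop the leading "1." so the major number comes first
        1.*) version="${version#1.}" ;;
    esac
    # Keep everything before the first '.', '_' or '-'
    echo "${version%%[._-]*}"
}

# Hypothetical helper: major version of whatever `java` is first in PATH.
# `java -version` writes to stderr, hence the 2>&1.
current_java_major() {
    local version
    version="$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)"
    java_major_from_version "${version}"
}
```

Running `current_java_major` on a host after step 2 should print 8 if the alternatives switch took effect for the default `java`.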
elukey moved this task from Next Up to In Progress on the Analytics-Kanban board. Jan 2 2018, 3:58 PM
dcausse rescinded a token.

Change 402324 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] network::constants: add fake analytics networks for labs

https://gerrit.wikimedia.org/r/402324

Change 402324 merged by Elukey:
[operations/puppet@production] profile::hadoop:*: add ferm srange defaults to allow labs deployments

https://gerrit.wikimedia.org/r/402324

Change 402354 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hadoop::firewall::master: fix default ferm srange

https://gerrit.wikimedia.org/r/402354

Change 402354 merged by Elukey:
[operations/puppet@production] profile::hadoop::firewall::master: fix default ferm srange

https://gerrit.wikimedia.org/r/402354

Change 402382 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::analytics::database::meta::backup_dest: allow labs dir perms

https://gerrit.wikimedia.org/r/402382

Change 402382 merged by Elukey:
[operations/puppet@production] profile::analytics::database::meta::backup_dest: allow labs dir perms

https://gerrit.wikimedia.org/r/402382

Change 402783 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hadoop::backup::namenode: improve labs support

https://gerrit.wikimedia.org/r/402783

Change 402783 merged by Elukey:
[operations/puppet@production] profile::hadoop::backup::namenode: improve labs support

https://gerrit.wikimedia.org/r/402783

Change 402791 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::analytics::database::meta: simplify labs deployment

https://gerrit.wikimedia.org/r/402791

Change 402791 merged by Elukey:
[operations/puppet@production] profile::analytics::database::meta: simplify labs deployment

https://gerrit.wikimedia.org/r/402791

Change 402847 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Parametersize kafka_cluster_name in refinery job camus

https://gerrit.wikimedia.org/r/402847

Change 402847 merged by Ottomata:
[operations/puppet@production] Parametersize kafka_cluster_name in refinery job camus

https://gerrit.wikimedia.org/r/402847

Change 403128 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hive/oozie: add a hiera parameter for the ferm srange

https://gerrit.wikimedia.org/r/403128

Change 403128 merged by Elukey:
[operations/puppet@production] profile::hive/oozie: add a hiera parameter for the ferm srange

https://gerrit.wikimedia.org/r/403128

Change 403131 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::analytics::database::meta: add ferm rules hiera parameter

https://gerrit.wikimedia.org/r/403131

Change 403131 merged by Elukey:
[operations/puppet@production] profile::analytics::database::meta: add ferm rules hiera parameter

https://gerrit.wikimedia.org/r/403131

Change 403206 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use hadoop cluster name variable in camus templates

https://gerrit.wikimedia.org/r/403206

Change 403206 merged by Ottomata:
[operations/puppet@production] Use hadoop cluster name variable in camus templates

https://gerrit.wikimedia.org/r/403206

Change 403701 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet/cdh@master] Allow to explicitly set the JAVA_HOME environment variable

https://gerrit.wikimedia.org/r/403701

Tested in labs the procedure outlined above (install + update-java-alternatives to java8) and everything went fine. The following errors are ok (double checked with Moritz):

update-alternatives: error: no alternatives for mozilla-javaplugin.so
update-java-alternatives: plugin alternative does not exist: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/IcedTeaPlugin.so

The main problem is in the init.d scripts of the hadoop daemons: if JAVA_HOME is not set, it is auto-detected via /usr/lib/bigtop-utils/bigtop-detect-javahome, that seems to favor java7 over java8.

"/usr/lib/bigtop-utils/bigtop-detect-javahome, that seems to favor java7 over java8."

Strange that it favors Java 7 even if update-java-alternatives chooses Java 8. Hm.

It does the following (maybe I am misreading the code):

# Note that the JDK versions recommended for production use in CDH
# may not reflect the current recommendations for Apache Bigtop
case ${BIGTOP_JAVA_MAJOR} in
    6) JAVA_HOME_CANDIDATES=(${JAVA6_HOME_CANDIDATES[@]})
    ;;
    7) JAVA_HOME_CANDIDATES=(${JAVA7_HOME_CANDIDATES[@]} ${OPENJAVA7_HOME_CANDIDATES[@]})
    ;;
    8) JAVA_HOME_CANDIDATES=(${JAVA8_HOME_CANDIDATES[@]} ${OPENJAVA8_HOME_CANDIDATES[@]})
    ;;
    *) JAVA_HOME_CANDIDATES=(${JAVA7_HOME_CANDIDATES[@]}
                             ${JAVA8_HOME_CANDIDATES[@]}
                             ${MISCJAVA_HOME_CANDIDATES[@]}
                             ${OPENJAVA7_HOME_CANDIDATES[@]}
                             ${OPENJAVA8_HOME_CANDIDATES[@]})
    ;;
esac

# attempt to find java
if [ -z "${JAVA_HOME}" ]; then
    for candidate_regex in ${JAVA_HOME_CANDIDATES[@]}; do
        for candidate in `ls -rvd ${candidate_regex}* 2>/dev/null`; do
            if [ -e ${candidate}/bin/java ]; then
                export JAVA_HOME=${candidate}
                break 2
            fi
        done
    done
fi
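
To illustrate why the lookup above can land on Java 7 regardless of update-java-alternatives, here is a small sketch that replays the same first-match loop against fake JVM directories. It is a toy reproduction under stated assumptions: `pick_java_home` is a hypothetical helper, and the directory prefixes are stand-ins for the real candidate lists, which are not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return the first candidate that contains bin/java,
# walking the prefixes in the order given (like the quoted loop with `break 2`).
pick_java_home() {
    local candidate_prefix candidate
    for candidate_prefix in "$@"; do
        # Same reverse version sort as the quoted script
        for candidate in $(ls -rvd "${candidate_prefix}"* 2>/dev/null); do
            if [ -e "${candidate}/bin/java" ]; then
                echo "${candidate}"
                return 0
            fi
        done
    done
    return 1
}

# Fake JVM layout in a temporary directory (stand-in for /usr/lib/jvm)
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/java-7-openjdk-amd64/bin" "${demo_root}/java-8-openjdk-amd64/bin"
touch "${demo_root}/java-7-openjdk-amd64/bin/java" "${demo_root}/java-8-openjdk-amd64/bin/java"

# Default branch: Java 7 prefixes are listed before Java 8 prefixes
default_pick="$(pick_java_home "${demo_root}/java-7-openjdk" "${demo_root}/java-8-openjdk")"
# Branch for major version 8: only Java 8 prefixes are searched
java8_pick="$(pick_java_home "${demo_root}/java-8-openjdk")"

echo "default order picks: ${default_pick##*/}"   # java-7-openjdk-amd64
echo "java 8 only picks:   ${java8_pick##*/}"     # java-8-openjdk-amd64

rm -rf "${demo_root}"
```

The alternatives symlink is never consulted in this loop, so as long as a Java 7 directory exists and JAVA_HOME is empty, the first match wins. Setting JAVA_HOME explicitly skips the search entirely, since the loop only runs when the variable is empty.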

Change 403701 merged by Elukey:
[operations/puppet/cdh@master] Allow to explicitly set the JAVA_HOME environment variable

https://gerrit.wikimedia.org/r/403701

Change 404685 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Update the cdh module to the latest sha

https://gerrit.wikimedia.org/r/404685

Change 404685 merged by Elukey:
[operations/puppet@production] Update the cdh module to the latest sha

https://gerrit.wikimedia.org/r/404685

Change 404954 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Allow to set the JAVA_HOME env variable in hadoop/hive/oozie

https://gerrit.wikimedia.org/r/404954

Change 404954 merged by Elukey:
[operations/puppet@production] Allow to set the JAVA_HOME env variable in hadoop/hive/oozie

https://gerrit.wikimedia.org/r/404954

Change 405263 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hadoop::worker: install spark2 package after Hive config

https://gerrit.wikimedia.org/r/405263

Change 405263 merged by Elukey:
[operations/puppet@production] profile::hadoop::worker: install spark2 package after Hive config

https://gerrit.wikimedia.org/r/405263

For Spark I believe that update-java-alternatives is enough to force it to pick up java8. I found this in one of the scripts called by spark(-2)*-shell:

# Find the java binary
if [ -n "${JAVA_HOME}" ]; then
  RUNNER="${JAVA_HOME}/bin/java"
else
  if [ "$(command -v java)" ]; then
    RUNNER="java"
  else
    echo "JAVA_HOME is not set" >&2
    exit 1
  fi
fi

And command -v java returns /usr/bin/java, which is a symlink to /etc/alternatives/java (which in turn points to whichever java version is set).
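
A quick way to see which binary such a launcher would end up running is to mirror its lookup and follow the symlink chain. This is only a sketch: `resolve_java_runner` is a hypothetical helper, not something shipped with Spark, and it assumes a `readlink` that supports `-f` (as in GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirror the launcher's lookup order and print the
# java binary that would be used.
resolve_java_runner() {
    if [ -n "${JAVA_HOME:-}" ]; then
        # An explicit JAVA_HOME always wins
        echo "${JAVA_HOME}/bin/java"
    elif command -v java >/dev/null 2>&1; then
        # Follow the alternatives symlinks down to the real binary
        readlink -f "$(command -v java)"
    else
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
}

resolve_java_runner || echo "no java found"
```

With JAVA_HOME unset on a host where update-java-alternatives selected Java 8, the printed path would be expected to sit under the Java 8 directory in /usr/lib/jvm.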

Wow nice etherpad plan, <3

Andrew and Joseph completed a test in labs to verify that Druid running on Java 7 would still work fine with Hadoop running java 8, and no surprises came up.

We are thinking of scheduling the upgrade for Feb 6th.

Change 408251 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Force JAVA_HOME to openjdk-8's jre for all Hadoop daemons

https://gerrit.wikimedia.org/r/408251

Mentioned in SAL (#wikimedia-operations) [2018-02-13T18:25:08Z] <elukey> Analytics Hadoop cluster upgrade to Java 8 about to start - complete cluster shutdown is needed - T166248

Change 408251 merged by Elukey:
[operations/puppet@production] Force JAVA_HOME to openjdk-8's jre for all Hadoop daemons

https://gerrit.wikimedia.org/r/408251

Change 410244 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::java::analytics: deploy only java 8

https://gerrit.wikimedia.org/r/410244

Change 410244 merged by Elukey:
[operations/puppet@production] profile::java::analytics: deploy only java 8

https://gerrit.wikimedia.org/r/410244

Change 410250 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] statistics::packages: deploy only java 8

https://gerrit.wikimedia.org/r/410250

Change 410250 merged by Elukey:
[operations/puppet@production] statistics::packages: deploy only java 8

https://gerrit.wikimedia.org/r/410250

Change 410396 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hadoop::worker: remove jmxtrans support

https://gerrit.wikimedia.org/r/410396

Change 410396 merged by Elukey:
[operations/puppet@production] profile::hadoop::worker: remove jmxtrans support

https://gerrit.wikimedia.org/r/410396

Change 410445 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::archiva: move to java 8

https://gerrit.wikimedia.org/r/410445

elukey moved this task from Ready to Deploy to Done on the Analytics-Kanban board. Feb 14 2018, 5:00 PM
elukey set the point value for this task to 21. Feb 14 2018, 5:11 PM

Cluster upgraded to java8 and java 7 packages removed from all analytics hosts except analytics1003 due to T184794 (jmxtrans depends on java 7).

Nuria closed this task as Resolved. Feb 14 2018, 10:20 PM

Change 410680 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hive: remove jmx trans and java 7 support

https://gerrit.wikimedia.org/r/410680

Change 410680 merged by Elukey:
[operations/puppet@production] profile::hive: remove jmx trans and java 7 support

https://gerrit.wikimedia.org/r/410680

"Cluster upgraded to java8 and java 7 packages removed from all analytics hosts except analytics1003 due to T184794 (jmxtrans depends on java 7)."

Not true anymore: I had to remove everything, since some cron scripts were using hive, which in turn was auto-detecting java 7 rather than java 8.
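
Since a leftover Java 7 install can be picked up silently by the auto-detection, a check for remaining packages might look like the sketch below. `list_java7_packages` is a hypothetical helper, and the `openjdk-7-` name prefix is an assumption about how the packages are named on these hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read package names on stdin (one per line) and
# print only the OpenJDK 7 ones.
list_java7_packages() {
    grep -E '^openjdk-7-' || true   # an empty result is not an error here
}

# Example usage on a Debian-based host:
#   dpkg-query -W -f='${Package}\n' | list_java7_packages
```

An empty result would suggest that no package matching that prefix is left on the host.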

Change 410445 abandoned by Elukey:
role::archiva: move to java 8

https://gerrit.wikimedia.org/r/410445