
CDH 5.10 upgrade
Closed, Resolved · Public · 21 Estimated Story Points

Description

Debian Jessie vs CDH upgrade plan

  1. Upgrade whole cluster to CDH 5.10 as is.
  2. Get new Hadoop nodes (T152713), install those as Debian Jessie with CDH 5.10
  3. Incrementally reinstall current cluster nodes as Debian Jessie.
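Step 1 amounts to upgrading the CDH packages in place from the apt repo. A minimal sketch of the apt source involved, assuming the current Trusty nodes pull CDH from the apt.wikimedia.org thirdparty component (the distribution and component names here are illustrative, not verified against the actual cluster config):

```
# Hypothetical apt source on a current (Trusty) worker node; once the
# thirdparty component carries CDH 5.10 debs, a package upgrade on each
# node performs step 1. Names may differ on the real cluster.
deb http://apt.wikimedia.org/wikimedia trusty-wikimedia thirdparty
```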

Event Timeline

Hadoop Nodes to Jessie? (CDH has released Jessie debs) +1 (maybe with CDH upgrade?).

In Analytics Ops meeting today, we decided we should upgrade to CDH 5.10 now that it is out, even though it doesn't have Spark 2.x like we had hoped.

  • Mediawiki History reconstruction can use Spark 1.6
  • The NodeManager mem leak bug is fixed in newer CDH
  • We can upgrade to Debian.

It will also be good to time this with the order of the new Hadoop nodes, and to get it done before we replace stat1002/stat1003, so we can install those as Jessie too.

Previous CDH 5.5 upgrade task: T119646

etherpad process for that upgrade: https://etherpad.wikimedia.org/p/analytics-cdh5.5
etherpad for this one: https://etherpad.wikimedia.org/p/analytics-cdh5.10

Testing steps include loading data in labs, upgrading, and testing refinery jobs before starting the cluster migration.

Milimetric triaged this task as Medium priority.
Milimetric updated the task description. (Show Details)
Milimetric set the point value for this task to 21.
Milimetric edited projects, added Analytics-Kanban; removed Analytics.
Ottomata renamed this task from "CDH upgrade. Value proposition: new spark for edit reconstruction" to "CDH 5.10 upgrade". Feb 2 2017, 10:40 PM

Did the upgrade in labs today:

  • Went smoothly.
  • Except I broke Hue. I think this was not caused by the upgrade though. Will investigate more.
  • I was able to run a Jessie worker node on CDH 5.10 alongside all of the Trusty ones.
  • A webrequest refine job worked fine both before and after the upgrade.

Ah ha! Hue did break because of a change. Had to do: https://gerrit.wikimedia.org/r/#/c/336906/1/templates/hue/hue.ini.erb

So, with that, everything looks good! Time to schedule...

So, we briefly talked about doing this on a weekend... but I don't really have a free weekend day until March 4. I suppose this can wait that long. Thoughts?

I think it can wait. The advantage of doing it on a weekend would be less hassle for ourselves and users, but if you prefer to do it during a weekday that would be fine too.

OOok, let's do this on March 4th then! Will send email.

Wait, that is the same weekend as the visualization hackathon, correct?

Current plan: do this on Tuesday, February 28th. I will send out an announcement and schedule downtime.

Post-upgrade, manually remove the cdh5.5* packages from apt thirdparty. Use the grep-dctrl line from the reprepro updates config to figure out what to remove:

grep-dctrl -e -S '^zookeeper$|^hadoop$|^hadoop-0.20-mapreduce$|^bigtop-jsvc$|^bigtop-utils$|^sqoop$|^hbase$|^pig$|^pig-udf-datafu$|^hive$|^oozie$|^hue$|^bigtop-tomcat$|^spark$|^avro-libs$|^parquet$|^parquet-format$|^spark-core$|^spark-history-server$|^spark-master$|^spark-python$|^spark-worker$|^mahout$|^kite$|^solr$|^sentry$|^impala$|^impala-catalog$|^impala-server$|^impala-shell$|^impa
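The grep-dctrl line above is truncated, so here is a small sketch of the same idea: match exact package names (the `^name$` anchors) against a listing and collect what to feed to `reprepro remove`. The package subset, listing format, and final reprepro invocation below are illustrative, not the exact production commands:

```shell
#!/bin/sh
# Sketch: pick out CDH package names to purge from the thirdparty component.
# Anchored alternatives give exact-name matches, like the grep-dctrl -S
# pattern above; this list is abbreviated for illustration.
PKGS='^zookeeper$|^hadoop$|^hive$|^oozie$|^hue$|^spark-core$'

# Simulated fragment of a "reprepro list" style output:
#   dist|component|arch: package version
listing='jessie-wikimedia|thirdparty|amd64: hadoop 2.6.0+cdh5.5.2
jessie-wikimedia|thirdparty|amd64: hue 3.9.0+cdh5.5.2
jessie-wikimedia|thirdparty|amd64: unrelated-pkg 1.0'

# Second whitespace-separated field is the package name.
to_remove=$(printf '%s\n' "$listing" | awk '{print $2}' | grep -E "$PKGS")
printf '%s\n' "$to_remove"

# The real removal would then be something along the lines of:
#   reprepro -C thirdparty remove jessie-wikimedia $to_remove
```

This only prints `hadoop` and `hue`; `unrelated-pkg` survives because the anchored pattern never matches it as a substring.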