
CDH 5.10 upgrade
Closed, Resolved · Public · 21 Story Points

Description

Debian Jessie vs CDH upgrade plan

  1. Upgrade whole cluster to CDH 5.10 as is.
  2. Get new Hadoop nodes (T152713), install those as Debian Jessie with CDH 5.10
  3. Incrementally reinstall current cluster nodes as Debian Jessie.

Event Timeline

Nuria created this task. Dec 8 2016, 6:57 PM
Restricted Application added a subscriber: Aklapper. Dec 8 2016, 6:57 PM
Nuria added a comment. Dec 8 2016, 6:58 PM

Hadoop Nodes to Jessie? (CDH has released Jessie debs) +1 (maybe with CDH upgrade?).

In Analytics Ops meeting today, we decided we should upgrade to CDH 5.10 now that it is out, even though it doesn't have Spark 2.x like we had hoped.

  • Mediawiki History reconstruction can use Spark 1.6
  • The NodeManager mem leak bug is fixed in newer CDH
  • We can upgrade to Debian Jessie.

This will also be good to time with the order of new Hadoop nodes, and to get done before we replace stat1002/stat1003, so we can install those as Jessie too.

Ottomata added a comment. (Edited) Feb 2 2017, 3:07 PM

Previous CDH 5.5 upgrade task: T119646

etherpad process for that upgrade: https://etherpad.wikimedia.org/p/analytics-cdh5.5
etherpad for this one: https://etherpad.wikimedia.org/p/analytics-cdh5.10

Debian Jessie vs CDH upgrade plan

  1. Upgrade whole cluster to CDH 5.10 as is.
  2. Get new Hadoop nodes, install those as Debian Jessie with CDH 5.10
  3. Incrementally reinstall current cluster nodes as Debian Jessie.
Nuria added a comment. Feb 2 2017, 4:40 PM

Testing steps include loading data in labs, upgrading there, and testing refinery jobs before starting the cluster migration.

Milimetric triaged this task as Normal priority.
Milimetric updated the task description.
Milimetric set the point value for this task to 21.
Milimetric edited projects, added Analytics-Kanban; removed Analytics.
Ottomata renamed this task from "CDH upgrade. Value proposition: new spark for edit reconstruction" to "CDH 5.10 upgrade". Feb 2 2017, 10:40 PM

Did the upgrade in labs today:

  • Went smoothly.
  • Except I broke Hue. I think this was not caused by the upgrade though. Will investigate more.
  • I was able to run a Jessie worker node on CDH 5.10 alongside all of the Trusty ones.
  • A webrequest refine job worked fine both before and after the upgrade.
Ottomata moved this task from Next Up to In Progress on the Analytics-Kanban board. Feb 6 2017, 3:45 PM
elukey added a subscriber: elukey. Feb 9 2017, 4:02 PM

Ah ha! Hue did break because of a change. Had to do: https://gerrit.wikimedia.org/r/#/c/336906/1/templates/hue/hue.ini.erb

So, with that, everything looks good! Time to schedule...

So, we briefly talked about doing this on a weekend... but I don't really have a free weekend day until March 4. I suppose this can wait that long. Thoughts?

Nuria added a comment. Feb 10 2017, 1:23 AM

I think it can wait. The advantage of doing it on a weekend would be less hassle for ourselves and users, but if you prefer to do it on a weekday that would be fine too.

OOok, let's do this on March 4th then! Will send email.

Nuria added a comment. Feb 10 2017, 3:54 PM

Wait, that is the same weekend as the visualization hackathon, correct?

Current plan: do this on Tuesday, February 28th. I will send out an announcement and schedule downtime.

Ottomata added a comment. (Edited) Feb 15 2017, 3:37 PM

Post-upgrade, manually remove the cdh5.5* packages from apt thirdparty. Use the grep-dctrl line from reprepro updates to figure out what to remove:

grep-dctrl -e -S '^zookeeper$|^hadoop$|^hadoop-0.20-mapreduce$|^bigtop-jsvc$|^bigtop-utils$|^sqoop$|^hbase$|^pig$|^pig-udf-datafu$|^hive$|^oozie$|^hue$|^bigtop-tomcat$|^spark$|^avro-libs$|^parquet$|^parquet-format$|^spark-core$|^spark-history-server$|^spark-master$|^spark-python$|^spark-worker$|^mahout$|^kite$|^solr$|^sentry$|^impala$|^impala-catalog$|^impala-server$|^impala-shell$|^impa
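The pattern in the command above is a long alternation of `^name$` terms, so each alternative matches one package name exactly rather than as a prefix. Below is a minimal Python sketch of that matching idea only. It does not call grep-dctrl or reprepro; the helper names `build_exact_match_pattern` and `select_exact` are made up for illustration, and the candidate list is invented sample input. One difference is labeled in the comments: the sketch escapes special characters in names, which the original pattern does not.

```python
import re


def build_exact_match_pattern(names):
    """Join names into an anchored alternation such as '^a$|^b$'.

    Unlike the hand-written pattern in the task, names are passed through
    re.escape so that characters like '.' are matched literally.
    """
    return "|".join(f"^{re.escape(name)}$" for name in names)


def select_exact(pattern, candidates):
    """Return the candidates that match one of the anchored alternatives."""
    regex = re.compile(pattern)
    return [c for c in candidates if regex.search(c)]


# Two names taken from the visible part of the command; the full list in
# the task is truncated, so it is not reproduced here.
pattern = build_exact_match_pattern(["hadoop", "spark"])
print(pattern)  # ^hadoop$|^spark$

# Hypothetical candidate names, used only to show the anchoring effect.
candidates = ["hadoop", "hadoop-hdfs", "spark", "spark-core"]
print(select_exact(pattern, candidates))  # ['hadoop', 'spark']
```

Because of the anchors, `spark` does not also select `spark-core`, which is why the original command lists every package by name.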
mpopov added a subscriber: mpopov. Feb 24 2017, 8:59 PM
Ottomata moved this task from In Progress to Done on the Analytics-Kanban board. Mar 1 2017, 3:44 PM
Nuria closed this task as Resolved. Mar 8 2017, 8:00 PM