
Make Spark 2.1 easily available on new CDH5.10 cluster
Closed, Resolved · Public · 8 Story Points

Description

Spark 2+ is a real improvement over 1.6. It'd be great if we could make it available and gently move our jobs to the new APIs.

Loose end TODOs:

  • remove spark2-beeline
  • spark-sql logging is too verbose with the provided log4j.properties
  • Make spark2 use Hadoop native libs
  • Make a spark2 assembly jar and put it in HDFS
  • Wikitech documentation
  • email announcement
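On the assembly-jar item: Spark 2 no longer builds a single assembly jar; it ships a `jars/` directory instead. The closest equivalent is one archive of that directory that the `spark.yarn.archive` setting can point at. Below is a minimal sketch, assuming the jars sit in a local directory you pass in. The function name is a hypothetical helper. It writes a `.tgz` because YARN's localizer unpacks `.zip`, `.tar.gz` and `.tgz` archives.

```shell
#!/usr/bin/env bash
# Sketch: pack a Spark 2 jars/ directory into one archive for spark.yarn.archive.
# build_spark2_jar_archive is a HYPOTHETICAL helper, not part of the spark2 package.
set -euo pipefail

build_spark2_jar_archive() {
  local src_dir="$1"   # directory holding the Spark 2 jars (e.g. $SPARK_HOME/jars)
  local out="$2"       # output .tgz path
  local out_abs
  # Resolve the output path before cd'ing into the jar directory.
  out_abs="$(cd "$(dirname "$out")" && pwd)/$(basename "$out")"
  # Put the jars at the archive root, which is where spark.yarn.archive expects them.
  (cd "$src_dir" && tar -czf "$out_abs" *.jar)
}
```

Usage, with made-up paths: `build_spark2_jar_archive /usr/lib/spark2/jars /tmp/spark2-jars.tgz`, then `hdfs dfs -put -f /tmp/spark2-jars.tgz /user/spark/share/lib/`. Finally, set `spark.yarn.archive` in spark-defaults.conf to that HDFS location. YARN can then cache the jars instead of uploading them with every job.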

Details

Related Gerrit Patches:
operations/puppet@production: Install spark2 on Hadoop workers for use with Oozie
operations/debs/spark2@debian: 2.1.2-2 release for Hadoop 2.6
operations/puppet@production: Install Spark 2 for Hadoop clients
operations/debs/spark2@debian: Initial debian release (2.1.2-bin-hadoop2.6-1)

Event Timeline

Restricted Application added a subscriber: Aklapper. Feb 16 2017, 5:38 PM

+1, I betcha we could just load the jars into HDFS and have a special wrapper script to use them. MAYBE. :)
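Here is a minimal sketch of such a wrapper. It assumes the Spark 2 jars have already been packed into one archive in HDFS. The function name, the `SPARK2_ARCHIVE` and `SPARK_SUBMIT` variables, and the default HDFS path are all hypothetical. `spark.yarn.archive` is the real Spark 2 setting for pointing YARN containers at pre-staged jars.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL wrapper: submit a job that uses a Spark 2 jar archive kept in HDFS.
# SPARK2_ARCHIVE: HDFS location of the archive (default path is made up).
# SPARK_SUBMIT:   binary to invoke; override it (e.g. with `echo`) for a dry run.
spark2_submit_with_hdfs_jars() {
  local archive="${SPARK2_ARCHIVE:-hdfs:///user/spark/share/lib/spark2-jars.tgz}"
  local submit="${SPARK_SUBMIT:-spark2-submit}"
  # spark.yarn.archive makes YARN localize the pre-uploaded jars instead of
  # shipping the local jars/ directory with every job.
  "$submit" --master yarn --conf "spark.yarn.archive=${archive}" "$@"
}
```

For example, `SPARK_SUBMIT=echo spark2_submit_with_hdfs_jars --class Foo app.jar` prints the command line it would run without touching the cluster.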

elukey added a subscriber: elukey. Feb 16 2017, 5:57 PM
Nuria triaged this task as Medium priority. Mar 20 2017, 4:07 PM
Nuria added a subscriber: Nuria.

This will help us solve Oozie-Hive issues with HiveContext (we are currently working around those).

Nuria moved this task from Wikistats Production to Dashiki on the Analytics board. May 29 2017, 3:52 PM

We have work ongoing in T162912 that was initially built against 2.1 with the eventual intent of productization. It uses MLlib, which had several breaking changes from 1.6 to 2.0. I am currently unsure of the exact impact of having to back-port the already-written code.

Nuria added a comment. Jul 13 2017, 4:07 PM

All versions < 2.2 are affected by a security issue; addressing that will also be part of the value of upgrading.

Nuria added a comment. Jul 13 2017, 4:08 PM

Ideally we will get this upgrade with the new Cloudera distribution.

Nuria moved this task from Dashiki to Backlog (Later) on the Analytics board. Jul 13 2017, 4:08 PM

Discussed in standup 2017-08-31: let's use scap to deploy the spark-2.1.1 release folder (with small config changes for logging and the hadoop-conf setting) on stat100[345] and analytics1003 (for prod jobs).
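For the hadoop-conf part: Spark on YARN finds the cluster through `HADOOP_CONF_DIR` (or `YARN_CONF_DIR`). Both can be set in the release folder's `conf/spark-env.sh`, which Spark's launch scripts source. The sketch below assumes the usual CDH client config location, `/etc/hadoop/conf`. The deployed value may differ.

```shell
# conf/spark-env.sh (sketch): sourced by Spark's launch scripts.
# ASSUMPTION: /etc/hadoop/conf holds the CDH client configs on these hosts.
export HADOOP_CONF_DIR=/etc/hadoop/conf
# yarn-site.xml lives in the same directory on CDH, so reuse it for YARN.
export YARN_CONF_DIR="${HADOOP_CONF_DIR}"
```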

fdans set the point value for this task to 8. Oct 5 2017, 4:30 PM
fdans moved this task from Operational Excellence Future to Backlog (Later) on the Analytics board.
Ottomata claimed this task. Oct 31 2017, 8:16 PM
Ottomata edited projects, added Analytics-Kanban, Analytics-Cluster; removed Analytics.

Change 387663 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/debs/spark2@debian] Initial debian release (2.1.2-bin-hadoop2.6-1)

https://gerrit.wikimedia.org/r/387663

Change 387680 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install Spark 2 in Hadoop Cluster

https://gerrit.wikimedia.org/r/387680

Change 387663 merged by Ottomata:
[operations/debs/spark2@debian] Initial debian release (2.1.2-bin-hadoop2.6-1)

https://gerrit.wikimedia.org/r/387663

Change 387680 merged by Ottomata:
[operations/puppet@production] Install Spark 2 for Hadoop clients

https://gerrit.wikimedia.org/r/387680

OOooOOO boy!

[@stat1005:/home/otto] $ ls /usr/bin/*spark2* | cat
/usr/bin/pyspark2
/usr/bin/spark2-beeline
/usr/bin/spark2R
/usr/bin/spark2-shell
/usr/bin/spark2-sql
/usr/bin/spark2-submit

I betcha there will be other things that pop up. But, I think I did it!

Remaining TODOs:

  • remove spark2-beeline(?)
  • spark-sql logging is too verbose with the provided log4j.properties
  • Make spark2 use Hadoop native libs(?)
  • Make a spark2 assembly jar and put it in HDFS
  • Wikitech documentation
  • email announcement
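Two of these items are config-only. The sketch below assumes Spark 2.1's log4j 1.x and CDH's native-library path `/usr/lib/hadoop/lib/native` (an assumption; check the hosts). Both function names are hypothetical helpers, not part of the spark2 package. `spark.driver.extraLibraryPath` and `spark.executor.extraLibraryPath` are standard Spark settings.

```shell
#!/usr/bin/env bash
# Sketch of two loose-end fixes; paths are ASSUMPTIONS, not the deployed config.
set -euo pipefail

# Write a quieter log4j 1.x config: console threshold raised from INFO to WARN.
write_quiet_log4j() {
  local conf_dir="$1"
  cat > "${conf_dir}/log4j.properties" <<'EOF'
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
EOF
}

# Append spark-defaults.conf entries that put Hadoop's native libs on the
# library path of the driver and executor JVMs.
write_native_lib_defaults() {
  local conf_dir="$1"
  local native_dir="${2:-/usr/lib/hadoop/lib/native}"
  {
    printf 'spark.driver.extraLibraryPath %s\n' "$native_dir"
    printf 'spark.executor.extraLibraryPath %s\n' "$native_dir"
  } >> "${conf_dir}/spark-defaults.conf"
}
```

Both helpers take the release's `conf/` directory. Using `extraLibraryPath` keeps the native-lib setting in spark-defaults.conf rather than relying on `LD_LIBRARY_PATH` in every user's environment.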

Nuria moved this task from In Code Review to Paused on the Analytics-Kanban board. Nov 7 2017, 4:06 PM

Change 390435 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/debs/spark2@debian] 2.1.2-2 release for Hadoop 2.6

https://gerrit.wikimedia.org/r/390435

Change 390435 merged by Ottomata:
[operations/debs/spark2@debian] 2.1.2-2 release for Hadoop 2.6

https://gerrit.wikimedia.org/r/390435

Ottomata moved this task from Paused to Done on the Analytics-Kanban board. Nov 10 2017, 6:01 PM
Ottomata moved this task from Done to In Code Review on the Analytics-Kanban board.
Ottomata updated the task description.

Change 391028 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install spark2 on Hadoop workers for use with Oozie

https://gerrit.wikimedia.org/r/391028

Ottomata updated the task description. Nov 13 2017, 4:11 PM

Change 391028 merged by Ottomata:
[operations/puppet@production] Install spark2 on Hadoop workers for use with Oozie

https://gerrit.wikimedia.org/r/391028

Nuria closed this task as Resolved. Nov 27 2017, 9:28 PM