
Make Spark 2.1 easily available on new CDH5.10 cluster
Closed, Resolved · Public · 8 Estimated Story Points

Description

Spark 2+ is a real improvement over 1.6; it'd be great if we could have it available and gently move our jobs to the new APIs.

Loose end TODOs:

  • remove spark2-beeline
  • spark-sql logging is too verbose with the provided log4j.properties
  • Make spark2 use hadoop native libs
  • Make a spark2 assembly jar and put it in HDFS
  • Wikitech documentation
  • email announcement

Event Timeline

+1, I betcha we could just load the jars into HDFS and have a special wrapper script to use them. MAYBE. :)
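That "jars in HDFS plus a wrapper script" idea could be sketched roughly as below. This is a minimal sketch only: the SPARK2_HOME layout, the HDFS archive path, and the wrapper name are all assumptions for illustration, not the actual deployed setup.

```shell
#!/bin/sh
# Hypothetical spark2-submit wrapper. It points YARN at a Spark 2 assembly
# archive staged in HDFS, so worker nodes need no local Spark 2 install.
# SPARK2_HOME and the HDFS path are illustrative defaults, not real config.
spark2_submit() {
    spark2_home="${SPARK2_HOME:-/usr/lib/spark2}"
    spark2_assembly="${SPARK2_ASSEMBLY:-hdfs:///user/spark/share/lib/spark2-assembly.zip}"
    "${spark2_home}/bin/spark-submit" \
        --conf spark.yarn.archive="${spark2_assembly}" \
        "$@"
}
```

A real deployment would also want matching wrappers for the shell, pyspark, and R entry points, which is roughly what the Debian package below ends up providing.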

Nuria triaged this task as Medium priority. Mar 20 2017, 4:07 PM
Nuria added a subscriber: Nuria.

This will help us solve the oozie-hive issues with HiveContext (currently we are working around those).

We have work ongoing in T162912 that was initially built against 2.1 with the eventual intent of productization. It uses MLlib, which had several breaking changes from 1.6 to 2.0. I am currently unsure of the exact impact of having to back-port the already-written code.

All versions < 2.2 are affected by a security issue; fixing that will also be part of the value of upgrading.

Ideally we will get this upgrade with the new Cloudera distribution.

Discussed in standup 2017-08-31: Let's use scap to deploy spark-2.1.1 release folder (with small changes in config for logging and hadoop-conf setting) on stat100[345] and analytics1003 (for prod jobs).

fdans set the point value for this task to 8. Oct 5 2017, 4:30 PM
fdans moved this task from Operational Excellence Future to Backlog (Later) on the Analytics board.

Change 387663 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/debs/spark2@debian] Initial debian release (2.1.2-bin-hadoop2.6-1)

https://gerrit.wikimedia.org/r/387663

Change 387680 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install Spark 2 in Hadoop Cluster

https://gerrit.wikimedia.org/r/387680

Change 387663 merged by Ottomata:
[operations/debs/spark2@debian] Initial debian release (2.1.2-bin-hadoop2.6-1)

https://gerrit.wikimedia.org/r/387663

Change 387680 merged by Ottomata:
[operations/puppet@production] Install Spark 2 for Hadoop clients

https://gerrit.wikimedia.org/r/387680

OOooOOO boy!

[@stat1005:/home/otto] $ ls /usr/bin/*spark2* | cat
/usr/bin/pyspark2
/usr/bin/spark2-beeline
/usr/bin/spark2R
/usr/bin/spark2-shell
/usr/bin/spark2-sql
/usr/bin/spark2-submit

I betcha there will be other things that pop up. But, I think I did it!

Remaining TODOs:

  • remove spark2-beeline(?)
  • spark-sql logging is too verbose with the provided log4j.properties
  • Make spark2 use hadoop native libs(?)
  • Make a spark2 assembly jar and put it in HDFS
  • Wikitech documentation
  • email announcement
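For the spark-sql verbosity item, the usual fix is to raise the console threshold in the shipped log4j.properties. A sketch of what that change might look like, following Spark's own log4j.properties.template conventions — the exact logger categories chosen here are assumptions:

```properties
# Quiet the console: only WARN and above from Spark itself.
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Keep the spark-sql/spark-shell REPL's own output visible.
log4j.logger.org.apache.spark.repl.Main=WARN
```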

Change 390435 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/debs/spark2@debian] 2.1.2-2 release for Hadoop 2.6

https://gerrit.wikimedia.org/r/390435

Change 390435 merged by Ottomata:
[operations/debs/spark2@debian] 2.1.2-2 release for Hadoop 2.6

https://gerrit.wikimedia.org/r/390435

Ottomata moved this task from Done to In Code Review on the Analytics-Kanban board.
Ottomata updated the task description.

Change 391028 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install spark2 on Hadoop workers for use with Oozie

https://gerrit.wikimedia.org/r/391028

Change 391028 merged by Ottomata:
[operations/puppet@production] Install spark2 on Hadoop workers for use with Oozie

https://gerrit.wikimedia.org/r/391028