Make tranquility work with Spark
Closed, Resolved · Public · 13 Estimated Story Points

Assigned To

Authored By

	Ottomata
	Jun 21 2017, 5:58 PM

Description

In order, one of the following:

Implement tranquility in spark 1.6 - Failing because of dependency issues.
Implement tranquility in spark 2.1.1 - This is the one !!

Keeping the other 2 options for the record:

Ask FangJin for help
Productionise tranquility service using Kafka

Details

	Subject	Repo	Branch	Lines +/-
	Update pivot config with explicit dimensions	operations/puppet	production	+113 -1
	Allow ANALYTICS_NETWORKS to talk to druid zookeeper cluster	operations/puppet	production	+1 -1


Related Objects

		Status	Subtype	Assigned	Task
		Resolved		JAllemandou	T168550 Make tranquility work with Spark
		Resolved		Ottomata	T158334 Make Spark 2.1 easily available on new CDH5.10 cluster

Event Timeline

Ottomata created this task. Jun 21 2017, 5:58 PM

Restricted Application added a subscriber: Aklapper. Jun 21 2017, 5:58 PM

For debianization, we could model off what we did for druid:

https://github.com/wikimedia/operations-debs-druid/blob/debian/debian/README.Debian

JAllemandou awarded a token. Jun 21 2017, 7:46 PM

Nuria moved this task from Incoming to Dashiki on the Analytics board. Jun 26 2017, 3:53 PM

Nuria triaged this task as Low priority. Jul 10 2017, 3:55 PM

Hm! I think we may be able to push realtime stats to druid with the Tranquility API. It's got a Spark interface:

https://github.com/druid-io/tranquility/blob/master/docs/spark.md

Change 370865 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Allow ANALYTICS_NETWORKS to talk to druid zookeeper cluster

https://gerrit.wikimedia.org/r/370865

gerritbot added a project: Patch-For-Review. Aug 9 2017, 6:59 PM

Change 370865 merged by Ottomata:
[operations/puppet@production] Allow ANALYTICS_NETWORKS to talk to druid zookeeper cluster

https://gerrit.wikimedia.org/r/370865

Yargh. I keep getting stuck on Jackson databind dependency issues. This is the same issue that Druid has when running Hadoop indexing tasks, which we somehow got past by setting -Dhadoop.mapreduce.job.user.classpath.first=true in the middlemanager runtime.properties. That seems to work OK when it is just Druid and Hadoop involved, but adding Spark into the mix makes things even more difficult. I get:

java.lang.VerifyError: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

I believe this is because Spark executors are auto-loading .jars from various Hadoop class paths, e.g. /usr/lib/hadoop-mapreduce/jackson-databind-2.2.3.jar, and those are taking precedence over the newer jackson-databind 2.4.6 version that Druid and refinery-job both need. I've tried all kinds of ways (spark.{driver,executor}.extraJavaOptions, spark.{driver,executor}.userClassPathFirst, etc.) of getting Spark to use 2.4.6, but without much luck.
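One way to check that theory is to ask the JVM which jar a class was actually loaded from. The following is a minimal sketch, not something from the original job: the ClassOrigin class and its locate helper are hypothetical names, and it only uses the standard java.security.CodeSource API. Run with the same classpath as the driver or an executor, it should show whether the Jackson classes resolve to the 2.2.3 jar under the Hadoop directories or to the 2.4.6 jar shipped with the job.

```java
import java.security.CodeSource;

public class ClassOrigin {
    /**
     * Returns the location (usually a jar URL) a class was loaded from,
     * or a short explanation when that cannot be determined.
     * Hypothetical helper for illustration only.
     */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // Classes from the JDK itself typically have no code source
                return "bootstrap/JDK (no code source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not loadable: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Default to the classes involved in the VerifyError above
        String[] names = args.length > 0 ? args : new String[] {
            "com.fasterxml.jackson.databind.ObjectMapper",
            "com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Calling locate from inside a Spark task (rather than from main) would report what the executors see, which is where the conflict is suspected to happen.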

Anyway, I'll park the modified version of Joseph's BannerImpressionsStream job here for now. Maybe Joseph and I can figure this out together some other time.

https://gist.github.com/ottomata/5020480e15eae6fe80c54e2aa4c80b78

JAllemandou renamed this task from Productionize Tranquility (or shut it off) to Try to make tranquility work with Spark. Aug 28 2017, 3:52 PM

JAllemandou claimed this task.

JAllemandou edited projects, added Analytics-Kanban; removed Analytics.

JAllemandou updated the task description.

JAllemandou set the point value for this task to 21.

JAllemandou moved this task from Next Up to In Progress on the Analytics-Kanban board. Aug 29 2017, 8:47 AM

JAllemandou renamed this task from Try to make tranquility work with Spark to Make tranquility work with Spark. Aug 30 2017, 7:25 PM

JAllemandou moved this task from In Progress to In Code Review on the Analytics-Kanban board.

JAllemandou updated the task description.

JAllemandou updated the task description. Aug 30 2017, 7:27 PM

Ottomata awarded a token. Aug 31 2017, 3:06 PM

JAllemandou changed the point value for this task from 21 to 13. Aug 31 2017, 4:12 PM

JAllemandou merged a task: T169101: Make banner realtime jobs more resilient. Aug 31 2017, 4:16 PM

Change 375762 had a related patch set uploaded (by Joal; owner: Joal):
[operations/puppet@production] Update pivot config with explicit dimensions

https://gerrit.wikimedia.org/r/375762

Change 375762 merged by Elukey:
[operations/puppet@production] Update pivot config with explicit dimensions

https://gerrit.wikimedia.org/r/375762

Nuria added a subtask: T158334: Make Spark 2.1 easily available on new CDH5.10 cluster. Oct 5 2017, 4:20 PM

Nuria closed subtask T158334: Make Spark 2.1 easily available on new CDH5.10 cluster as Resolved. Nov 27 2017, 9:28 PM

JAllemandou moved this task from In Code Review to Done on the Analytics-Kanban board. Jan 11 2018, 11:43 AM

Nuria closed this task as Resolved. Feb 12 2018, 3:55 PM
