
Make tranquility work with Spark
Closed, Resolved, Public, 13 Estimated Story Points

Description

Try the following, in order:

  1. Implement tranquility in Spark 1.6 - failing because of dependency issues.
  2. Implement tranquility in Spark 2.1.1 - this is the one!!

Keeping the other two options for the record:

  1. Ask FangJin for help
  2. Productionise the tranquility service using Kafka

Event Timeline

Hm! I think we may be able to push realtime stats to druid with the Tranquility API. It's got a Spark interface:

https://github.com/druid-io/tranquility/blob/master/docs/spark.md

Change 370865 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Allow ANALYTICS_NETWORKS to talk to druid zookeeper cluster

https://gerrit.wikimedia.org/r/370865

Change 370865 merged by Ottomata:
[operations/puppet@production] Allow ANALYTICS_NETWORKS to talk to druid zookeeper cluster

https://gerrit.wikimedia.org/r/370865

Yargh. I keep getting stuck on Jackson databind dependency issues. This is the same issue Druid has when running Hadoop indexing tasks, which we've somehow gotten around by setting -Dhadoop.mapreduce.job.user.classpath.first=true in the middlemanager runtime.properties. That seems to work OK when it's just Druid and Hadoop, but adding Spark into the mix makes things even more difficult. I get:

java.lang.VerifyError: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

I believe this is because Spark executors automatically load jars from various Hadoop classpaths, e.g. /usr/lib/hadoop-mapreduce/jackson-databind-2.2.3.jar, and those take precedence over the newer jackson-databind 2.4.6 that both Druid and refinery-job need. I've tried all kinds of ways (spark.{driver,executor}.extraJavaOptions, spark.{driver,executor}.userClassPathFirst, etc.) to get Spark to use 2.4.6, without much luck.
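One way to confirm which jar actually wins on a given classpath is to ask the JVM where a class was loaded from. The sketch below is a hypothetical helper (`WhichJar` is not part of Spark, Druid or refinery-job) that only uses standard JDK calls (`Class.forName`, `ProtectionDomain.getCodeSource`). The default class names are taken from the error above; running it with the same classpath as the failing job (or calling `locate` from code running on an executor) should show whether something like jackson-databind-2.2.3.jar is the one being picked up.

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the jar/directory URL a class was loaded from, or a marker string when
    // the class is missing, fails to load, or comes from the bootstrap class loader.
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false,
                    Thread.currentThread().getContextClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "<bootstrap or unknown>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not found>";
        } catch (LinkageError e) {
            // Covers VerifyError / NoClassDefFoundError, i.e. the kind of conflict seen above.
            return "<failed to load: " + e + ">";
        }
    }

    public static void main(String[] args) {
        // Example class names (from the stack trace above); override via command-line args.
        String[] names = args.length > 0 ? args : new String[] {
            "com.fasterxml.jackson.databind.ObjectMapper",
            "com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Catching `LinkageError` separately matters here: a class that triggers the `VerifyError` still reports something useful instead of aborting the whole check.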

Anyway, I'll park the modified version of Joseph's BannerImpressionsStream job here for now. Maybe Joseph and I can figure this out together some other time.

https://gist.github.com/ottomata/5020480e15eae6fe80c54e2aa4c80b78

JAllemandou renamed this task from Productionize Tranquility (or shut it off) to Try to make tranquility work with Spark. Aug 28 2017, 3:52 PM
JAllemandou claimed this task.
JAllemandou edited projects, added Analytics-Kanban; removed Analytics.
JAllemandou updated the task description.
JAllemandou set the point value for this task to 21.
JAllemandou renamed this task from Try to make tranquility work with Spark to Make tranquility work with Spark. Aug 30 2017, 7:25 PM
JAllemandou moved this task from In Progress to In Code Review on the Analytics-Kanban board.
JAllemandou updated the task description.
JAllemandou changed the point value for this task from 21 to 13. Aug 31 2017, 4:12 PM

Change 375762 had a related patch set uploaded (by Joal; owner: Joal):
[operations/puppet@production] Update pivot config with explicit dimensions

https://gerrit.wikimedia.org/r/375762

Change 375762 merged by Elukey:
[operations/puppet@production] Update pivot config with explicit dimensions

https://gerrit.wikimedia.org/r/375762