EventLogging Hive Refine broken after upgrade to CDH 5.15.0
Closed, Resolved · Public · 5 Story Points

Ottomata created this task. Nov 13 2018, 7:18 PM
Restricted Application added a subscriber: Aklapper. Nov 13 2018, 7:18 PM

Change 473268 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/refinery@master] Add Hive 1.1.0 jars from CDH 5.10.0

https://gerrit.wikimedia.org/r/473268

Ottomata added a comment (edited). Nov 13 2018, 7:19 PM

Ok:

When ALTERing Hive tables, DataFrameToHive uses a manual JDBC connection to Hive
rather than Spark SQL. This is a workaround for https://issues.apache.org/jira/browse/SPARK-23890:
Spark refuses some ALTER statements issued via spark.sql(), even ones that would be harmless in Hive.
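For readers unfamiliar with the pattern, here is a minimal, hypothetical sketch of issuing schema-evolution DDL over a separate connection instead of through spark.sql(). It is not DataFrameToHive's actual code. The names `alter_statements` and `run_ddl` are made up for illustration, schemas are assumed to be plain dicts of column name to Hive type string, and `conn` is assumed to be any PEP 249 (DB-API) style connection to HiveServer2.

```python
from typing import Dict, List, Sequence


def alter_statements(table: str, old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Build hypothetical ALTER TABLE statements that evolve `old` into `new`.

    Both schemas map column name -> Hive type string. Columns whose type
    changed (e.g. a struct that gained a field) get CHANGE COLUMN; columns
    missing from `old` are appended with a single ADD COLUMNS.
    Dropped columns are ignored.
    """
    statements: List[str] = []
    for name, hive_type in new.items():
        if name in old and old[name] != hive_type:
            statements.append(
                f"ALTER TABLE {table} CHANGE COLUMN `{name}` `{name}` {hive_type}"
            )
    added = [f"`{name}` {hive_type}" for name, hive_type in new.items() if name not in old]
    if added:
        statements.append(f"ALTER TABLE {table} ADD COLUMNS ({', '.join(added)})")
    return statements


def run_ddl(conn, statements: Sequence[str]) -> None:
    """Execute each DDL statement on a DB-API style connection (assumed to
    point at HiveServer2), bypassing spark.sql() entirely."""
    cursor = conn.cursor()
    for statement in statements:
        cursor.execute(statement)
```

A real implementation would derive both schemas from the DataFrame and the Hive metastore. The point of the sketch is only that the DDL never passes through Spark's SQL parser, so Spark cannot reject it.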

When we upgraded to CDH 5.15.0 (in November 2018), this manual JDBC connection stopped working.
Even though the Hive version did not change between CDH 5.10.0 and CDH 5.15.0 (both ship Hive 1.1.0), some
backported fix apparently now causes org/apache/hadoop/hive/common/auth/HiveAuthUtils.class to be loaded.
In Hive 1.1.0, HiveAuthUtils.class lives in /usr/lib/hive/lib/hive-common.jar. However,
adding that hive-common.jar to the classpath causes it to be used instead of Spark's own built-in
hive-common (1.2.1). Spark references some HiveConf properties that do not exist in
Hive 1.1.0, so this results in java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME.
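The NoSuchFieldError is a typical jar-shadowing conflict: two jars provide the same class, and the copy the class loader finds first wins. The hypothetical Python helper below scans an ordered list of jars for a given class to show which copy a simple classpath lookup would pick. It assumes a flat classpath searched in order. The JVM and Spark may use several class loaders with their own delegation rules, so treat the result as a first diagnostic rather than a definitive answer. The function names are illustrative only.

```python
import zipfile
from typing import List, Optional, Sequence


def class_entry(class_name: str) -> str:
    """Map a fully qualified class name (nested classes use '$')
    to its path inside a jar, e.g. a.b.C$D -> a/b/C$D.class."""
    return class_name.replace(".", "/") + ".class"


def jars_providing(classpath: Sequence[str], class_name: str) -> List[str]:
    """Return every jar on the classpath that contains the class, in classpath order."""
    entry = class_entry(class_name)
    hits: List[str] = []
    for jar in classpath:
        with zipfile.ZipFile(jar) as zf:
            if entry in zf.namelist():
                hits.append(jar)
    return hits


def winning_jar(classpath: Sequence[str], class_name: str) -> Optional[str]:
    """Under a flat, in-order lookup (an assumption), the first match wins."""
    hits = jars_providing(classpath, class_name)
    return hits[0] if hits else None
```

For example, passing the job's jar list and `"org.apache.hadoop.hive.conf.HiveConf$ConfVars"` would report whether Hive 1.1.0's hive-common.jar or Spark's 1.2.1 copy supplies the enum that Spark presumably expects to contain METASTORE_CLIENT_SOCKET_LIFETIME. More than one hit from `jars_providing` is the warning sign.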

Our workaround for this NEW problem with our workaround is to use the Hive
jars from CDH 5.10.0 for DataFrameToHive's manual JDBC connections, so that hive-common.jar is not needed.

The proper fix would be to resolve SPARK-23890 so that harmless ALTER
statements are allowed via spark.sql(); then we could stop using a manual Hive JDBC connection.

Change 473268 merged by Ottomata:
[analytics/refinery@master] Add Hive 1.1.0 jars from CDH 5.10.0

https://gerrit.wikimedia.org/r/473268

Change 473271 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Refine needs to use Hive 1.1.0 jars from CDH 5.10.0 to work around jar version conflict

https://gerrit.wikimedia.org/r/473271

Mentioned in SAL (#wikimedia-operations) [2018-11-13T19:43:51Z] <otto@deploy1001> Started deploy [analytics/refinery@62d6f4b]: Deploy hive jars from CDH 5.10.0 to workaround Refine bug: T209407

Mentioned in SAL (#wikimedia-operations) [2018-11-13T19:49:48Z] <otto@deploy1001> Finished deploy [analytics/refinery@62d6f4b]: Deploy hive jars from CDH 5.10.0 to workaround Refine bug: T209407 (duration: 05m 57s)

Change 473271 merged by Ottomata:
[operations/puppet@production] Refine needs Hive 1.1.0 jars from CDH 5.10.0 to work around jar version conflict

https://gerrit.wikimedia.org/r/473271

Ottomata moved this task from Next Up to Done on the Analytics-Kanban board. Nov 13 2018, 8:57 PM
Ottomata set the point value for this task to 5.

Woot!

18/11/13 20:52:34 INFO RefineMonitor: No dataset targets in /wmf/data/raw/eventlogging between 2018-11-11T20:50:26.600Z and 2018-11-13T16:50:26.601Z need refinement to /wmf/data/event
elukey triaged this task as High priority.
Nuria closed this task as Resolved. Mon, Nov 19, 11:25 PM