|operations/puppet||production||+15 -1||Refine needs Hive 1.1.0 jars from CDH 5.10.0 to work around jar version conflict|
|analytics/refinery||master||+22 -0||Add Hive 1.1.0 jars from CDH 5.10.0|
When ALTERing Hive tables, DataFrameToHive uses a manual JDBC connection to Hive,
rather than Spark SQL. This is a workaround for https://issues.apache.org/jira/browse/SPARK-23890.
Spark doesn't allow ALTER statements to be issued via spark.sql().
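To illustrate what a manual JDBC connection for ALTER statements can look like, here is a minimal sketch in Java. It is not the DataFrameToHive code. The class name HiveAlterSketch, both helper methods, and any table or column names passed in are hypothetical. The JDBC URL is expected in the usual HiveServer2 form (jdbc:hive2://host:port/database), and a Hive JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical sketch, not the actual DataFrameToHive implementation.
class HiveAlterSketch {

    // Builds a Hive "ALTER TABLE ... ADD COLUMNS (...)" statement.
    // Each column definition is "name type", e.g. "`new_field` string".
    static String addColumnsStatement(String table, List<String> columnDefs) {
        return "ALTER TABLE " + table
            + " ADD COLUMNS (" + String.join(", ", columnDefs) + ")";
    }

    // Opens a plain JDBC connection (bypassing spark.sql()) and runs each
    // statement in order. Assumes jdbcUrl points at a reachable HiveServer2
    // and that a Hive JDBC driver is available on the classpath.
    static void runStatements(String jdbcUrl, List<String> statements)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement()) {
            for (String sql : statements) {
                stmt.execute(sql);
            }
        }
    }
}
```

Because the connection is created through the Hive JDBC driver rather than through Spark, whichever Hive jars are on the classpath determine which Hive classes get loaded, which is where the version conflict described next comes from.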
When we upgraded to CDH 5.15.0 (in November 2018), this manual JDBC connection stopped working.
Even though the Hive version hasn't changed from CDH 5.10.0 to CDH 5.15.0, there seems to
be some backported fix that now causes org/apache/hadoop/hive/common/auth/HiveAuthUtils.class to be loaded.
HiveAuthUtils.class is in /usr/lib/hive/lib/hive-common.jar in Hive 1.1.0. However,
adding hive-common.jar from Hive 1.1.0 causes it to take precedence over Spark's own built-in version
of hive-common (1.2.1). Spark references some HiveConf properties that are not present in
Hive 1.1.0, which results in java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME.
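When debugging this kind of jar conflict, it can help to check which jar a class was actually loaded from and whether it exposes the expected field. The following is a generic diagnostic sketch in Java, not part of refinery. The class ClassOriginSketch and its methods are hypothetical. It is an assumption that the missing field lives on org.apache.hadoop.hive.conf.HiveConf$ConfVars, so verify that name against your own stack trace before relying on it.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper for classpath conflicts.
class ClassOriginSketch {

    // Returns the location (usually a jar URL) a class was loaded from,
    // or a descriptive string if it cannot be determined.
    static String originOf(String className) {
        try {
            // initialize=false: locate the class without running static init.
            Class<?> cls = Class.forName(
                className, false, ClassOriginSketch.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the bootstrap loader have no code source.
            return src == null
                ? "bootstrap or unknown"
                : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not loadable: " + e;
        }
    }

    // Returns true if the class is loadable and has a public field (or enum
    // constant) with the given name.
    static boolean hasField(String className, String fieldName) {
        try {
            Class<?> cls = Class.forName(
                className, false, ClassOriginSketch.class.getClassLoader());
            cls.getField(fieldName);
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }
}
```

Run on the same classpath as the failing job, a call such as ClassOriginSketch.originOf("org.apache.hadoop.hive.conf.HiveConf") would show whether HiveConf comes from the Hive 1.1.0 hive-common.jar or from the version bundled with Spark, and hasField(...) would show whether METASTORE_CLIENT_SOCKET_LIFETIME is present in the copy that won.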
Our workaround to this NEW problem with our workaround is to use the Hive
jars from CDH 5.10.0 to create our manual JDBC connections from DataFrameToHive, so that hive-common.jar is not needed.
The real fix to this problem would be to fix SPARK-23890 to allow harmless ALTER
statements via spark.sql() so we can stop using a manual Hive JDBC connection.
18/11/13 20:52:34 INFO RefineMonitor: No dataset targets in /wmf/data/raw/eventlogging between 2018-11-11T20:50:26.600Z and 2018-11-13T16:50:26.601Z need refinement to /wmf/data/event