The Hive metastore should be listening for Thrift connections on port 9083. When Airflow is pointed at an-coord1001.eqiad.wmnet:9083, it fails with "connection refused": DNS returns an IPv6 address, but the metastore only appears to be listening on IPv4. For the moment I've hardcoded an-coord1001's IPv4 address into the Airflow config, but ideally Hive should listen on both.
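One way to check which address families the metastore is actually listening on is to read the kernel's socket tables. The sketch below assumes a Linux host and parses /proc/net/tcp and /proc/net/tcp6 directly, where state 0A means LISTEN. The function name listen_families is hypothetical, and the table paths are optional parameters only so the logic can be tried against sample data. On a live host, `ss -ltn` shows the same information more readably.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "ipv4" and/or "ipv6" for each Linux socket table
# that holds a TCP listener on the given port.
# Usage: listen_families PORT [TCP4_TABLE] [TCP6_TABLE]
listen_families() {
  local port_hex fam table
  local tcp4=${2:-/proc/net/tcp} tcp6=${3:-/proc/net/tcp6}
  # The kernel stores ports as 4-digit uppercase hex (9083 -> 237B).
  port_hex=$(printf '%04X' "$1")
  for fam in ipv4 ipv6; do
    if [ "$fam" = ipv4 ]; then table=$tcp4; else table=$tcp6; fi
    [ -r "$table" ] || continue
    # Skip the header line; column 2 is local_address (ADDR:PORT) and
    # column 4 is the socket state, where 0A means LISTEN.
    if awk -v suffix=":$port_hex" \
         'NR > 1 && $4 == "0A" && substr($2, length($2) - length(suffix) + 1) == suffix { found = 1 }
          END { exit (found ? 0 : 1) }' "$table"; then
      echo "$fam"
    fi
  done
  return 0
}
```

On the metastore host, `listen_families 9083` printing only `ipv4` would match the behaviour described above. Note that a Java server socket bound to the IPv6 wildcard normally shows up only in tcp6, even though it also accepts IPv4 clients through v4-mapped addresses.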
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xmx256m -Xms4g -Xmx10g -Xms4g -Xmx10g -Djava.net.preferIPv4Stack=false -Dcom.sun.management.jmxremote.port=9979 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-metastore.log -Dhive.log.threshold=INFO -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/etc/hive/conf.analytics-test-hadoop/java-logging.properties -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/lib/hive/lib/hive-service-1.1.0-cdh5.16.1.jar org.apache.hadoop.hive.metastore.HiveMetaStore
As you can see, adding -Djava.net.preferIPv4Stack=false in hive-env.sh didn't place it last on the command line, so it is eventually overridden by -Djava.net.preferIPv4Stack=true. I was expecting to find the =true occurrence somewhere under /usr/lib/hive, but I was wrong. I'll need to do some more research :)
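When the same -D system property is passed more than once, the JVM (HotSpot, in practice) keeps the value from the last occurrence, which is why the position of the flag matters. The hypothetical helper below mimics that rule on an options string, for example one copied from `ps` output, and prints the value the JVM would end up with. It is only a sketch: it splits on whitespace, so it assumes no quoted values containing spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the effective value of a -D system property,
# applying the "last occurrence wins" rule.
# Usage: effective_sysprop NAME OPTIONS...
effective_sysprop() {
  local name=$1 value="" opt
  local -a opts
  shift
  # Split all remaining arguments on whitespace without glob expansion.
  read -r -a opts <<< "$*"
  for opt in "${opts[@]}"; do
    case $opt in
      # Every later match overwrites the earlier one.
      "-D$name="*) value=${opt#"-D$name="} ;;
    esac
  done
  printf '%s\n' "$value"
}
```

Fed the metastore command line above, `effective_sysprop java.net.preferIPv4Stack "<command line>"` prints `true`, because the =true occurrence comes after the =false one.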
OK, I know what happens. This is the chain of events:
- /etc/init.d/hive-metastore eventually calls /usr/lib/hive/bin/ext/metastore.sh
- that file contains:
    export HADOOP_OPTS="$HIVE_METASTORE_HADOOP_OPTS $HADOOP_OPTS"
    exec $HADOOP jar $JAR $CLASS "$@"
- the $HADOOP variable is /usr/lib/hadoop/bin/hadoop, which (eventually) calls /usr/lib/hadoop/libexec/hadoop-config.sh, which contains:
    # Disable ipv6 as it can cause issues
    HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
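The ordering problem can be reproduced with plain variables. The sketch below is a simplified model of the chain described above, not the real scripts: it mirrors only the two relevant lines (the prepend in metastore.sh and the append in hadoop-config.sh). The function name build_metastore_opts is hypothetical, and the starting value of HADOOP_OPTS is assumed to be empty.

```shell
#!/usr/bin/env bash
# Simplified model of how HADOOP_OPTS is assembled for the metastore.
# Only the two relevant steps are reproduced; everything else is omitted.
build_metastore_opts() {
  local HIVE_METASTORE_HADOOP_OPTS=$1   # what the Hive side contributes
  local HADOOP_OPTS=""                  # assumed empty to start with

  # /usr/lib/hive/bin/ext/metastore.sh: Hive options are PREPENDED.
  HADOOP_OPTS="$HIVE_METASTORE_HADOOP_OPTS $HADOOP_OPTS"

  # /usr/lib/hadoop/libexec/hadoop-config.sh: the IPv4 flag is APPENDED later.
  HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

  printf '%s\n' "$HADOOP_OPTS"
}

opts=$(build_metastore_opts "-Djava.net.preferIPv4Stack=false")
# The last occurrence is the one the JVM uses; prints "preferIPv4Stack=true".
grep -o 'preferIPv4Stack=[a-z]*' <<< "$opts" | tail -n 1
```

Whatever the Hive side passes ends up at the front of the string, so no value set there can override the flag appended by hadoop-config.sh.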
For the other Hadoop daemons I worked around the issue simply by adding -Djava.net.preferIPv4Stack=false, which was appended at the end and therefore took precedence. In the Hive case, our -Djava.net.preferIPv4Stack=false ends up before the =true added by Hadoop, so the latter wins. I'll try to find a way in Puppet to comment out that line in /usr/lib/hadoop/libexec/hadoop-config.sh; there is nothing Hive-specific against IPv6.
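A minimal sketch of the proposed workaround, assuming the line in hadoop-config.sh appears exactly as quoted above (optionally indented). It comments the line out with GNU sed and is idempotent, so it could in principle be run from something like a Puppet exec resource on every agent run. The function name disable_forced_ipv4 is hypothetical, and since hadoop-config.sh is shipped by the CDH package rather than being a config file, a package upgrade would likely restore the original line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out the hadoop-config.sh line that forces IPv4.
# Idempotent: a line that is already commented out no longer matches.
disable_forced_ipv4() {
  local file=${1:-/usr/lib/hadoop/libexec/hadoop-config.sh}
  # Match only the uncommented assignment (leading whitespace allowed) and
  # prefix it with "# ", keeping the original indentation.
  sed -i 's|^\([[:space:]]*\)\(HADOOP_OPTS="\$HADOOP_OPTS -Djava\.net\.preferIPv4Stack=true"\)|\1# \2|' "$file"
}
```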