Hive metastore should be listening for Thrift connections on port 9083. When pointing Airflow at an-coord1001.eqiad.wmnet:9083, it errors out with connection refused, because DNS returned an IPv6 address but the metastore only appears to be listening on IPv4. For the moment I've hardcoded an-coord1001's IPv4 address into the Airflow config, but ideally Hive should listen on both.
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | odimitrijevic | T240437 Analytics Ops Technical Debt
Resolved | | elukey | T240255 an-coord1001 hive metastore not listening on ipv6
Event Timeline
Interesting, I had never noticed this. The Hive daemons are running with -Djava.net.preferIPv4Stack=true, probably like all the other Hadoop daemons (see T225296#5295016). We can try setting -Djava.net.preferIPv4Stack=false in Hadoop testing and see how it goes.
I doubt we want to prefer IPv6 (do we?), but maybe we can make Hive listen on both IPs?
Change 556198 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_test_cluster::coordinator: use IPv6 in Hive
Change 556198 merged by Elukey:
[operations/puppet@production] role::analytics_test_cluster::coordinator: use IPv6 in Hive
Very interesting:
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xmx256m -Xms4g -Xmx10g -Xms4g -Xmx10g -Djava.net.preferIPv4Stack=false -Dcom.sun.management.jmxremote.port=9979 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-metastore.log -Dhive.log.threshold=INFO -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/etc/hive/conf.analytics-test-hadoop/java-logging.properties -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/lib/hive/lib/hive-service-1.1.0-cdh5.16.1.jar org.apache.hadoop.hive.metastore.HiveMetaStore
As you can see, adding -Djava.net.preferIPv4Stack=false in hive-env.sh didn't place it last on the command line, so it ends up overridden by the later -Djava.net.preferIPv4Stack=true. I was expecting to find the =true occurrence in /usr/lib/hive, but I was wrong. Will need to do some more research :)
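To make the "last one wins" behaviour observed above easier to check, here is a minimal sketch that scans a JVM argument list and prints the value of the last occurrence of a given -D property. The helper name effective_prop is hypothetical (it is not part of Hive or Hadoop), and the argument list is a shortened version of the command line quoted above.

```shell
#!/usr/bin/env bash
# Print the value of the last -D<property>=<value> argument in a JVM
# argument list. effective_prop is a hypothetical helper name.
effective_prop() {
    local prop="$1"
    shift
    local value="" arg
    for arg in "$@"; do
        case "$arg" in
            # Each match overwrites the previous one, so the last wins.
            "-D${prop}="*) value="${arg#-D"${prop}"=}" ;;
        esac
    done
    printf '%s\n' "$value"
}

# Shortened version of the metastore command line quoted above.
effective_prop java.net.preferIPv4Stack \
    -Xmx10g \
    -Djava.net.preferIPv4Stack=false \
    -Dhadoop.policy.file=hadoop-policy.xml \
    -Djava.net.preferIPv4Stack=true
# prints: true
```

Against a live process, something like `effective_prop java.net.preferIPv4Stack $(ps -o args= -p PID)` should give the same answer, assuming none of the arguments contain spaces.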
Ok I know what happens, this is the chain of events:
- /etc/init.d/hive-metastore eventually calls /usr/lib/hive/bin/ext/metastore.sh
- the file contains
    export HADOOP_OPTS="$HIVE_METASTORE_HADOOP_OPTS $HADOOP_OPTS"
    exec $HADOOP jar $JAR $CLASS "$@"
- The $HADOOP var is /usr/lib/hadoop/bin/hadoop that calls (eventually) /usr/lib/hadoop/libexec/hadoop-config.sh that contains
    # Disable ipv6 as it can cause issues
    HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
For the other Hadoop daemons I worked around the issue by simply adding java.net.preferIPv4Stack=false, which was appended at the end and so overrode the default. In the Hive case, java.net.preferIPv4Stack=false is added before the Hadoop one, so Hadoop's =true comes last and wins. I'll try to find a trick in puppet to comment out the line in /usr/lib/hadoop/libexec/hadoop-config.sh; there is nothing Hive-specific against IPv6.
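The chain of events can be replayed in a few lines of shell. This is only a sketch of the ordering, not the real scripts: it assumes the override from hive-env.sh reaches metastore.sh through HIVE_METASTORE_HADOOP_OPTS, which the thread does not state explicitly.

```shell
#!/usr/bin/env bash
# Sketch of the order in which the metastore's JVM options are assembled.
# ASSUMPTION: the hive-env.sh override arrives via HIVE_METASTORE_HADOOP_OPTS;
# the thread does not say which variable was actually used.
HIVE_METASTORE_HADOOP_OPTS="-Djava.net.preferIPv4Stack=false"
HADOOP_OPTS=""

# Step 1, /usr/lib/hive/bin/ext/metastore.sh: Hive's options are prepended.
HADOOP_OPTS="$HIVE_METASTORE_HADOOP_OPTS $HADOOP_OPTS"

# Step 2, /usr/lib/hadoop/libexec/hadoop-config.sh: the default is appended.
HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Walk the options left to right (word splitting is intentional here);
# the last match is the value that takes effect.
winner=""
for opt in $HADOOP_OPTS; do
    case "$opt" in
        -Djava.net.preferIPv4Stack=*) winner="${opt#*=}" ;;
    esac
done

echo "final HADOOP_OPTS: $HADOOP_OPTS"
echo "effective java.net.preferIPv4Stack: $winner"
```

With this ordering the effective value is `true`, matching the command line quoted earlier. That is why the fix has to touch hadoop-config.sh rather than add another flag on the Hive side.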
Change 556337 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] cdh::hadoop: remove ipv6 constraints
Change 556337 merged by Elukey:
[operations/puppet@production] cdh::hadoop: remove ipv6 constraints
Change 556633 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] cdh::hadoop: replace augeas with a file resource
Change 556633 merged by Elukey:
[operations/puppet@production] cdh::hadoop: replace augeas with a file resource
Change 556641 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] hadoop: remove ipv6 constraint workaround
Change 556641 merged by Elukey:
[operations/puppet@production] hadoop: remove ipv6 constraint workaround
Looks better now!
elukey@stat1004:~$ telnet an-coord1001.eqiad.wmnet 9083
Trying 2620:0:861:105:10:64:21:104...
Connected to an-coord1001.eqiad.wmnet.
Escape character is '^]'.
@EBernhardson can you re-check and confirm the fix?
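For re-checking, telnet only tries addresses until one connects, so a success over IPv6 says nothing about IPv4. A rough sketch that probes every address the resolver returns might look like the following. The function names addr_family and probe_all are hypothetical, and the sketch assumes glibc's getent plus an nc that supports -z and -w.

```shell
#!/usr/bin/env bash
# addr_family and probe_all are hypothetical helper names.

# Classify an address literal: anything containing ':' is treated as IPv6.
addr_family() {
    case "$1" in
        *:*) echo ipv6 ;;
        *)   echo ipv4 ;;
    esac
}

# Try a TCP connection to every address the resolver returns for HOST.
# Assumes glibc's getent and an nc that supports -z (scan only) and -w (timeout).
probe_all() {
    local host="$1" port="$2" addr
    getent ahosts "$host" | awk '{print $1}' | sort -u |
    while read -r addr; do
        if nc -z -w 3 "$addr" "$port" </dev/null >/dev/null 2>&1; then
            echo "$(addr_family "$addr") $addr: open"
        else
            echo "$(addr_family "$addr") $addr: refused or unreachable"
        fi
    done
}

# Example (needs network access to the host):
#   probe_all an-coord1001.eqiad.wmnet 9083
```

If Hive is listening on both families, every line of output should end in "open".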
Mentioned in SAL (#wikimedia-analytics) [2019-12-12T12:59:25Z] <elukey> roll restart hadoop workers to pick up the new settings (removed prefer ipv4 false after T240255)
Change 583631 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] cdh::hadoop: allow hadoop daemons to override ipv6 settings
Change 583631 merged by Elukey:
[operations/puppet@production] cdh::hadoop: allow hadoop daemons to override ipv6 settings