
an-coord1001 hive metastore not listening on ipv6
Closed, Resolved · Public · 5 Estimated Story Points

Description

Hive metastore should be listening for Thrift connections on port 9083. When Airflow is pointed at an-coord1001.eqiad.wmnet:9083, it errors out with "connection refused": DNS returns an IPv6 address, but the metastore only appears to be listening on IPv4. For the moment I've hardcoded an-coord1001's IPv4 address into the Airflow config, but ideally Hive should listen on both.

Event Timeline

Restricted Application added a project: Analytics. (Dec 9 2019, 7:47 PM)
Restricted Application added a subscriber: Aklapper.
elukey added a subscriber: elukey. (Dec 10 2019, 2:38 PM)

Interesting, I had never noticed this. The Hive daemons are running with -Djava.net.preferIPv4Stack=true, probably like all the other Hadoop daemons (see T225296#5295016). We can try setting -Djava.net.preferIPv4Stack=false in the Hadoop test cluster and see how it goes.

I doubt we want to prefer IPv6 (do we?), but maybe we can make Hive listen on both IPs?

Change 556198 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_test_cluster::coordinator: use IPv6 in Hive

https://gerrit.wikimedia.org/r/556198

Change 556198 merged by Elukey:
[operations/puppet@production] role::analytics_test_cluster::coordinator: use IPv6 in Hive

https://gerrit.wikimedia.org/r/556198

Very interesting:

/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xmx256m -Xms4g -Xmx10g -Xms4g -Xmx10g -Djava.net.preferIPv4Stack=false -Dcom.sun.management.jmxremote.port=9979 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-metastore.log -Dhive.log.threshold=INFO -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/etc/hive/conf.analytics-test-hadoop/java-logging.properties -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/lib/hive/lib/hive-service-1.1.0-cdh5.16.1.jar org.apache.hadoop.hive.metastore.HiveMetaStore

As you can see, adding -Djava.net.preferIPv4Stack=false in hive-env.sh did not place it last on the command line, so it is overridden by the -Djava.net.preferIPv4Stack=true that appears later. I was expecting to find the =true occurrence in /usr/lib/hive, but I was wrong. Will need to do some more research :)
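To make the "last occurrence wins" observation easier to check against a long command line like the one above, here is a minimal sketch. The helper name effective_prop is hypothetical and not part of Hive or Hadoop; it simply prints the value of the last -D<property>=... argument it is given.

```shell
# Hypothetical helper (not part of Hive or Hadoop): print the value of the
# last -D<prop>=<value> argument, which is the one the ticket observes
# taking effect when the same property is passed more than once.
effective_prop() {
  prop="$1"; shift
  value=""
  for arg in "$@"; do
    case "$arg" in
      "-D${prop}="*) value="${arg#"-D${prop}="}" ;;  # keep overwriting: last one wins
    esac
  done
  printf '%s\n' "$value"
}

# Abbreviated version of the metastore command line from the ticket.
effective_prop java.net.preferIPv4Stack \
  -Xmx10g -Djava.net.preferIPv4Stack=false \
  -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
```

On a live host the arguments could be taken from the running process instead of being typed by hand; that part is left out here because it depends on the local tooling.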

OK, I know what happens. This is the chain of events:

  1. /etc/init.d/hive-metastore eventually calls /usr/lib/hive/bin/ext/metastore.sh
  2. the file contains
export HADOOP_OPTS="$HIVE_METASTORE_HADOOP_OPTS $HADOOP_OPTS"
exec $HADOOP jar $JAR $CLASS "$@"
  3. The $HADOOP var is /usr/lib/hadoop/bin/hadoop, which (eventually) calls /usr/lib/hadoop/libexec/hadoop-config.sh, which contains
# Disable ipv6 as it can cause issues
HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

For the other Hadoop daemons I circumvented the issue by simply adding java.net.preferIPv4Stack=false, which was appended at the end and so overrode the default. In the Hive case, java.net.preferIPv4Stack=false is added before the Hadoop one (=true), so the Hadoop default wins. I'll try to find a trick in puppet to comment out the line in /usr/lib/hadoop/libexec/hadoop-config.sh, since there is nothing Hive-specific against IPv6.
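The ordering problem can be reproduced outside the cluster with a small simulation of the two assignments quoted from metastore.sh and hadoop-config.sh. This is only a sketch: the function name build_hadoop_opts is hypothetical, and it assumes the Hive-side flag arrives through HIVE_METASTORE_HADOOP_OPTS, which the ticket does not state explicitly.

```shell
# Hypothetical simulation of how the option string is assembled.
# Assumption: the flag set in hive-env.sh ends up in
# HIVE_METASTORE_HADOOP_OPTS (passed here as the first argument).
build_hadoop_opts() {
  hive_opts="$1"
  opts=""

  # metastore.sh: Hive options are prepended to HADOOP_OPTS.
  opts="$hive_opts $opts"

  # hadoop-config.sh: the IPv4 default is appended afterwards,
  # so it always ends up last.
  opts="$opts -Djava.net.preferIPv4Stack=true"

  printf '%s\n' "$opts"
}

build_hadoop_opts "-Djava.net.preferIPv4Stack=false"
```

Whatever Hive passes in, the string ends with the =true flag from hadoop-config.sh, which matches the command line observed earlier in the task.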

Change 556337 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] cdh::hadoop: remove ipv6 constraints

https://gerrit.wikimedia.org/r/556337

elukey moved this task from Next Up to In Code Review on the Analytics-Kanban board.

Change 556337 merged by Elukey:
[operations/puppet@production] cdh::hadoop: remove ipv6 constraints

https://gerrit.wikimedia.org/r/556337

Change 556633 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] cdh::hadoop: replace augeas with a file resource

https://gerrit.wikimedia.org/r/556633

Change 556633 merged by Elukey:
[operations/puppet@production] cdh::hadoop: replace augeas with a file resource

https://gerrit.wikimedia.org/r/556633

Change 556641 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] hadoop: remove ipv6 constraint workaround

https://gerrit.wikimedia.org/r/556641

Change 556641 merged by Elukey:
[operations/puppet@production] hadoop: remove ipv6 constraint workaround

https://gerrit.wikimedia.org/r/556641

Looks better now!

elukey@stat1004:~$ telnet an-coord1001.eqiad.wmnet 9083
Trying 2620:0:861:105:10:64:21:104...
Connected to an-coord1001.eqiad.wmnet.
Escape character is '^]'.
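The telnet check above only exercises whichever address the resolver tries first. A rough sketch for probing both address families explicitly follows; the function names are hypothetical, and it assumes an nc implementation that accepts -4/-6, -z and -w (OpenBSD netcat does; other variants may differ).

```shell
# Probe one TCP port over a single address family (4 or 6).
# Assumes nc supports -4/-6 (force family), -z (connect only, send nothing)
# and -w (timeout in seconds).
probe_family() {
  family="$1"; host="$2"; port="$3"
  if nc "-$family" -z -w 3 "$host" "$port" >/dev/null 2>&1; then
    echo "IPv$family: open"
  else
    echo "IPv$family: unreachable"
  fi
}

# Probe the same host and port over IPv4 and then IPv6.
probe_both() {
  probe_family 4 "$1" "$2"
  probe_family 6 "$1" "$2"
}

# Example: probe_both an-coord1001.eqiad.wmnet 9083
```

If Hive is listening on both families, both lines should report the port as open; before the fix, the IPv6 probe would be the one expected to fail.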

@EBernhardson can you re-check and confirm the fix?

Mentioned in SAL (#wikimedia-analytics) [2019-12-12T12:59:25Z] <elukey> roll restart hadoop workers to pick up the new settings (removed prefer ipv4 false after T240255)

elukey claimed this task. (Dec 12 2019, 5:04 PM)
elukey triaged this task as Medium priority.
elukey set the point value for this task to 5. (Dec 13 2019, 7:21 AM)
elukey moved this task from In Code Review to Done on the Analytics-Kanban board.
elukey closed this task as Resolved. (Jan 30 2020, 12:47 AM)

Change 583631 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] cdh::hadoop: allow hadoop daemons to override ipv6 settings

https://gerrit.wikimedia.org/r/583631

Change 583631 merged by Elukey:
[operations/puppet@production] cdh::hadoop: allow hadoop daemons to override ipv6 settings

https://gerrit.wikimedia.org/r/583631

Aklapper removed a project: Analytics. (Jul 4 2020, 7:59 AM)