Connect Atlas to hive
Description
Status | Assigned | Task
---|---|---
Resolved | Milimetric | T293643 Data Catalog Technical Evaluation
Duplicate | Milimetric | T299165 Evaluate Atlas
Resolved | razzi | • T296670 Run Atlas on test cluster
Resolved | razzi | • T297841 Apache atlas build fails due to expired certificate (https://maven.restlet.com)
Declined | None | T298710 Connect Atlas to a Data Source
Event Timeline
Progress: using a public docker-compose configuration (https://github.com/sonnyhcl/apache-atlas-docker), I have gotten Atlas 2.2 running on my local machine:
I have also learned the various dependencies of Atlas from the architecture diagram and the installation guide:
- HBase (or BerkeleyDB)
- Solr
- Zookeeper
- JanusGraph (can be configured to use Elasticsearch)
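For context, these components are wired together through Atlas's graph configuration. With the embedded BerkeleyDB/Solr setup the relevant atlas-application.properties keys look roughly like the following (key names are from the Atlas configuration docs; the values shown are illustrative, not our deployment):

```
atlas.graph.storage.backend=berkeleyje
atlas.graph.storage.directory=${sys:atlas.home}/data/berkeley
atlas.graph.index.search.backend=solr
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=localhost:2181
```

Swapping HBase in for BerkeleyDB, or Elasticsearch in for Solr, is a matter of changing these backends.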
Here's what the Atlas UI looks like after loading their sample dataset by running /opt/atlas/bin/quick_start.py (username and password are admin; loading takes about 10 minutes):
I attempted to run the install steps on an-test-coord1001, but the download requests timed out because it wasn't using the proxy:
razzi@an-test-coord1001:~/apache-atlas-sources-2.2.0$ mvn clean -DskipTests install
[INFO] Scanning for projects...
Downloading from central: https://repo1.maven.org/maven2/org/apache/apache/17/apache-17.pom
Downloading from hortonworks.repo: https://repo.hortonworks.com/content/repositories/releases/org/apache/apache/17/apache-17.pom
Downloading from apache.snapshots.repo: https://repository.apache.org/content/groups/snapshots/org/apache/apache/17/apache-17.pom
Downloading from apache-staging: https://repository.apache.org/content/groups/staging/org/apache/apache/17/apache-17.pom
Downloading from default: https://repository.apache.org/content/groups/public/org/apache/apache/17/apache-17.pom
Downloading from java.net-Public: https://maven.java.net/content/groups/public/org/apache/apache/17/apache-17.pom
Downloading from repository.jboss.org-public: https://repository.jboss.org/nexus/content/groups/public/org/apache/apache/17/apache-17.pom
Downloading from typesafe: https://repo.typesafe.com/typesafe/releases/org/apache/apache/17/apache-17.pom
[ERROR]
[ERROR] Some problems were encountered while processing the POMs:
[FATAL] Non-resolvable parent POM for org.apache.atlas:apache-atlas:2.2.0: Could not transfer artifact org.apache:apache:pom:17 from/to central (https://repo1.maven.org/maven2): Connect to repo1.maven.org:443 [repo1.maven.org/146.75.36.209] failed: Connection timed out (Connection timed out) and 'parent.relativePath' points at wrong local POM @ line 23, column 13 @
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]
[ERROR] The project org.apache.atlas:apache-atlas:2.2.0 (/home/razzi/apache-atlas-sources-2.2.0/pom.xml) has 1 error
[ERROR] Non-resolvable parent POM for org.apache.atlas:apache-atlas:2.2.0: Could not transfer artifact org.apache:apache:pom:17 from/to central (https://repo1.maven.org/maven2): Connect to repo1.maven.org:443 [repo1.maven.org/146.75.36.209] failed: Connection timed out (Connection timed out) and 'parent.relativePath' points at wrong local POM @ line 23, column 13 -> [Help 2]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException
(with the following proxy settings)
export http_proxy=http://webproxy:8080
export https_proxy=http://webproxy:8080
@Ottomata chimed in and gave me the mvn flag to enable proxies: -Djava.net.useSystemProxies=true
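For reference, the more permanent Maven-side alternative to that JVM flag is a proxy entry in ~/.m2/settings.xml; a sketch using the webproxy host and port from the exports above:

```xml
<!-- ~/.m2/settings.xml: Maven proxy configuration equivalent to the
     http_proxy/https_proxy exports; adjust host/port if yours differ. -->
<settings>
  <proxies>
    <proxy>
      <id>wmf-webproxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>webproxy</host>
      <port>8080</port>
    </proxy>
  </proxies>
</settings>
```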
I then got a different error:
razzi@an-test-coord1001:~/apache-atlas-sources-2.2.0$ mvn clean -DskipTests install -Djava.net.useSystemProxies=true
[...]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.9.1:check (checkstyle-check) on project apache-atlas: Execution checkstyle-check of goal org.apache.maven.plugins:maven-checkstyle-plugin:2.9.1:check failed: Plugin org.apache.maven.plugins:maven-checkstyle-plugin:2.9.1 or one of its dependencies could not be resolved: Failure to find org.apache.atlas:atlas-buildtools:jar:1.0 in https://repo.maven.apache.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :apache-atlas
@Milimetric had the idea to check the maven repository at https://repo.maven.apache.org/maven2/org/apache/atlas/atlas-buildtools/, and sure enough, there is no atlas-buildtools 1.0, only 0.8.1. Changing the pom.xml to use 0.8.1 (this is a workaround; I'll email user@atlas.apache.org to see if this is a bug) got past that error, only to produce a different one:
[...lots of output and 20 minutes later]
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 21:55 min
[INFO] Finished at: 2021-12-09T19:39:34Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.4:npm (npm install) on project atlas-dashboardv2: Failed to run task: 'npm install' failed. org.apache.commons.exec.ExecuteException: Process exited with an error: 254 (Exit value: 254) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :atlas-dashboardv2
The relevant output that caused the build to fail is:
[INFO] --- frontend-maven-plugin:1.4:npm (npm install) @ atlas-dashboardv2 ---
[INFO] Running 'npm install' in /sync/apache-atlas-sources-2.2.0/dashboardv2/target
[ERROR] npm ERR! code ENOENT
[ERROR] npm ERR! syscall open
[ERROR] npm ERR! path /sync/apache-atlas-sources-2.2.0/dashboardv2/target/node_modules/argparse/node_modules/sprintf-js/package.json.886288497
[ERROR] npm ERR! errno -2
[ERROR] npm ERR! enoent ENOENT: no such file or directory, open '/sync/apache-atlas-sources-2.2.0/dashboardv2/target/node_modules/argparse/node_modules/sprintf-js/package.json.886288497'
[ERROR] npm ERR! enoent This is related to npm not being able to find a file.
[ERROR] npm ERR! enoent
[ERROR]
[ERROR] npm ERR! A complete log of this run can be found in:
[ERROR] npm ERR!   /home/vagrant/.npm/_logs/2021-12-09T19_39_34_071Z-debug.log
Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.4:npm (npm install) on project atlas-dashboardv2: Failed to run task: 'npm install' failed. org.apache.commons.exec.ExecuteException: Process exited with an error: 254 (Exit value: 254) -> [Help 1]
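As an aside, the atlas-buildtools workaround mentioned above amounts to pinning the dependency version in the top-level pom.xml. A throwaway sketch of that edit against a standalone fragment (the file path and XML context here are illustrative, not the actual Atlas pom layout):

```shell
# Write a fragment referencing the non-existent atlas-buildtools 1.0,
# then pin it to 0.8.1, the newest version actually on Maven Central.
cat > /tmp/buildtools-fragment.xml <<'EOF'
<dependency>
  <groupId>org.apache.atlas</groupId>
  <artifactId>atlas-buildtools</artifactId>
  <version>1.0</version>
</dependency>
EOF
sed -i 's|<version>1.0</version>|<version>0.8.1</version>|' /tmp/buildtools-fragment.xml
grep '<version>' /tmp/buildtools-fragment.xml   # shows the pinned 0.8.1 line
```

In the real tree the same sed (or a manual edit) targets the checkout's pom.xml.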
Huh, yeah, npm isn't available in production. You could probably work around this by creating and activating a conda environment, which has npm. You could probably even just activate anaconda-wmf directly: source /usr/lib/anaconda-wmf/bin/activate, and then retry your mvn install process.
Alternatively, could you run mvn package instead of mvn install? If so, you might be able to build locally and then copy the resulting jars over to an-test-coord1001. Not sure what this would result in, though; it could be tons of jars to copy.
I notice that Atlas will need to contact a zookeeper cluster when it runs.
Whilst it might be possible to include a version of zookeeper and run it on an-test-coord1001, one other option is to use the zookeeper instance that is running on an-test-druid1001.
This is what I ended up doing when I was testing Alluxio on the test cluster and needed a zookeeper instance. See T266641#7291377 for details.
I created a follow-up ticket to T289056: Create analytics-test-eqiad zookeeper cluster but it hasn't been prioritized. Just wanted to mention it in case zookeeper was becoming a blocker in any way to getting Atlas tested.
@BTullis there is a test-zookeeper1002.eqiad.wmnet node, but it's not accessible from the analytics vlan. I think we should be able to punch a hole in it using something like https://wikitech.wikimedia.org/wiki/Network_cheat_sheet#Edit_ACLs_for_Network_ports
@elukey care to weigh in on whether we're on the right track for zookeeper in the analytics test cluster?
We should punch a hole to test-zookeeper. However!
one other option is to use the zookeeper instance that is running on an-test-druid1001.
@razzi this would help you move forward now, it should be fine to use the druid test zk.
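If we do point Atlas at the druid test zookeeper, the properties involved would be roughly these (key names are from the Atlas configuration docs; the host and port are an assumption based on this thread, not verified settings):

```
atlas.graph.index.search.solr.zookeeper-url=an-test-druid1001.eqiad.wmnet:2181
atlas.kafka.zookeeper.connect=an-test-druid1001.eqiad.wmnet:2181
```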
I have been doing some more work on this too, given its priority. I'm currently blocked by Kerberos when trying to import metadata from Hive.
I've joined the mailing list and sent a message to user@atlas.apache.org
https://lists.apache.org/thread/d3z4jlp6663dj45xmnk0rpg6fkjjowr6
I've got an instance of Atlas 3.0.0-SNAPSHOT currently running on port 21000 on an-test-coord1001
I tried 2.2.0 but then replaced it with the tip from https://github.com/apache/atlas in order to address a logging bug.
It hasn't affected the Kerberos bug, though; rolling back to 2.2.0 is still easy if needed.
I decided to build it with the embedded BerkeleyDB & Apache Solr profile, which uses the maven command:
mvn clean -DskipTests package -Pdist,berkeley-solr
This version starts a local zookeeper server as well, whereas some of the other profiles skip that.
Then I started trying to run bin/import-hive.sh in order to connect to hive and import the metadata.
The first issue I found was a series of strange configuration errors.
The cause was the script importing all of the .jar files from the hadoop classpath, which meant that an older, incompatible version of commons-configuration was imported and used in preference to the version that is bundled with Atlas.
I modified the bin/import-hive.sh script as follows, to work around that issue.
# Multiple jars in HADOOP_CP_EXCLUDE_LIST can be added using "\|" separator
# Ex: HADOOP_CP_EXCLUDE_LIST="javax.ws.rs-api\|jersey-multipart"
HADOOP_CP_EXCLUDE_LIST="commons-configuration"

HADOOP_CP=
for i in $(hadoop classpath | tr : "\n")
do
    for j in $(find $i -name "*.jar" | grep -v "$HADOOP_CP_EXCLUDE_LIST")
    do
        HADOOP_CP="${HADOOP_CP}:$j"
    done
done
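The `\|` separator works because grep's default basic-regex mode treats `\|` as alternation (a GNU grep extension). A quick self-contained check of the exclusion logic from the script, using dummy jar paths:

```shell
# Simulate the import-hive.sh classpath filter with two excluded jar names.
HADOOP_CP_EXCLUDE_LIST="commons-configuration\|jersey-multipart"
printf '%s\n' \
  lib/commons-configuration-1.6.jar \
  lib/jersey-multipart-1.19.jar \
  lib/hive-exec-2.3.6.jar \
  | grep -v "$HADOOP_CP_EXCLUDE_LIST"
# → lib/hive-exec-2.3.6.jar
```

Only the jar matching neither pattern survives, which is exactly how the older commons-configuration gets kept off the classpath.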
I've made several more steps forward on this, with sincere thanks to @elukey. Unfortunately we have now hit a serious blocker in terms of the Hive integration with Atlas.
The blocking element is this bug: https://issues.apache.org/jira/browse/ATLAS-3905
Essentially, Atlas versions 2.0 and above are incompatible with our Hive version 2.3.6; they require Hive version 3.1 or above. From this comment:
Atlas 2.1.0 uses Hive 3.1.0, if your local environment is with earlier hive version ( e.g 1.1.0 ) they are incompatible due the fact that your local hive does not know about the getDatabaseName method. Simply adding different hive-metastore-X.X.jar would not help as the jar have other dependencies too and they need to be satisfied. The easiest thing would be to migrate to hive-version compatible with your environment, either upgrade hive locally or downgrading to atlas 1.X until you migrate to the next hive version.
This is evident from the most recent stack trace that we have seen.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.api.Database.getCatalogName()Ljava/lang/String;
	at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.getDatabaseName(HiveMetaStoreBridge.java:688)
	at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.toDbEntity(HiveMetaStoreBridge.java:668)
	at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.toDbEntity(HiveMetaStoreBridge.java:660)
	at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.registerDatabase(HiveMetaStoreBridge.java:527)
	at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importDatabases(HiveMetaStoreBridge.java:391)
	at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importDataDirectlyToAtlas(HiveMetaStoreBridge.java:351)
	at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:185)
Failed to import Hive Meta Data! Check logs at: /home/btullis/atlas/apache-atlas-3.0.0-SNAPSHOT/logs//import-hive.log for details.
Getting to this point required quite a bit of work to get Kerberos authentication working for the bin/import-hive.sh script, which I will summarise here:
Copies of both hive-site.xml and hadoop-site.xml were required in the $HIVE_CONF directory, along with a copy of atlas-application.properties
A jaas-application.properties file was required in the $ATLAS_CONF directory with the following content:
atlas.jaas.hive.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
atlas.jaas.hive.loginModuleControlFlag = required
atlas.jaas.hive.option.useKeyTab = false
atlas.jaas.hive.option.useTicketCache = true
atlas.jaas.hive.option.storeKey = true
atlas.jaas.hive.option.principal = btullis@WIKIMEDIA
A jaas_hive.conf file was also required in the $ATLAS_CONF directory with the following content:
com.sun.security.jgss.krb5.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=false
    useTicketCache=true
    doNotPrompt=true
    principal="btullis@WIKIMEDIA"
    debug=true;
};
When the org.apache.atlas.hive.bridge.HiveMetaStoreBridge process was launched by the script, the following flags were passed to the JRE.
/usr/bin/java \
  -Datlas.log.dir=/home/btullis/atlas/apache-atlas-3.0.0-SNAPSHOT/logs/ \
  -Datlas.log.file=import-hive.log \
  -Dlog4j.configuration=atlas-hive-import-log4j.xml \
  -Djavax.security.auth.useSubjectCredsOnly=false \
  -Djava.security.auth.login.config=/home/btullis/atlas/apache-atlas-3.0.0-SNAPSHOT/conf/jaas_hive.conf \
  -Djava.security.krb5.conf=/etc/krb5.conf \
  -Dsun.security.krb5.debug=true \
  -Djava.security.debug=gssloginconfig,configfile,configparser,logincontext \
  -cp <snip><snip> <lots of jars> \
  org.apache.atlas.hive.bridge.HiveMetaStoreBridge
Unfortunately, unless we want to continue the investigation with version 1.2.0 (released in June 2019), or do without the Hive integration, I can't see any other reasonable way forward with Atlas.
OK, agreed that is another way forward. I will look into it. I had assumed that it would be a lot more work than we had bargained for, but maybe not.
In case it helps, the UI for Atlas on the test cluster can be accessed by using an SSH tunnel like so:
ssh -NL 21000:an-test-coord1001.eqiad.wmnet:21000 an-test-coord1001.eqiad.wmnet and then browsing to http://localhost:21000
Username and password are both *admin*
I haven't run bin/quick_start.py, so it has not generated the sample data; I was hoping to import it from Hive instead.
We're calling this done, since the latest Atlas not supporting the hive version we're running is enough of a blocker that we're pausing with Atlas for now.
We also asked a few questions (1, 2) on the mailing list, and didn't get any response, so unfortunately the Atlas community seems pretty inactive.