Conclusions
Apache Atlas seems like the kind of product you struggle to integrate, only to regret choosing it. It is complicated, and it is not maintained in a way that makes it easy to deploy. We looked into it because of its supposedly great integration with the Hadoop ecosystem, and especially with the Hive metastore. However, the project has dropped support for Hive 2.x and lower branches, which is a major blocker for us. Beyond that, it does not seem like a good strategy, since so many people still run Hive 2.x.
Pros
- Oldest project in this space; many of the other candidates integrate with it
- Integration with Apache Ranger for fine-grained access control
Cons
- Unresponsive community: we sent several messages to the mailing lists and nobody answered
- Out-of-date and incorrect documentation
- No backwards compatibility for Hive ingestion: we would have to migrate to Hadoop 3+, which is on our roadmap but would block this project for too long
Run
- Open an SSH tunnel with ssh -NL 21000:data-catalog-evaluation.analytics.eqiad1.wikimedia.cloud:21000 data-catalog-evaluation.analytics.eqiad1.wikimedia.cloud
- Then browse to http://localhost:21000
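
Before opening the browser, it can help to confirm that the tunnel is listening. The following is a minimal sketch, not part of the evaluation setup itself: the helper name `tunnel_is_up` and the 2-second timeout are assumptions for illustration. It only checks that something accepts TCP connections on the local end of the tunnel (port 21000, as in the ssh command above); it does not verify that Atlas is healthy on the remote side.

```python
import socket


def tunnel_is_up(host: str = "localhost", port: int = 21000, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: it only probes the local end of the ssh tunnel,
    so a True result does not guarantee the remote service is running.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if tunnel_is_up():
        print("Port 21000 is reachable; open http://localhost:21000")
    else:
        print("Nothing is listening on localhost:21000; is the ssh tunnel running?")
```

If the check fails, the most likely cause is that the ssh command from the first step is not running in another terminal.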