
Connect Atlas to a Data Source
Closed, Declined, Public

Description

This task is to connect the cloud test Atlas instance to a test data source: Hive, a MariaDB instance, or both. This would involve installing a test data source and configuring Atlas to read its metadata. Ideally, a copy of the MediaWiki database would let us view pertinent metadata. It might also be possible to connect to an actual production data source, which could be an easier way to load test metadata.

This would allow us to evaluate Atlas more comprehensively, both from a UX perspective and in terms of technology integrations.

Event Timeline

Atlas doesn't seem to have a first-class connector for MySQL/MariaDB; instead, people have written scripts that use the Atlas REST API to import this kind of metadata. The conclusion here is that Atlas mostly cares about the Hive metastore.
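A REST-based import of the kind mentioned above might look roughly like the sketch below. It is not one of the existing community scripts. It assumes Atlas's v2 bulk entity endpoint (`/api/atlas/v2/entity/bulk`), an `rdbms_table` entity type (assumed to come from Atlas's bundled RDBMS model), HTTP basic auth, a hypothetical host name, and a hypothetical `qualifiedName` scheme. It only builds the request; nothing is sent.

```python
import base64
import json
import urllib.request

# Hypothetical address of the cloud test Atlas instance (21000 is Atlas's default port).
ATLAS_URL = "http://atlas-test.example.org:21000"


def table_entities(rows, cluster="test"):
    """Turn (schema, table, comment) rows, e.g. read from MariaDB's
    information_schema.TABLES, into an Atlas v2 bulk-entity payload.

    The rdbms_table type name and the qualifiedName scheme are assumptions.
    """
    entities = []
    for schema, table, comment in rows:
        entities.append({
            "typeName": "rdbms_table",
            "attributes": {
                # qualifiedName must be unique per type; this format is hypothetical.
                "qualifiedName": f"{schema}.{table}@{cluster}",
                "name": table,
                "description": comment or "",
            },
        })
    return {"entities": entities}


def bulk_request(payload, user, password, base_url=ATLAS_URL):
    """Build (but do not send) a POST to /api/atlas/v2/entity/bulk with basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/api/atlas/v2/entity/bulk",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

For example, `bulk_request(table_entities([("enwiki", "page", None)]), user, password)` builds a request for one table, which `urllib.request.urlopen` would then send. A real import would also need to create the database and column entities and link them to the tables.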

Other metadata we're interested in, such as Airflow lineage, is pushed to Atlas by connectors on the Airflow side.

As discussed in T296670#7633000, it looks like we have a serious blocker to connecting our test instance of Atlas to an existing Hive metastore.
We would either need to upgrade Hive to 3.1+ (which requires upgrading to Hadoop 3+) or deploy an older version (1.2.0) of Atlas.

EChetty moved this task from Next Up to Done on the Data-Catalog board.
EChetty moved this task from Done to Blocked on the Data-Catalog board.
BTullis moved this task from Next Up to Done on the Data-Engineering-Kanban board.

We can't easily do this with our evaluation setup because of the Hive version incompatibility.