Conclusions
OpenMetadata is a relatively new offering from a team that previously worked on Uber's data platform. They seem to emphasize a concise set of features and good UX, along with connectivity to more modern data stacks.
Pros
- Incredibly responsive community, on Slack and GitHub
- Fast and simple UI
- Very easy to install
- Built on good standards like JSON Schema, whereas DataHub uses Pegasus
Cons
- Hive connector is in a very early alpha stage
- Requires MySQL 8+, which is hard for us to set up here
- Too much manual metadata wrangling allowed by the UI
Run
To run this, make sure the three services below are running, open an SSH tunnel, and visit http://localhost:8585
# everything was set up under user milimetric, but should be easy to copy
# on the test host: start the three services
ssh an-test-client1001.eqiad.wmnet
systemctl --user start mysql8
systemctl --user start opensearch
systemctl --user start openmetadata
# from your local machine: forward port 8585
ssh -N an-test-client1001.eqiad.wmnet -L 8585:127.0.0.1:8585
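A quick way to confirm that the local end of the tunnel is listening before opening the browser is a plain TCP connect. This is only a sketch: the helper name `is_port_open` is made up for illustration, and a successful connect shows that something is listening on 127.0.0.1:8585, not that OpenMetadata itself is healthy behind the tunnel.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8585 is the local end of the SSH tunnel from the commands above.
    if is_port_open("127.0.0.1", 8585):
        print("Tunnel is listening; try http://localhost:8585")
    else:
        print("Nothing is listening on 127.0.0.1:8585; check the tunnel")
```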
Steps to Reproduce Installation
- MySQL 8+ (the versions of MariaDB that we support are not compatible)
  - find an archive like mysql-server_8.0.28-1debian10_amd64.deb-bundle.tar
  - extract with dpkg-deb -x
  - create database "openmetadata" and user with access
  - create a systemd unit, could look like:
    [Unit]
    Description=MySQL server, version 8
    [Service]
    Type=simple
    Environment=LD_LIBRARY_PATH=/srv/data-catalog-tmp/mysql-server-chroot/usr/lib/x86_64-linux-gnu
    ExecStart=/srv/data-catalog-tmp/mysql-server-chroot/usr/sbin/mysqld --defaults-file=/srv/data-catalog-tmp/mysql-server-chroot/etc/mysql/mysql.cnf
    [Install]
    WantedBy=multi-user.target
- OpenSearch (to satisfy the Elasticsearch requirement)
  - download and extract
  - in ./config/opensearch.yml add plugins.security.disabled: true
  - create a systemd unit, ours looks like:
    [Unit]
    Description=OpenSearch server, running with security plugin disabled (pw admin:admin)
    [Service]
    Type=simple
    Environment=LD_LIBRARY_PATH=/srv/data-catalog-tmp/mysql-server-chroot/usr/lib/x86_64-linux-gnu
    ExecStart=/home/milimetric/opensearch-1.2.4/bin/opensearch
    [Install]
    WantedBy=multi-user.target
- download and install OpenMetadata
  - use Java 11: export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
  - with mysql running, database and user created, config OpenMetadata to point to it
  - run ./bootstrap/bootstrap_storage.sh migrate to create tables
  - point to OpenSearch from config (never dug into this)
  - systemd unit:
    [Unit]
    Description=OpenMetadata service
    [Service]
    Type=forking
    Environment=JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
    ExecStart=/home/milimetric/openmetadata-0.8.0/bin/openmetadata.sh start
    ExecStop=/home/milimetric/openmetadata-0.8.0/bin/openmetadata.sh stop
    [Install]
    WantedBy=multi-user.target
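The "create database and user with access" step in the MySQL part of the list is not spelled out in these notes. As a rough sketch, the statements could be generated like this. The database name "openmetadata" comes from the notes; the user name, password, and the helper `bootstrap_sql` are placeholders invented for illustration, and the statements actually run on the test host may have differed.

```python
def bootstrap_sql(database: str, user: str, password: str, host: str = "localhost") -> list:
    """Build MySQL 8 statements that create a database and a user with full access to it."""
    # Keep this sketch simple: refuse values that would need escaping
    # instead of trying to quote them correctly.
    for value in (database, user, password, host):
        if any(ch in value for ch in ("'", "`", "\\")):
            raise ValueError(f"unsupported character in {value!r}")
    return [
        f"CREATE DATABASE IF NOT EXISTS `{database}`;",
        f"CREATE USER IF NOT EXISTS '{user}'@'{host}' IDENTIFIED BY '{password}';",
        f"GRANT ALL PRIVILEGES ON `{database}`.* TO '{user}'@'{host}';",
    ]


if __name__ == "__main__":
    # "openmetadata_user" and "change-me" are hypothetical placeholders.
    for statement in bootstrap_sql("openmetadata", "openmetadata_user", "change-me"):
        print(statement)
```

The printed statements can be pasted into a mysql client session connected to the MySQL 8 instance as an administrative user.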
Metadata Ingestion
conda create -n airflow_py_38 python=3.8
conda activate airflow_py_38
export https_proxy=http://webproxy.eqiad.wmnet:8080
pip install wheel
pip install hmsclient
pip install apache-airflow[hdfs,hive,kerberos]
pip install flask-admin==1.4.0
pip install pyarrow
pip install openmetadata-ingestion[hive,data-profiler]
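After a chain of installs like this, it can help to check what actually landed in the environment. The sketch below uses the standard library's importlib.metadata (available from Python 3.8); the helper name `installed_versions` is made up for illustration, and the distribution names are taken from the pip commands above.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = [
        "wheel",
        "hmsclient",
        "apache-airflow",
        "flask-admin",
        "pyarrow",
        "openmetadata-ingestion",
    ]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```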
I hit a bug in the Hive connector and filed it; the maintainers fixed it, and I hacked around it in the meantime.
There were more problems configuring ingestion, so I opened another issue and hacked around the limitations.
Finally, our ingestion config looked like:
import os
import pwd

# Kerberos authenticates as whoever runs the ingestion
current_user = pwd.getpwuid(os.getuid()).pw_name
config = """
{
  "source": {
    "type": "hive",
    "config": {
      "database": "wmf",
      "host_port": "analytics-hive.eqiad.wmnet",
      "service_name": "hive_test_cluster",
      "generate_sample_data": "true",
      "scheme": "hive",
      "query": "select * from {}.{} where year=2022 limit 50",
      "data_profiler_enabled": "false",
      "data_profiler_offset": "0",
      "data_profiler_limit": "50000",
      "connect_args": {
        "auth": "KERBEROS",
        "username": """ + '"' + current_user + '"' + """,
        "kerberos_service_name": "hive"
      }
    }
  },
  "sink": {
    "type": "metadata-rest",
    "config": {}
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  }
}
"""

(For the record, the initial notes, along with a failed attempt to compile MySQL 8, are here: https://app.slack.com/docs/T024KLHS4/F02V248G4BV?origin_team=T024KLHS4&origin_channel=C02UCDD7FKK. The relevant parts were copied above.)