As a software engineer, I would like the capability to programmatically manage schemas of Refined tables.
As part of the refine refactoring, we should extract schema management into a dedicated tool.
The tool should be able to:
- read from config the databases and schemas that should exist in the metastore (JSON Schema vs. Calcite?)
  - db -> schema URI
- perform several operations:
  - validate table existence
  - validate coherence
  - migrate the schema
  - update table properties
- be executable at the CLI and via Airflow
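The requirements above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed design: the names `TableReport`, `check_table`, `migration_statements`, `build_parser`, the `refine-schema` program name, and the CLI flags are all hypothetical. Reading the config, resolving a schema URI into columns, and fetching the actual columns from the metastore are left out; the sketch assumes both sides are already available as column name -> type mappings.

```python
import argparse
from dataclasses import dataclass, field


@dataclass
class TableReport:
    """Result of comparing one table against its expected schema (hypothetical shape)."""
    exists: bool
    missing_columns: dict = field(default_factory=dict)  # in schema, absent from table
    type_mismatches: dict = field(default_factory=dict)  # name -> (expected, actual)

    @property
    def coherent(self):
        return self.exists and not self.missing_columns and not self.type_mismatches


def check_table(expected_columns, actual_columns):
    """Validate existence and coherence.

    Both arguments map column name -> type string. actual_columns is None
    when the table does not exist in the metastore.
    """
    if actual_columns is None:
        return TableReport(exists=False, missing_columns=dict(expected_columns))
    report = TableReport(exists=True)
    for name, expected_type in expected_columns.items():
        if name not in actual_columns:
            report.missing_columns[name] = expected_type
        elif actual_columns[name] != expected_type:
            report.type_mismatches[name] = (expected_type, actual_columns[name])
    return report


def migration_statements(table, report):
    """Return the DDL a migration would run. Only adding columns is sketched here;
    table creation and type changes are deliberately not handled."""
    if not report.exists or not report.missing_columns:
        return []
    cols = ", ".join(f"{n} {t}" for n, t in report.missing_columns.items())
    return [f"ALTER TABLE {table} ADD COLUMNS ({cols})"]


def build_parser():
    """CLI entry point; an Airflow task could call the same functions directly."""
    parser = argparse.ArgumentParser(prog="refine-schema")  # hypothetical name
    parser.add_argument("--config", required=True,
                        help="path to the db -> schema URI config")
    parser.add_argument("--dry-run", action="store_true",
                        help="print planned changes without applying them")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("validate", "migrate", "update-properties"):
        sub.add_parser(name)
    return parser
```

Keeping the comparison logic free of any metastore or Spark dependency would make it straightforward to unit test, and a `--dry-run` flag of this kind would cover the manual dry run on currently refined tables.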
Success criteria
- Code
- Unit Tests
- Migrate existing code to be compatible with Iceberg tables
- Manual dry run on currently refined tables
- Released and used from the Airflow refine DAG