Refine currently uses a DataFrameToHive class that automates schema evolution and insertion into Hive tables from a Spark DataFrame. We want to implement similar functionality for Iceberg, most likely as a DataFrameToIceberg class. In the future, we could unify these connectors behind a common interface, but for now a standalone Iceberg implementation will be fine.
We will then use this class in a new (or adapted) RefineSanitize job that reads from the event Hive tables and writes to new Iceberg tables in a new event_sanitized_iceberg database. This database will eventually replace event_sanitized.
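
To make the schema evolution part concrete, here is a minimal sketch of the kind of planning step a DataFrameToIceberg class might perform before inserting. It is plain Python rather than Refine's actual code. The function name plan_schema_evolution, the dict-based schema representation, and the example table name are all hypothetical. It assumes only additive evolution (new top-level columns) and case-insensitive column name matching, and it does not handle type changes or nested fields.

```python
from typing import Dict, List


def plan_schema_evolution(
    table_name: str,
    table_schema: Dict[str, str],
    incoming_schema: Dict[str, str],
) -> List[str]:
    """Return DDL statements adding columns that the incoming DataFrame
    has but the target table lacks.

    Hypothetical helper: schemas are simple {column_name: sql_type} dicts.
    """
    # Assumption: column names are compared case-insensitively.
    existing = {name.lower() for name in table_schema}

    # Keep the incoming column order so the resulting DDL is deterministic.
    new_columns = [
        (name, dtype)
        for name, dtype in incoming_schema.items()
        if name.lower() not in existing
    ]
    if not new_columns:
        return []

    column_list = ", ".join(f"`{name}` {dtype}" for name, dtype in new_columns)
    return [f"ALTER TABLE {table_name} ADD COLUMNS ({column_list})"]


# Example: the incoming data has one column the table does not know yet.
statements = plan_schema_evolution(
    "event_sanitized_iceberg.example_table",  # hypothetical table name
    {"dt": "string", "user_id": "bigint"},
    {"dt": "string", "user_id": "bigint", "country": "string"},
)
for statement in statements:
    print(statement)
```

In a real job, the statements would be executed through Spark SQL against the Iceberg catalog before appending the DataFrame. How incompatible type changes should be handled is left open here.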