Two efforts need this, or would benefit from it:
- The move to Airflow: it would be great to have a simpler way for Airflow to get status from the events jobs (it currently looks for HDFS files and checks their dates).
- The move to Iceberg: writing files into hourly partitions defeats Iceberg's hidden-partitioning feature, where the partitions are managed for us as the table grows. We currently add files in folders not used by Iceberg, but a better solution would be welcome.
Ideas discussed with Andrew:
- One file per dataset on HDFS
- One global Iceberg table
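
To make the first idea concrete, here is a rough sketch of what "one file per dataset" could look like. Everything in it is an assumption for discussion, not an agreed design: the `<dataset>.status.json` file name, the `last_complete_hour` field, and the helper names are all made up, hours are assumed to complete in order, and a local directory stands in for HDFS so the snippet runs as-is.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def status_path(root: Path, dataset: str) -> Path:
    # Hypothetical layout: one small JSON status file per dataset.
    return root / f"{dataset}.status.json"


def mark_hour_done(root: Path, dataset: str, hour: datetime) -> None:
    """Record `hour` as the latest fully written hour for `dataset`."""
    payload = {"dataset": dataset, "last_complete_hour": hour.isoformat()}
    # Write to a temp file and rename it, so readers never see a partial file.
    fd, tmp = tempfile.mkstemp(dir=root, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    os.replace(tmp, status_path(root, dataset))


def is_hour_done(root: Path, dataset: str, hour: datetime) -> bool:
    """Return True if the status file reports `hour` (or a later hour) as complete."""
    target = status_path(root, dataset)
    if not target.exists():
        return False
    payload = json.loads(target.read_text())
    last = datetime.fromisoformat(payload["last_complete_hour"])
    return last >= hour


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        hour = datetime(2024, 1, 1, 10, tzinfo=timezone.utc)
        mark_hour_done(root, "example_dataset", hour)
        print(is_hour_done(root, "example_dataset", hour))
```

With something like this, an Airflow sensor would only need to read one small file per dataset instead of listing folders and comparing dates. The global Iceberg table idea would serve the same purpose, with a row per dataset and hour instead of a file.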