To compute stateful metrics, that is, metrics that depend on historical information, we currently need to persist a deequ repository containing the analysis results to HDFS.
In T349763: [Data Quality] Develop Airflow post processing instrumentation to collect and log configurable data metrics, we implemented a SerDe that maps deequ repositories to the Wikimedia Data Quality model, which is persisted in Iceberg. No repository information is lost in this mapping; we simply reformat the content to fit our data model.
We should implement the reverse "Iceberg to deequ" transformation, which instantiates a deequ repository from the Data Quality model.
This would remove the need to store deequ repository JSON blobs on HDFS.
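
As a rough illustration of the round trip, here is a minimal Python sketch. It is not the deequ API (the real implementation would target deequ's Scala classes) and it does not use the actual Data Quality schema: `RepositoryEntry`, `MetricRow`, and all field names are hypothetical stand-ins. It only shows the idea that a repository flattened to one row per metric can be rebuilt by grouping rows on the result key (dataset date plus tags).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MetricRow:
    """One flat row; a stand-in for a record in the Iceberg table (hypothetical columns)."""
    dataset_date: int
    tags: Tuple[Tuple[str, str], ...]  # sorted (key, value) pairs so the row is hashable
    entity: str
    instance: str
    name: str
    value: float


@dataclass
class RepositoryEntry:
    """Stand-in for one analysis result in a repository: a result key plus its metrics."""
    dataset_date: int
    tags: Dict[str, str]
    metrics: List[Tuple[str, str, str, float]]  # (entity, instance, name, value)


def to_rows(entries: List[RepositoryEntry]) -> List[MetricRow]:
    """Forward direction: flatten each entry into one row per metric."""
    rows = []
    for entry in entries:
        tag_key = tuple(sorted(entry.tags.items()))
        for entity, instance, name, value in entry.metrics:
            rows.append(
                MetricRow(entry.dataset_date, tag_key, entity, instance, name, value)
            )
    return rows


def from_rows(rows: List[MetricRow]) -> List[RepositoryEntry]:
    """Reverse direction: group rows by result key and rebuild the entries.

    Entries are returned ordered by (dataset_date, tags). An entry that had
    no metrics produces no rows, so it cannot be recovered by this sketch.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row.dataset_date, row.tags)].append(
            (row.entity, row.instance, row.name, row.value)
        )
    return [
        RepositoryEntry(date, dict(tags), metrics)
        for (date, tags), metrics in sorted(grouped.items())
    ]
```

Under these assumptions, `from_rows(to_rows(entries))` returns the original entries whenever they each hold at least one metric and are ordered by result key, which is the property the reverse transformation would need to guarantee before the HDFS copy can be dropped.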