In T352688: [Data Quality] Move MetricsExporter to refinery-spark, we introduced new Scala classes in refinery-spark that integrate with Amazon Deequ to persist metrics in a generic way.
We should make both Amazon Deequ and the refinery-spark classes available to, and interoperable with, PySpark code.
For deequ this means:
- provide the pydeequ dependency in the conda-analytics env.
For refinery-spark we should:
- document how to interface with the classes from Python, or provide a lightweight wrapper (the library is GA on HDFS).
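
As a rough illustration of what a lightweight wrapper could look like, the sketch below reaches a JVM class through the py4j gateway that PySpark exposes on the SparkContext. It is not the refinery-python implementation. The helper names `resolve_jvm_class` and `new_jvm_instance` are hypothetical, and the sketch assumes the refinery-spark jar is already on the Spark classpath (for example via `spark.jars`).

```python
from functools import reduce


def resolve_jvm_class(jvm, fqcn):
    """Walk a py4j JVM view down to the class named by a dotted path.

    `jvm` is expected to be `spark.sparkContext._jvm`; py4j resolves each
    package segment through plain attribute access.
    """
    return reduce(getattr, fqcn.split("."), jvm)


def new_jvm_instance(spark, fqcn, *args):
    """Instantiate a JVM class through the gateway of an active SparkSession.

    Hypothetical helper: arguments are passed through unchanged, so a PySpark
    DataFrame must be unwrapped by the caller (pass `df._jdf`, not `df`).
    """
    cls = resolve_jvm_class(spark.sparkContext._jvm, fqcn)
    return cls(*args)
```

A call would then look like `new_jvm_instance(spark, "<package>.<ClassName>", df._jdf)`, where the fully qualified class name and the constructor arguments are placeholders to be replaced with the actual refinery-spark class and its signature.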
Changelog
- An initial Python wrapper for the refinery-spark classes is available at https://gitlab.wikimedia.org/gmodena/refinery-python