Currently located here: https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils
Bundled in: https://gerrit.wikimedia.org/r/admin/repos/operations/debs/airflow
Triggered by scap: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags-scap-analytics
Improvements:
- create an independent cache store per Airflow instance
- warn about unused artifacts when running workflow_utils/artifact/cli/warm, by listing the cache and diffing it against the yaml file
- add an evict script (to be used in airflow-dags) that removes from the cache any artifacts no longer specified (the ones removed from artifacts.yaml)
- Maybe move this artifact caching library into its own repo
- Cached artifacts from GitLab package 'download' links that are archives (e.g. .tgz files) don't work with the SparkSubmitOperator archives param. This param expects archive file names to end in an extension like .tgz so that the archive is unpacked automatically on the workers. This should be fixed (done in MR25). See this MR: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/47#note_6838
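
The "warn of unused artifacts" and "evict" items above both reduce to a set difference between what is in the cache and what artifacts.yaml declares. Below is a minimal sketch of that idea, not the workflow_utils implementation. It assumes the cache is a plain local directory (the real cache may live on another filesystem such as HDFS) and that the caller has already turned the yaml file into a set of cache-relative paths. The function names `find_unused_artifacts` and `evict_unused_artifacts` are hypothetical.

```python
from pathlib import Path


def find_unused_artifacts(cache_dir, declared_paths):
    """Return cached files (relative to cache_dir) that are not declared.

    Assumption: declared_paths holds cache-relative POSIX paths that the
    caller derived from artifacts.yaml.
    """
    root = Path(cache_dir)
    # List every regular file currently in the cache.
    cached = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    }
    # Anything cached but not declared is a candidate for a warning or eviction.
    return sorted(cached - set(declared_paths))


def evict_unused_artifacts(cache_dir, declared_paths, dry_run=True):
    """Delete undeclared cached files; with dry_run=True only report them."""
    unused = find_unused_artifacts(cache_dir, declared_paths)
    if not dry_run:
        for rel in unused:
            (Path(cache_dir) / rel).unlink()
    return unused
```

Defaulting to `dry_run=True` lets the warm step reuse the same function to print warnings, while the evict script would opt in to deletion explicitly.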