Airflow operator to manage old data deletion
Open, Needs Triage | Public | 9 Estimated Story Points

Description

Our current mechanism for automatically deleting old data is a mix of Python scripts, primarily refinery-drop-older-than, and systemd timers that run them on a schedule. This works well for data that we own: setting up a deletion job requires changes to our Puppet configuration, and most of the data is owned by the analytics user.

Now that we are onboarding other teams, we need a mechanism that they can leverage more easily, ideally in the form of just one more Airflow operator.

In this task, we thus want to:

  1. Investigate what is needed to wrap refinery-drop-older-than into an Airflow operator to be run as whatever user is running the DAG.
  2. Implement the mechanism.
  3. Document how to use this new operator.
  4. Apply this mechanism to at least one production system, perhaps the platform_eng Airflow instance.
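
As a starting point for item 1, here is a minimal sketch of what wrapping the script could look like. It is an illustration under assumptions, not a proposed implementation: the helper names `build_drop_command` and `run_drop` are hypothetical, no flags of refinery-drop-older-than are hard-coded, and the option names in the usage lines are placeholders that must be replaced with the flags listed in the script's own help output.

```python
import shlex
import subprocess
from typing import List, Mapping, Optional, Sequence


def build_drop_command(script: str, options: Mapping[str, Optional[str]]) -> List[str]:
    """Turn an option mapping into an argv list (hypothetical helper).

    A value of None produces a bare flag (--name); any other value
    produces --name=value. No real flag names are assumed here.
    """
    argv = [script]
    for name, value in options.items():
        argv.append(f"--{name}" if value is None else f"--{name}={value}")
    return argv


def run_drop(argv: Sequence[str], dry_run: bool = True) -> int:
    """Run the command as the current process user and return its exit code.

    With dry_run=True the command is only printed, never executed.
    """
    if dry_run:
        print("would run:", shlex.join(argv))
        return 0
    completed = subprocess.run(list(argv), check=False)
    return completed.returncode


# Placeholder option names: replace with the script's documented flags.
argv = build_drop_command(
    "refinery-drop-older-than",
    {"placeholder-option": "value", "placeholder-flag": None},
)
run_drop(argv)  # dry run: prints the command instead of executing it
```

Inside a DAG, a callable like `run_drop` could be passed as `python_callable` to Airflow's `PythonOperator`, in which case the subprocess would normally run as the user executing the Airflow task. Whether that matches the ownership of the data to delete is part of what item 1 needs to confirm.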