Implement periodic cleanup of Airflow databases
Open, Needs Triage | Public | 3 Estimated Story Points

Description

The Airflow database stores lots of logs and information about dag_runs, tasks, sensors, SLA misses, etc.
Left unchecked, this data will eventually grow until the database becomes unusable.
We need a mechanism that periodically removes unnecessary data from the DB, probably anything older than a given number of months.
Here's an example of how we could do it:
https://cloud.google.com/composer/docs/cleanup-airflow-database
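
As a rough starting point, the retention rule above could be sketched as follows. This assumes an Airflow version that ships the `airflow db clean` CLI command (2.3 or later, as far as I know). The helper names `months_ago` and `build_db_clean_command` and the 6-month retention value are placeholders for illustration, not a decision. The sketch only builds the command and defaults to `--dry-run`; it never touches the database.

```python
import calendar
from datetime import datetime, timezone

# Placeholder: the actual retention period still needs to be agreed on.
RETENTION_MONTHS = 6


def months_ago(now: datetime, months: int) -> datetime:
    """Step back whole calendar months, clamping the day to the target month's length."""
    total = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return now.replace(year=year, month=month, day=min(now.day, last_day))


def build_db_clean_command(now: datetime, retention_months: int, dry_run: bool = True) -> list[str]:
    """Build (but do not run) an `airflow db clean` invocation for rows older than the cutoff."""
    cutoff = months_ago(now, retention_months)
    cmd = [
        "airflow", "db", "clean",
        "--clean-before-timestamp", cutoff.isoformat(),
        "--yes",  # skip the interactive confirmation prompt
    ]
    if dry_run:
        cmd.append("--dry-run")  # only report what would be deleted
    return cmd


if __name__ == "__main__":
    # Print the command so it can be reviewed before anything is executed.
    print(" ".join(build_db_clean_command(datetime.now(timezone.utc), RETENTION_MONTHS)))
```

Something like this could be wrapped in a scheduled maintenance DAG, similar to the Composer example linked above. A dry run against a copy of the DB first would tell us how much data a given cutoff actually removes.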

Event Timeline

EChetty set the point value for this task to 3. (Nov 28 2022, 8:18 PM)
EChetty moved this task from To be prioritised to Sprint 05-06 on the Data Pipelines board.
EChetty edited projects, added Data Pipelines (Sprint 05-06); removed Data Pipelines.