Implement periodical cleaning of Airflow databases
Open, High | Public | 3 Estimated Story Points

Description

The Airflow database stores large volumes of logs and metadata about dag_runs, tasks, sensors, SLA misses, etc.
Left unchecked, this data will eventually grow large enough to make Airflow's database unusable.
We need a mechanism that periodically removes unnecessary data from the DB, probably data older than a given number of months.
Here's an example of how we could do it:
https://cloud.google.com/composer/docs/cleanup-airflow-database
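
As a possible alternative to a custom maintenance DAG, Airflow 2.3 and later ship an `airflow db clean` command that deletes rows older than a given timestamp. The sketch below only builds such a command from a retention window. It assumes our instances run Airflow 2.3+. The 180-day default, the helper names, and the table list in the usage note are illustrative assumptions, not decisions made in this task.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Assumption: roughly six months of retention; the task has not fixed a value.
RETENTION_DAYS = 180


def cutoff_timestamp(now: datetime, retention_days: int = RETENTION_DAYS) -> datetime:
    """Return the timestamp before which rows are considered stale."""
    return now - timedelta(days=retention_days)


def build_db_clean_command(
    now: datetime,
    retention_days: int = RETENTION_DAYS,
    tables: Optional[List[str]] = None,
    dry_run: bool = True,
) -> List[str]:
    """Build an `airflow db clean` invocation (hypothetical helper).

    Defaults to --dry-run so nothing is deleted until explicitly requested.
    """
    cmd = [
        "airflow", "db", "clean",
        "--clean-before-timestamp",
        cutoff_timestamp(now, retention_days).isoformat(),
    ]
    if tables:
        # The CLI takes a comma-separated list of table names.
        cmd += ["--tables", ",".join(tables)]
    # --dry-run only reports what would be removed; --yes skips the prompt.
    cmd.append("--dry-run" if dry_run else "--yes")
    return cmd


if __name__ == "__main__":
    command = build_db_clean_command(
        datetime.now(timezone.utc),
        tables=["dag_run", "task_instance", "log", "sla_miss"],
    )
    print(" ".join(command))
```

The printed command could then be run on a schedule, for example from a maintenance DAG or a timer, once the dry-run output has been reviewed.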

Event Timeline

EChetty set the point value for this task to 3. Nov 28 2022, 8:18 PM
EChetty moved this task from To be prioritised to Sprint 05-06 on the Data Pipelines board.
Gehel triaged this task as High priority. Feb 15 2024, 3:24 PM
Gehel moved this task from Incoming to Toil / Automation on the Data-Platform-SRE board.