We want the `bulk ingestion` process to run automatically on a monthly basis, so that any data lost to incidents or to schema changes in our system is automatically reflected in all of the APIs.
**Acceptance criteria**
- Bulk ingestion DAG runs every month automatically; or
- implementation tickets are created for the remaining work.
**ToDo**
- [ ] Set up a schedule for the `bulk-ingestion` DAG inside `scheduler` (Airflow)
- [ ] Make sure we don't generate bulky snapshots from the data produced by ingestion (this still needs a solution design)
- [ ] Figure out what to do with `batches` while ingestion is running (they will be flooded with data from ingestion)
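The scheduling item itself is small: Airflow's `@monthly` preset fires at midnight on the first day of each month. As a plain-Python sketch of that cadence (the helper name is ours, not part of the codebase):

```python
from datetime import date

def next_monthly_run(today: date) -> date:
    """First day of the month after `today` -- when an Airflow
    `@monthly` schedule would next fire."""
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)

print(next_monthly_run(date(2025, 5, 17)))   # 2025-06-01
print(next_monthly_run(date(2025, 12, 31)))  # 2026-01-01
```

In the DAG definition this amounts to setting `schedule="@monthly"` (with `catchup=False` if we don't want Airflow to backfill missed months).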
**Notes**
For more context about the ingestion process and how it is executed, see the `Bulk Ingestion Runbook v2` in the `Runbooks` directory on the `Product/Eng.` drive.
Before starting the actual work, use the Runbook to walk through the process on the `dev` environment and get a solid grasp of the issue.
**Things to consider**
* We'll have to figure out what to do with the batches in the monthly-ingestion scenario: the way they are implemented right now, they will not work properly on the days the ingestion is running.
* Snapshots will need a more elaborate solution for tracking when ingestion has started for a particular project, because if we do nothing, snapshots will double in size for the next couple of days after an ingestion.
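One possible shape for both concerns (a minimal sketch, not the implementation; the registry and every name here are hypothetical) is to record an ingestion window per project and let the snapshot and batch jobs exclude rows that fall inside it:

```python
from datetime import datetime
from typing import Dict, Optional, Tuple

# Hypothetical in-memory registry; in practice this would live in a
# metadata table keyed by project.
_windows: Dict[str, Tuple[datetime, Optional[datetime]]] = {}

def mark_ingestion_started(project: str, at: datetime) -> None:
    """Record that bulk ingestion started for `project` at `at`."""
    _windows[project] = (at, None)

def mark_ingestion_finished(project: str, at: datetime) -> None:
    """Close the project's ingestion window."""
    start, _ = _windows[project]
    _windows[project] = (start, at)

def produced_during_ingestion(project: str, ts: datetime) -> bool:
    """True if `ts` falls inside the project's recorded ingestion window,
    so snapshot/batch jobs can skip or tag the row."""
    window = _windows.get(project)
    if window is None:
        return False
    start, end = window
    return start <= ts and (end is None or ts <= end)
```

With a marker like this, the snapshot job could filter ingestion-produced rows instead of doubling in size, and batches could pause or tag data arriving inside the window.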