As an Engineer, I would like to execute the `article-bulk` DAG, so that I can trigger the `Re-ingestion` process and update the WME dataset with fresh data and new `v2` schema.
**Acceptance Criteria**
- New projects are added to the system.
- The ingestion process is executed and the WME dataset is updated with the newest data.
**To-Do**
Refer to the detailed checklist in `Runbooks/Bulk Ingestion Runbook v2`, section `Bulk-ingestion using versioning checklist`.
- [x] Send an email to sre-service-ops
- [ ] Deploy pre-ingestion changes to prod for the following services:
  - Infrastructure/services
  - Structured-data
  - On-demand (a new version of the service needs to be deployed)
  - Scheduler
- [ ] Run ingestion in production
- [ ] Monitor ingestion (3–4 days) for the following:
  - Bulk-ingested articles should land on the vn+1 compacted topic.
  - Event-based articles should land on both the vn and vn+1 compacted topics.
  - On-demand should update articles at both key locations.
  - On-demand, batches, and snapshots should still consume from the vn compacted topic.
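The topic-parity check during monitoring can be sketched as a simple tolerance comparison. This is a minimal illustration, not project code: the 5% threshold is an assumption, and the message counts would come from whatever offset/lag tooling the team already uses (e.g. Kafka's `GetOffsetShell`).

```python
def counts_similar(vn_count: int, vn1_count: int, tolerance: float = 0.05) -> bool:
    """Return True if the vn+1 compacted topic holds roughly as many
    messages as the vn topic. The 5% tolerance is an assumed threshold,
    not a project standard."""
    if vn_count == 0:
        return vn1_count == 0
    return abs(vn1_count - vn_count) / vn_count <= tolerance

# Illustrative numbers only; real counts come from offset tooling.
print(counts_similar(1_000_000, 985_000))  # → True (within 5%)
print(counts_similar(1_000_000, 900_000))  # → False (10% apart)
```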
**Monitoring Acceptance Criteria**
- [ ] The vn+1 compacted topics should contain a similar number of messages to the vn compacted topics.
- [ ] In S3, articles should be present at both key locations: `articles/project_identifier/article_name.json` and `articles_v2/project_identifier/article_name.json`.
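The dual-key criterion above can be made concrete with a small helper that derives both expected S3 keys for an article. This is a sketch: the function name is hypothetical, and the bucket name needed for an actual existence check is deployment-specific and not given in this ticket.

```python
def article_keys(project_identifier: str, article_name: str) -> tuple[str, str]:
    """Return the v1 and v2 S3 key locations for an article,
    matching the layout in the acceptance criteria."""
    return (
        f"articles/{project_identifier}/{article_name}.json",
        f"articles_v2/{project_identifier}/{article_name}.json",
    )

# Existence at both keys could then be verified with boto3, e.g.
# s3.head_object(Bucket=bucket, Key=key) for each key; `bucket` is
# deployment-specific and omitted here.
print(article_keys("enwiki", "Example"))
# → ('articles/enwiki/Example.json', 'articles_v2/enwiki/Example.json')
```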
**Additional Context**
Currently, we have an Airflow DAG specifically built to populate our system with "baseline" data. This data acts as the initial dataset, which we then routinely refresh and maintain.
Our aim is to execute this DAG so that our Kafka cluster contains a fresh, up-to-date version of this baseline dataset.
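For reference, triggering the DAG can go through Airflow's stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`) or the CLI (`airflow dags trigger article-bulk`). The sketch below only builds the request; the base URL and authentication are deployment-specific assumptions.

```python
import json

def trigger_dag_request(base_url: str, dag_id: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for Airflow's stable REST API
    dagRuns endpoint. Sending the request (and credentials for the
    Airflow webserver) is omitted here."""
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": {}}).encode()
    return url, body

url, body = trigger_dag_request("http://localhost:8080", "article-bulk")
print(url)  # → http://localhost:8080/api/v1/dags/article-bulk/dagRuns
```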