
Split wiki dump DAG into multiple tasks
Closed, Resolved, Public

Description

When it is not given a --job <name-csv> value, the worker dump script iterates over every kind of dump.
Instead of letting a single task handle the whole bulk of the work, we will generate one Airflow task per job, so that we get a finer-grained view of how far along a dump run is.

This will also let us upload the result of each dump to the clouddumps servers as soon as it is generated.
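
As a rough illustration of the split described above, the sketch below builds one task spec per dump job and chains them in order. It deliberately avoids the Airflow API and is not the code from the merge requests. WORKER_SCRIPT, the job names, and the task id format are hypothetical placeholders; only the --job flag comes from this task's description.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical placeholder for the worker dump script's entry point.
WORKER_SCRIPT = "worker-dump-script"


@dataclass(frozen=True)
class DumpTask:
    task_id: str        # identifier an orchestrator task could use
    command: list[str]  # arguments for a single-job run of the script


def build_dump_tasks(jobs: list[str]) -> list[DumpTask]:
    """Return one task spec per dump job, preserving the input order."""
    return [
        DumpTask(
            task_id=f"dump_{job}",  # assumed naming scheme
            # Restrict each run to a single job via --job.
            command=[WORKER_SCRIPT, "--job", job],
        )
        for job in jobs
    ]


def chain_edges(tasks: list[DumpTask]) -> list[tuple[str, str]]:
    """Return (upstream, downstream) pairs linking the tasks in a chain."""
    return [(a.task_id, b.task_id) for a, b in zip(tasks, tasks[1:])]


if __name__ == "__main__":
    # Hypothetical job names, for illustration only.
    tasks = build_dump_tasks(["job-a", "job-b", "job-c"])
    for task in tasks:
        print(task.task_id, " ".join(task.command))
    print(chain_edges(tasks))
```

In a real DAG, each spec would presumably map to one task, and an upload step could follow each dump task rather than waiting for the whole run to finish.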

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
test_k8s/dumpsv1: Include the dump job name in the pod name | repos/data-engineering/airflow-dags!1211 | brouberol | T390852 | main
Split dump task into a chain of tasks, one per dump job type | repos/data-engineering/airflow-dags!1210 | brouberol | T390852 | main

Event Timeline

brouberol triaged this task as Medium priority.