Batch dump jobs within a single airflow task to speed up DAG execution
Closed, Resolved (Public)

Description

Each Airflow task relies on the scheduling and execution of 2 pods: the task itself and the pod created by the KubernetesPodOperator. Splitting each dump job into its own dedicated Airflow task therefore incurs a steep cost in time spent waiting for pods to be scheduled.

It might be easier to batch jobs in groups of, say, 5, to both keep the DAG small and speed up overall execution.
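
To illustrate the idea, here is a minimal sketch of the batching arithmetic in plain Python. It is not the code from the linked merge request: the helper names `batched` and `pods_needed` and the `wikis` list are hypothetical, and the only figure taken from the description is the 2 pods needed per Airflow task.

```python
from math import ceil
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive groups of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def pods_needed(job_count: int, batch_size: int = 1) -> int:
    """Pods to schedule, assuming 2 pods per Airflow task as described above."""
    return 2 * ceil(job_count / batch_size)


# Hypothetical list of dump jobs, one per wiki.
wikis = [f"wiki{i}" for i in range(12)]

# Each batch would become a single Airflow task running its jobs in sequence.
batches = list(batched(wikis, 5))

unbatched_pods = pods_needed(len(wikis))    # one task per dump job
batched_pods = pods_needed(len(wikis), 5)   # one task per batch of 5
```

With 12 jobs, batching by 5 yields 3 tasks instead of 12, so far fewer pods have to wait for scheduling. The trade-off is that jobs within a batch run one after another in the same task, so a failure is retried at batch granularity.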

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
test_k8s/dumpsv1: batch multiple dump jobs in the same airflow task | repos/data-engineering/airflow-dags!1273 | brouberol | T392461 | main

Event Timeline

brouberol triaged this task as Medium priority.

Screenshot 2025-04-29 at 14.08.48.png (602×1 px, 95 KB)
We've been able to speed up the bulk of the dump execution by 3x compared to https://phabricator.wikimedia.org/T391669#10764538

brouberol claimed this task.