
Write a DAG checking that the statically defined list of DAGs matches what NOC returns
Closed, Resolved, Public

Description

We have decided to statically define the dumps v1 DAG: the scheduler iterates over a hardcoded list of wikis and creates the whole DAG at processing time, instead of relying on dynamically mapped tasks at runtime. We do this because the resulting DAG consumes about half as many resources to schedule.

However, this risks drift between the hardcoded list of wikis and what NOC returns. We should run a weekly monitoring DAG that simply fetches the regular and large wiki lists from noc.wikimedia.org, compares them with what we have hardcoded, and fails if the lists differ.

This DAG should alert the DPE SRE team when it fails.
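
The comparison described above could look roughly like the following. This is only a sketch, not the code from the linked merge request: the URL pattern, the dblist text format (one wiki name per line, `#` for comments), and the names `parse_dblist`, `fetch_dblist`, `check_drift` and `WikiListDrift` are all assumptions made for illustration.

```python
from __future__ import annotations

from urllib.request import urlopen

# Assumption: NOC serves each wiki list as a plain-text file at a URL of this
# shape. The exact list names to use for "regular" and "large" are not given here.
NOC_DBLIST_URL = "https://noc.wikimedia.org/conf/dblists/{name}.dblist"


class WikiListDrift(Exception):
    """Raised when the hardcoded wiki list and the upstream list differ."""


def parse_dblist(text: str) -> set[str]:
    """Return the wiki names in a list body, skipping blank lines and comments."""
    wikis = set()
    for line in text.splitlines():
        # Assumption: anything after '#' on a line is a comment.
        name = line.split("#", 1)[0].strip()
        if name:
            wikis.add(name)
    return wikis


def fetch_dblist(name: str, timeout: float = 30.0) -> set[str]:
    """Fetch one wiki list from NOC and parse it (hypothetical helper)."""
    with urlopen(NOC_DBLIST_URL.format(name=name), timeout=timeout) as resp:
        return parse_dblist(resp.read().decode("utf-8"))


def check_drift(label: str, hardcoded, upstream) -> None:
    """Raise WikiListDrift if the two collections of wiki names differ."""
    hardcoded, upstream = set(hardcoded), set(upstream)
    missing = sorted(upstream - hardcoded)  # upstream, but not hardcoded
    stale = sorted(hardcoded - upstream)    # hardcoded, but gone upstream
    if missing or stale:
        raise WikiListDrift(
            f"{label}: missing from hardcoded list: {missing}; "
            f"no longer returned by NOC: {stale}"
        )
```

Calling `check_drift("large", HARDCODED_LARGE_WIKIS, fetch_dblist("large"))` inside a task (with `HARDCODED_LARGE_WIKIS` standing in for the real hardcoded list) would make the task fail on any difference, and the failure can then drive whatever alerting is configured for the DAG.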

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
test_k8s: new DAG comparing noc vs hardcoded list of wikis | repos/data-engineering/airflow-dags!1251 | brouberol | T391745 | main