
[Airflow] Migrate unique devices Druid loading jobs
Closed, Resolved · Public · 9 Estimated Story Points

Description

We have 6 Oozie jobs that load unique devices data to Druid.

  1. unique devices per domain daily
  2. unique devices per domain daily aggregated monthly
  3. unique devices per domain monthly
  4. unique devices per project family daily
  5. unique devices per project family daily aggregated monthly
  6. unique devices per project family monthly

We can group them into 2 DAG files:

  1. unique devices per domain DAG file, with:
    • daily DAG
    • daily aggregated monthly DAG
    • monthly DAG
  2. unique devices per project family DAG file, with:
    • daily DAG
    • daily aggregated monthly DAG
    • monthly DAG
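The grouping above can be sketched as one Python module per dataset that declares all three cadences from shared parameters. The sketch below is illustrative only: it uses plain dictionaries instead of the real Airflow API, and the helper name `make_druid_load_dag` and its fields are hypothetical, not taken from the airflow-dags repository.

```python
# Illustrative sketch (not the actual airflow-dags code): one "DAG file"
# per dataset defines all three cadences by looping over a config list.
# Helper name and descriptor fields are hypothetical.

def make_druid_load_dag(dataset: str, cadence: str, schedule: str) -> dict:
    """Build a minimal DAG descriptor for loading `dataset` into Druid."""
    return {
        "dag_id": f"{dataset}_{cadence}_druid_load",
        "schedule": schedule,
        "tasks": ["run_sparksql_query", "load_to_druid"],
    }

# Daily, daily-aggregated-monthly, and monthly cadences for one dataset.
CADENCES = [
    ("daily", "@daily"),
    ("daily_agg_monthly", "@monthly"),
    ("monthly", "@monthly"),
]

unique_devices_per_domain_dags = [
    make_druid_load_dag("unique_devices_per_domain", cadence, schedule)
    for cadence, schedule in CADENCES
]
```

The second DAG file (per project family) would follow the same pattern with a different dataset name, so each file yields exactly three DAGs.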

Event Timeline

DAG file 1 is currently in code review, and DAG file 2 is in progress:
https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/350

mforns set the point value for this task to 9. — Apr 5 2023, 3:08 PM

Change 910092 had a related patch set uploaded (by Mforns; author: Mforns):

[analytics/refinery@master] Migrate unique devices druid loading queries to Airflow/SparkSQL

https://gerrit.wikimedia.org/r/910092

Change 910094 had a related patch set uploaded (by Mforns; author: Mforns):

[analytics/refinery/source@master] Fix HiveToDruid to allow for non-partitioned source tables.

https://gerrit.wikimedia.org/r/910094
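The commit subject suggests HiveToDruid previously assumed a partitioned source table. A minimal sketch of that idea, in Python rather than the actual Scala refinery/source code, with all names hypothetical: only add a partition predicate when the source table actually has partition columns.

```python
# Hypothetical sketch of the idea behind the HiveToDruid fix, not the
# real refinery/source implementation: build the source query with a
# partition predicate only when partition columns exist.

def build_source_query(table: str, partitions: dict) -> str:
    """Return a SELECT over `table`, filtered by `partitions` if any."""
    base = f"SELECT * FROM {table}"
    if not partitions:
        # Non-partitioned source table: read it whole.
        return base
    predicate = " AND ".join(f"{col}='{val}'" for col, val in partitions.items())
    return f"{base} WHERE {predicate}"
```

Under this assumption, a non-partitioned table simply skips the WHERE clause instead of failing when no partition spec can be derived.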

I've tested the DAGs in the dev instance and all seems to work! This is in code review :-)

Change 910094 merged by jenkins-bot:

[analytics/refinery/source@master] Fix HiveToDruid to allow for non-partitioned source tables.

https://gerrit.wikimedia.org/r/910094

Change 910092 merged by Mforns:

[analytics/refinery@master] Migrate unique devices druid loading queries to Airflow/SparkSQL

https://gerrit.wikimedia.org/r/910092