Migrate airflow dags from the Search Platform instance to Wikidata Platform
Open, Needs Triage, Public

Description

As part of WDQS onboarding, we need to migrate Airflow DAGs off the Search Platform instance.

The following DAGs need to be refactored, moved under the wikidata airflow-dags path, and scheduled on the Wikidata Platform instance:

As part of this migration we'll need to:

Needs decision:

  • Whether we will migrate WCQS-related analytics to the Wikidata Platform instance.

AC

  • DAGs have been relocated under wikidata/dags in the monorepo.
  • DAGs are deployed and scheduled in the wikidata instance.
  • Alerts are routed to wikidata-platform channels.
  • A new wikidata (name TBC) database has been created in metastore/iceberg to host WDQS-related datasets. Discovery users have been notified of the deprecation and given a migration path. This is expected to be transparent to Airflow consumers.
  • Items that need a decision have been logged in Wikidata Platform's decision log.

References:

Event Timeline

@gmodena the list of DAGs seems correct to me. Some parts of drop_old_data_daily.py might be moved over as well (cleanups of RDF data from import_ttl and query analytics).

Good catch! IIRC the bulk of data rotation logic was upstreamed to airflow common libs, so hopefully it won't lead to excessive duplication. I'll update the task description to reflect this.