As part of WDQS onboarding, we need to migrate Airflow DAGs off the Search Platform instance.
The following DAGs need to be refactored, moved under the wikidata airflow-dags path, and scheduled on the Wikidata Platform instance:
- subgraph_and_query_mapping
- subgraph_and_query_metrics
- process_sparql_query
- rdf_streaming_updater_reconcile
- import_ttl
As part of this migration we'll need to:
- Route alerts to the wikidata-platform team's Slack channel and mailing list.
- Implement a data retention/pruning DAG (see https://phabricator.wikimedia.org/T414426#11516152).
Needs decision:
- Whether we will migrate WCQS-related analytics to the Wikidata Platform instance.
Acceptance criteria:
- DAGs have been relocated under wikidata/dags in the monorepo.
- DAGs are deployed and scheduled on the Wikidata Platform instance.
- Alerts are routed to wikidata-platform channels.
- A new wikidata (name TBC) database has been created in metastore/Iceberg to host WDQS-related datasets. Discovery users have been notified of the deprecation and given a migration path. This change is expected to be transparent to Airflow consumers.
- Items that need a decision have been logged in Wikidata Platform's decision log.
References: