[Analytics] Switch naming conventions for currently running Airflow processes
Closed, Resolved (Public)

Description

Wikidata Analytics Request

This task was generated using the Wikidata Analytics request form. Please use the task template linked on our project page to create issues for the team. Thank you!

Purpose

Please provide as much context as possible as well as what the produced insights or services will be used for.

The naming convention used for the first deployed Airflow DAG jobs wasn't the best, as it sometimes splits the name of the DAG. Specifically, the DAG wd_rest_api_metrics_monthly has wd_rest_api_metrics_gen_csv_monthly as a job, where gen_csv sits in the middle of the DAG name. This makes creating new DAGs more tedious, as find and replace on the DAG name doesn't work.

Specific Results

Please detail the specific results that the task should deliver.

The currently running DAGs should have all job identifiers (e.g. gen_csv) moved to the start of the job name:

  • wd_rest_api_metrics
  • wd_item_sitelink_segments

Each of them has TASK_ID_gen_csv_DAG_INTERVAL and create_DAG_ID_table jobs, which should be renamed to gen_csv_DAG_ID and create_table_DAG_ID, respectively.
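
The rename described above can be sketched as a small mapping from old to new job names. This is only an illustration: the helper names (`old_job_names`, `new_job_names`, `rename_map`) are hypothetical, and it assumes the DAG ID is the base name followed by the interval, as in wd_rest_api_metrics_monthly. The interval of wd_item_sitelink_segments is not stated in this task, so it is left as a parameter.

```python
def old_job_names(base: str, interval: str) -> dict[str, str]:
    """Old convention: the job identifier sits inside or around the DAG ID."""
    dag_id = f"{base}_{interval}"  # assumption: DAG ID = base name + interval
    return {
        "gen_csv": f"{base}_gen_csv_{interval}",
        "create_table": f"create_{dag_id}_table",
    }


def new_job_names(base: str, interval: str) -> dict[str, str]:
    """New convention: job identifier first, then the unbroken DAG ID."""
    dag_id = f"{base}_{interval}"
    return {
        "gen_csv": f"gen_csv_{dag_id}",
        "create_table": f"create_table_{dag_id}",
    }


def rename_map(base: str, interval: str) -> dict[str, str]:
    """Map each old job name to its new name."""
    old = old_job_names(base, interval)
    new = new_job_names(base, interval)
    return {old[key]: new[key] for key in old}


if __name__ == "__main__":
    for old_name, new_name in rename_map("wd_rest_api_metrics", "monthly").items():
        print(f"{old_name} -> {new_name}")
```

With the new names, the DAG ID appears as one contiguous string in every job name, so a find and replace on the DAG ID is enough when copying a DAG.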

Desired Outputs

Please list the desired outputs of this task.

Deadline

Please make the time sensitivity of this task clear with a date that it should be completed by. If there is no specific date, then the task will be triaged based on its priority.

DD.MM.YYYY


Information below this point is filled out by the task assignee.

Assignee Planning

Sub Tasks

A full breakdown of the steps to complete this task.

  • Find an appropriate time to make the change so that a DAG run isn't interrupted
    • After the run for the 7th of October has finished
  • Change HQL file names and references in the DAGs
  • Redeploy DAGs
  • Go into the Airflow deployment and delete the variables/properties in question so that the new file names are set
  • Confirm that the export step finishes appropriately
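
The "Change HQL file names and references in the DAGs" step could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used for this task: the function names are hypothetical, and it assumes that the HQL files are named after the old job names and that the DAG definitions are .py files under the same directory.

```python
from pathlib import Path


def replace_identifiers(text: str, renames: dict[str, str]) -> str:
    """Replace every old job name in text with its new name.

    Longer names are replaced first so that a name that is a prefix of
    another is not rewritten prematurely.
    """
    for old in sorted(renames, key=len, reverse=True):
        text = text.replace(old, renames[old])
    return text


def rename_hql_and_update_dags(repo_root: Path, renames: dict[str, str]) -> None:
    """Rename matching HQL files, then update references in DAG files."""
    # Assumption: an HQL file's stem equals the old job name.
    for hql_file in list(repo_root.rglob("*.hql")):
        if hql_file.stem in renames:
            hql_file.rename(hql_file.with_name(renames[hql_file.stem] + ".hql"))

    # Assumption: DAG definitions are Python files that mention the old names.
    for dag_file in repo_root.rglob("*.py"):
        original = dag_file.read_text()
        updated = replace_identifiers(original, renames)
        if updated != original:
            dag_file.write_text(updated)
```

The deletion of Airflow variables and the redeploy are left as manual steps here, since they depend on the deployment.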

Estimation

Estimate: 1 day
Actual: 1.5 days, given that we needed to shift to a new, yet-to-be-deployed sensor for DAGs

Data

The tables that will be referenced in this task.

  • link_to_table

Notes

Things that came up during the completion of this task, questions to be answered and follow up tasks.

  • Note

Event Timeline

task is complete! closing