
[Analytics] [Request] Migrate the Wikidata Reliability metrics to Airflow
Closed, Resolved (Public)

Description

Wikidata Analytics Request

This task was generated using the Wikidata Analytics request form. Please use the task template linked on our project page to create tasks for the team. Thank you!

Purpose

Please provide as much context as possible as well as what the produced insights or services will be used for.

Various Wikidata metrics need to be migrated away from a Graphite-based process as part of the ongoing migration away from that service. For WMDE's part of this work, see T371616: [EPIC][GRAFMIGR] Spruce up Wikidata Grafana Metrics; this task is also related to T377352 and T372855. This task covers one such Graphite-based process that needs to be migrated to Airflow.

Specific Results

Please detail the specific results that the task should deliver.

Within this task we will specifically be migrating the wikidata_reliability_metrics.hql metrics to Airflow. We'll need a new Iceberg table for these metrics, and the query job that creates them will need to be run via an Airflow DAG on the wmde Airflow instance. Note that further work will follow to backfill the historical data from Graphite.

Desired Outputs

Please list the desired outputs of this task.

  • The Wikidata Reliability metrics converted over to an Airflow DAG
  • The Grafana board for this process set to archived

Deadline

Please make the time sensitivity of this request clear with a date that it should be completed by. If there is no specific date, then the task will be triaged based on its priority.

28.3.2025


Information below this point is filled out by the task assignee.

Assignee Planning

Sub Tasks

A full breakdown of the steps to complete this task.

  • Write Iceberg table create table script
    • wmde.wd_reliability_metrics_daily
  • Create an Iceberg table in Hive/HDFS within the wmde namespace
  • Test current Hive job query to view results
    • Old query results for 2025.03.01
  • Convert the current Hive job query to WMDE SWE Analytics standards
  • Generate testing table generation and query scripts
  • Test results to make sure that metrics will be consistent
    • New query results for 2025.03.01 should be the same
  • Write DAG to run job query
  • Write DAG tests
  • Run tests on the process as far as possible within time limitations
  • Deploy DAG - #1185
  • Mark Grafana board as deprecated
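
The consistency check in the sub tasks above (new query results for 2025.03.01 matching the old ones) could be sketched roughly as follows. This is only an illustration: the helper name diff_metrics, the metric names, and the values are hypothetical, and it assumes each query result has already been collected into a plain mapping of metric name to numeric value.

```python
from math import isclose


def diff_metrics(old, new, rel_tol=1e-9):
    """Return {metric: (old_value, new_value)} for every metric that differs.

    A metric present in only one of the two result sets is reported with
    None on the side where it is missing.
    """
    mismatches = {}
    for name in sorted(set(old) | set(new)):
        old_value = old.get(name)
        new_value = new.get(name)
        if old_value is None or new_value is None:
            mismatches[name] = (old_value, new_value)
        elif not isclose(old_value, new_value, rel_tol=rel_tol):
            mismatches[name] = (old_value, new_value)
    return mismatches


# Hypothetical metric names and values, for illustration only.
old_results = {"metric_a": 120.0, "metric_b": 3.0}
new_results = {"metric_a": 120.0, "metric_b": 3.0}

# An empty dict means the migrated query reproduces the old results.
assert diff_metrics(old_results, new_results) == {}
```

Reporting the differing metrics rather than a single pass/fail flag makes it easier to see which part of the converted query diverges from the original.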

Estimation

Estimate: 1 day
Actual:

Data

The tables that will be referenced in this task.

  • wmf.webrequest
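
Because the job is meant to produce daily metrics, each run would only need to read one day of the source table. The sketch below builds such a filter for a given date. It assumes that wmf.webrequest exposes integer year, month, and day partition columns; this task does not state the partition layout, so treat that as an assumption, and note that the helper name daily_partition_filter is hypothetical.

```python
from datetime import date


def daily_partition_filter(day: date) -> str:
    """Build a WHERE-clause fragment selecting a single day.

    Assumes integer partition columns named year, month, and day
    (an assumption, not confirmed by this task).
    """
    return f"year = {day.year} AND month = {day.month} AND day = {day.day}"


# The test date used in the sub tasks above.
print(daily_partition_filter(date(2025, 3, 1)))
```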

Notes

Things that came up during the completion of this task, questions to be answered and follow up tasks.

  • Note

Details

Related Changes in GitLab:
Title: T389209 T389208 T389207 T389206 T389205 Graphite process DAGs
Reference: repos/data-engineering/airflow-dags!1185
Author: andrewtavis-wmde
Source Branch: T389209-T389208-T389207-T389206-T389205-graphite-dags
Dest Branch: main

Event Timeline

AndrewTavis_WMDE changed the task status from Open to In Progress. Mar 24 2025, 3:20 PM
AndrewTavis_WMDE triaged this task as High priority.
AndrewTavis_WMDE updated the task description.