Wikidata Analytics Request
This task was generated using the Wikidata Analytics request form. Please use the task template linked on our project page to create tasks for the team. Thank you!
Purpose
Please provide as much context as possible as well as what the produced insights or services will be used for.
Several Wikidata metrics need to be migrated away from a Graphite-based process as part of the ongoing migration away from that service. For WMDE's part of this, see T371616: [EPIC][GRAFMIGR] Spruce up Wikidata Grafana Metrics. This task is also related to T377352 and T372855. It covers one such Graphite-based process that needs to be migrated to Airflow.
Specific Results
Please detail the specific results that the task should deliver.
Within this task we will specifically be migrating the wikidata_reliability_metrics.hql metrics to Airflow. We'll need a new Iceberg table for these metrics, and the query job that creates them will need to be run via an Airflow DAG on the wmde Airflow instance. Note that further work will follow to backfill the historical data from Graphite.
Desired Outputs
Please list the desired outputs of this task.
- The Wikidata Reliability metrics converted over to an Airflow DAG
- The Grafana board for this process marked as archived
Deadline
Please make the time sensitivity of this request clear with a date that it should be completed by. If there is no specific date, then the task will be triaged based on its priority.
28.3.2025
Information below this point is filled out by the task assignee.
Assignee Planning
Sub Tasks
A full breakdown of the steps to complete this task.
- Write the Iceberg CREATE TABLE script
- wmde.wd_reliability_metrics_daily
- Create an Iceberg table in Hive/HDFS within the wmde namespace
- Test current Hive job query to view results
- Old query results for 2025.03.01
- Convert the current Hive job query to WMDE SWE Analytics standards
- Generate test table creation and query scripts
- Test results to make sure that metrics will be consistent
- New query results for 2025.03.01 should be the same
- Write DAG to run job query
- Write DAG tests
- Run tests on the process as far as possible within time limitations
- Deploy DAG - #1185
- Mark Grafana board as deprecated
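A minimal sketch of the consistency check between the old and new query results for 2025.03.01, assuming both result sets have been exported as mappings from metric name to value. The function name `compare_metrics`, the metric names, and the values are hypothetical placeholders and are not taken from wikidata_reliability_metrics.hql.

```python
from math import isclose


def compare_metrics(old_rows, new_rows, rel_tol=1e-9):
    """Return (metric, old_value, new_value) for every metric that differs.

    A metric that is missing from one side is reported with None on that side.
    """
    mismatches = []
    for metric in sorted(set(old_rows) | set(new_rows)):
        old_value = old_rows.get(metric)
        new_value = new_rows.get(metric)
        if old_value is None or new_value is None:
            # Present in only one of the two result sets.
            mismatches.append((metric, old_value, new_value))
        elif not isclose(old_value, new_value, rel_tol=rel_tol):
            # Present in both, but the values disagree.
            mismatches.append((metric, old_value, new_value))
    return mismatches


# Placeholder data standing in for the old (Hive) and new (Airflow) results.
old_results = {"metric_a": 120.0, "metric_b": 3.0}
new_results = {"metric_a": 120.0, "metric_b": 3.0}

# An empty list means the new query reproduces the old metrics for that day.
assert compare_metrics(old_results, new_results) == []
```

Reporting every mismatch, rather than stopping at the first one, makes it easier to see whether a difference affects a single metric or the whole result set.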
Estimation
Estimate: 1 day
Actual:
Data
The tables that will be referenced in this task.
- wmf.webrequest
Notes
Things that came up during the completion of this task, questions to be answered and follow up tasks.
- Note