Wikidata Analytics Request
This task was generated using the Wikidata Analytics request form. Please use the task template linked on our project page to create tasks for the team. Thank you!
Purpose
Please provide as much context as possible as well as what the produced insights or services will be used for.
In T390888 we derived an initial baseline for how many users turn on Wikidata changes in their Wikipedia preferences. We would like to collect these metrics on a monthly basis.
Specific Results
Please detail the specific results that the task should deliver.
It would be good to track these metrics to better understand how changes made by the WMDE WIT team are affecting Wikidata change monitoring.
Desired Outputs
Please list the desired outputs of this task.
- A monthly DAG that generates the metrics in a table in the data lake
Deadline
Please make the time sensitivity of this request clear with a date that it should be completed by. If there is no specific date, then the task will be triaged based on its priority.
DD.MM.YYYY
Information below this point is filled out by the task assignee.
Assignee Planning
Sub Tasks
A full breakdown of the steps to complete this task.
- Use MariaDB process developed in T360296
- Write the Iceberg CREATE TABLE script
- Create an Iceberg table in Hive/HDFS within the wmde namespace
- Convert monthly stats query to production Airflow query
- Write table generation and query scripts for testing
- Write DAG to run job query
- Write DAG tests
- Run tests on the process as far as possible within time limitations
- Deploy DAG
Estimation
Estimate:
Actual:
Data
The tables that will be referenced in this task.
- user_properties in MariaDB, with the data ported over to a table in the data lake
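
As a rough illustration of the kind of count the monthly job could derive from user_properties, here is a minimal Python sketch that runs against an in-memory SQLite stand-in rather than MariaDB. The column names (up_user, up_property, up_value) follow the standard MediaWiki schema. The preference names in TRACKED_PROPERTIES, the assumption that an enabled preference is stored as '1', and the helper names are assumptions for illustration only and should be checked against the query from T390888.

```python
import sqlite3

# Assumed preference names, used only for illustration; the actual
# properties to track should come from the T390888 baseline query.
TRACKED_PROPERTIES = ("rcshowwikidata", "wlshowwikibase")


def count_enabled_users(conn, properties=TRACKED_PROPERTIES):
    """Return {property: number of distinct users with the value '1'}."""
    placeholders = ",".join("?" for _ in properties)
    rows = conn.execute(
        f"""
        SELECT up_property, COUNT(DISTINCT up_user)
        FROM user_properties
        WHERE up_property IN ({placeholders})
          AND up_value = '1'  -- assumption: enabled is stored as '1'
        GROUP BY up_property
        """,
        tuple(properties),
    ).fetchall()
    # Start from zero so properties with no enabled users still appear.
    counts = {prop: 0 for prop in properties}
    counts.update(dict(rows))
    return counts


def build_demo_db():
    """Create a small in-memory table with made-up rows for testing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_properties "
        "(up_user INTEGER, up_property TEXT, up_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO user_properties VALUES (?, ?, ?)",
        [
            (1, "rcshowwikidata", "1"),
            (2, "rcshowwikidata", "1"),
            (2, "wlshowwikibase", "1"),
            (3, "rcshowwikidata", "0"),
            (3, "language", "de"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(count_enabled_users(build_demo_db()))
```

In the production DAG, the same aggregation would presumably run per wiki and be written with a snapshot month into the Iceberg table, which this sketch does not cover.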
Notes
Things that came up during the completion of this task, questions to be answered and follow up tasks.
- Note