Wikidata Analytics Request
This task was generated using the Wikidata Analytics request form. Please use the task template linked on our project page to create tasks for the team. Thank you!
Purpose
Please provide as much context as possible as well as what the produced insights or services will be used for.
As part of the Wikidata Integrations Team's annual goals, we would like to track awareness of Wikidata Integration by monitoring the page views of particular help pages hosted on Wikidata. These help pages provide useful information for editors who want to use Wikidata's data in their projects. One example is the recently revamped Help:Sitelinks page. We want to check whether improving the information on these pages attracts more users.
Specific Results
Please detail the specific results that the task should deliver.
A pipeline that saves this data on a regular basis, e.g. monthly or quarterly. The pages we are interested in are:
- Help:Items
- Help:Sitelinks
- Help:Edit Summary
- Wikidata in Wikimedia Projects
- How to use Wikidata in Wikimedia Projects
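To keep the pipeline's page list in one place, the five pages can be held in a single mapping from page title to the metric name used in the reported values further down. This is a minimal Python sketch: pairing each page with a metric by list order, and the exact on-wiki titles (namespace prefixes, capitalisation), are assumptions to verify on Wikidata. The helper `to_db_title` is a hypothetical name; it converts spaces to underscores, the form MediaWiki uses for titles in URLs.

```python
# Page titles as listed in this task, mapped to the metric names used in the
# reported values. Pairing by list order is an assumption; the exact on-wiki
# titles (namespace prefixes, capitalisation) should be checked on Wikidata.
PAGES_TO_METRICS: dict[str, str] = {
    "Help:Items": "total_help_items_views",
    "Help:Sitelinks": "total_help_sitelinks_views",
    "Help:Edit Summary": "total_help_edit_summary_view",
    "Wikidata in Wikimedia Projects": "total_wikidata_wiwp_views",
    "How to use Wikidata in Wikimedia Projects": "total_wikidata_how_to_use_wd_views",
}


def to_db_title(title: str) -> str:
    """Hypothetical helper: convert a display title to underscore (URL) form."""
    return title.strip().replace(" ", "_")


# Underscore form of each title, as typically found in URL-derived title fields.
DB_TITLES_TO_METRICS: dict[str, str] = {
    to_db_title(title): metric for title, metric in PAGES_TO_METRICS.items()
}

if __name__ == "__main__":
    for db_title, metric in DB_TITLES_TO_METRICS.items():
        print(f"{db_title} -> {metric}")
```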
Desired Outputs
Please list the desired outputs of this task.
- A notebook that explores the process of getting these numbers
- A query that can be run to generate the numbers on Superset or any other recommended platform
  - View the query at https://phabricator.wikimedia.org/T390891#10726926
- The reported values for February and March:
  - February 2025
    - total_help_items_views: 47,017
    - total_help_sitelinks_views: 6,255
    - total_help_edit_summary_view: 76
    - total_wikidata_wiwp_views: 469
    - total_wikidata_how_to_use_wd_views: 3,679
    - Total (sum of the above): 57,496
  - March 2025
    - total_help_items_views: 12,557
    - total_help_sitelinks_views: 4,076
    - total_help_edit_summary_view: 70
    - total_wikidata_wiwp_views: 530
    - total_wikidata_how_to_use_wd_views: 1,974
    - Total (sum of the above): 19,207
- A DAG to collect these metrics on a monthly basis
  - Populates the table wmde.wit_docs_pageview_metrics_monthly
  - The data from the original task is added to the new table
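The reported monthly totals can be checked by re-summing the per-page values. A minimal Python sketch that uses only the numbers listed above:

```python
# Per-page monthly views as reported in this task.
REPORTED: dict[str, dict[str, int]] = {
    "2025-02": {
        "total_help_items_views": 47_017,
        "total_help_sitelinks_views": 6_255,
        "total_help_edit_summary_view": 76,
        "total_wikidata_wiwp_views": 469,
        "total_wikidata_how_to_use_wd_views": 3_679,
    },
    "2025-03": {
        "total_help_items_views": 12_557,
        "total_help_sitelinks_views": 4_076,
        "total_help_edit_summary_view": 70,
        "total_wikidata_wiwp_views": 530,
        "total_wikidata_how_to_use_wd_views": 1_974,
    },
}

# "Total from summing the above" values as reported.
REPORTED_TOTALS: dict[str, int] = {"2025-02": 57_496, "2025-03": 19_207}


def monthly_total(metrics: dict[str, int]) -> int:
    """Sum the per-page view counts for one month."""
    return sum(metrics.values())


for month, metrics in REPORTED.items():
    total = monthly_total(metrics)
    # Fail loudly if a reported total does not match the per-page values.
    assert total == REPORTED_TOTALS[month], (month, total)
    print(f"{month}: {total:,}")  # 2025-02: 57,496 then 2025-03: 19,207
```

Both reported totals match the sums of their per-page values.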
Deadline
Please make the time sensitivity of this request clear with a date that it should be completed by. If there is no specific date, then the task will be triaged based on its priority.
09.04.2025
Information below this point is filled out by the task assignee.
Assignee Planning
Sub Tasks
A full breakdown of the steps to complete this task.
February and March Numbers
- Explore potentially useful tables (DataHub)
  - wmf.webrequest
  - wmf_deprecated.webrequest
- Derive a method for getting views of specific pages
- Get totals per page and aggregate them for February and March
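For deriving views of specific pages, one option is to filter `wmf.webrequest` to pageviews of the listed titles and count rows per title for one month. The Python sketch below only builds the query string. The column and partition names (`webrequest_source`, `year`, `month`, `uri_host`, `is_pageview`, `agent_type`, `pageview_info['page_title']`), the host list, and the choice to count only `agent_type = 'user'` are assumptions to verify in DataHub; this is not the query linked above.

```python
def build_monthly_views_query(year: int, month: int, db_titles: list[str]) -> str:
    """Build a Spark SQL query counting pageviews per title for one month.

    All column names, partition names and filter values below are assumptions
    about wmf.webrequest and should be checked in DataHub before use.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    if not db_titles:
        raise ValueError("at least one title is required")
    for title in db_titles:
        # Refuse quotes rather than escaping them, to keep the literal simple.
        if "'" in title:
            raise ValueError(f"unsupported quote in title: {title}")
    title_list = ", ".join(f"'{title}'" for title in db_titles)
    return f"""
SELECT
    pageview_info['page_title'] AS page_title,
    COUNT(1) AS views
FROM wmf.webrequest
WHERE webrequest_source = 'text'
    AND year = {year}
    AND month = {month}
    AND uri_host IN ('www.wikidata.org', 'm.wikidata.org')
    AND is_pageview
    AND agent_type = 'user'
    AND pageview_info['page_title'] IN ({title_list})
GROUP BY pageview_info['page_title']
""".strip()


if __name__ == "__main__":
    print(build_monthly_views_query(2025, 2, ["Help:Items", "Help:Sitelinks"]))
```

The per-title counts from such a query could then be compared with the February and March values before the method is adopted.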
Monthly DAG
- Write the Iceberg CREATE TABLE script
- Create an Iceberg table in Hive/HDFS within the wmde namespace
- Convert the monthly stats query into a production Airflow query
- Write scripts to generate test tables and run test queries
- Write a DAG to run the job query
- Write DAG tests
- Test the process as far as possible
- Deploy DAG
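A monthly job typically runs shortly after a month ends and aggregates the month that just finished. Assuming the DAG works that way, a hypothetical helper `previous_month` can derive the year and month partitions to query from the run date. If the DAG instead reads the month from Airflow's data interval, this helper may be unnecessary.

```python
from datetime import date, timedelta


def previous_month(run_date: date) -> tuple[int, int]:
    """Return (year, month) of the calendar month before run_date."""
    # Step back one day from the first of the current month to land in the
    # previous month, which also handles the December/January year change.
    last_day_of_previous = run_date.replace(day=1) - timedelta(days=1)
    return last_day_of_previous.year, last_day_of_previous.month


if __name__ == "__main__":
    print(previous_month(date(2025, 4, 1)))  # (2025, 3)
    print(previous_month(date(2025, 1, 15)))  # (2024, 12)
```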
Adding Old Data
- Add the data from the notebook and from wmf_deprecated.webrequest to the new Iceberg table
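The February and March 2025 values reported in this task could be backfilled with a single INSERT statement. This Python sketch assumes a wide layout with `year`, `month` and one column per metric name; the actual schema of `wmde.wit_docs_pageview_metrics_monthly` is not given here, so the column list is hypothetical and the generated statement should be checked against the real table before running.

```python
# Hypothetical wide schema: one column per metric name used in this task.
METRIC_COLUMNS = [
    "total_help_items_views",
    "total_help_sitelinks_views",
    "total_help_edit_summary_view",
    "total_wikidata_wiwp_views",
    "total_wikidata_how_to_use_wd_views",
]

# February and March 2025 values as reported in this task, in METRIC_COLUMNS order.
BACKFILL: dict[tuple[int, int], list[int]] = {
    (2025, 2): [47_017, 6_255, 76, 469, 3_679],
    (2025, 3): [12_557, 4_076, 70, 530, 1_974],
}


def build_backfill_insert(table: str) -> str:
    """Render BACKFILL as one INSERT ... VALUES statement for the given table."""
    columns = ["year", "month", *METRIC_COLUMNS]
    value_rows = []
    for (year, month), values in sorted(BACKFILL.items()):
        # Guard against a month with a missing or extra metric value.
        if len(values) != len(METRIC_COLUMNS):
            raise ValueError(f"expected {len(METRIC_COLUMNS)} values for {year}-{month:02d}")
        value_rows.append("(" + ", ".join(str(v) for v in (year, month, *values)) + ")")
    return (
        f"INSERT INTO {table} ({', '.join(columns)})\n"
        "VALUES\n    " + ",\n    ".join(value_rows)
    )


if __name__ == "__main__":
    print(build_backfill_insert("wmde.wit_docs_pageview_metrics_monthly"))
```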
Estimation
Estimate - Feb/Mar numbers: 1/2 a day
Actual - Feb/Mar numbers: 1/2 a day
Estimate - DAG: 1/2 a day
Actual - DAG: 1/2 a day
Data
The tables that will be referenced in this task.
- wmf.webrequest
- wmf_deprecated.webrequest
Notes
Things that came up during the completion of this task, questions to be answered and follow-up tasks.