This task builds on the results of T334558: [Analytics] Unique user-agents accessing Wikidata's REST API for Q2/2023, but implements the metric as a running 30-day average.
Problem:
The Wikidata/Wikidata Analytics team would like to know how many unique users are accessing the new REST API. One issue is that the information in the webrequest table is only retained for 90 days. Setting up a process within the WMF Airflow DAGs would allow us to save aggregated copies of the data for future reporting. T334558 focused on user_agent metadata associated with API requests. Based on the results of that task, Wikidata Analytics decided to switch to IP tracking for our internal metrics, as there were many cases where user_agent data was being manipulated to allow for easier access (see T329044: Require clients to follow our User-Agent policy).
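To make the "running 30 day average" concrete, here is a minimal Python sketch. It assumes the metric is the mean of daily unique-client counts over a trailing window; the task does not spell this out, so a distinct count over the whole window would be an equally plausible reading. The function name `running_average` and the treatment of missing days as zero are illustrative choices, not part of the planned pipeline.

```python
from collections import deque
from datetime import date, timedelta


def running_average(daily_counts, window_days=30):
    """Return {day: mean of the daily counts over the trailing window}.

    daily_counts maps a date to the number of unique clients seen that day.
    Days missing from the mapping are counted as zero (an assumption).
    Early windows are averaged over the days available so far.
    """
    if not daily_counts:
        return {}
    first, last = min(daily_counts), max(daily_counts)
    window = deque(maxlen=window_days)  # drops the oldest day automatically
    averages = {}
    day = first
    while day <= last:
        window.append(daily_counts.get(day, 0))
        averages[day] = sum(window) / len(window)
        day += timedelta(days=1)
    return averages
```

Storing only the daily counts and the derived averages is what would let reporting continue after the raw requests pass the 90-day retention limit.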
How the data will be used:
- This data will be used for WMDE quarterly reporting related to the REST API.
- It will help identify more refined and meaningful metrics that PMs can continuously monitor in the future to understand Wikidata.
Assignee Planning
Information below this point is filled out by WMDE Analytics and specifically the assignee of this task.
Sub Tasks
Full breakdown of the steps to complete this task:
- Getting the general structure of the wmde directory set up on GitLab
- Testing the aggregation queries within the analytics flow
- Deployment of REST API request aggregation queries
Data to be used
See Analytics/Data_Lake for the breakdown of the data lake databases and tables.
The following tables will be referenced in this task:
- wmf.webrequest as a basis for an eventual aggregation table
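As a rough illustration of the aggregation step, the following Python sketch counts distinct IPs per day over in-memory rows shaped like webrequest records. The row keys (`day`, `ip`, `uri_path`), the function name `daily_unique_ips`, and the path prefix `/w/rest.php/wikibase/` are assumptions made for this example and should be checked against the actual table schema and API routes. The real job would run as a query inside an Airflow DAG rather than in plain Python.

```python
from collections import defaultdict

# Assumed path prefix for Wikidata's REST API; verify before relying on it.
REST_API_PREFIX = "/w/rest.php/wikibase/"


def daily_unique_ips(rows, prefix=REST_API_PREFIX):
    """Count distinct IPs per day among requests whose path starts with prefix.

    rows is an iterable of dicts with the hypothetical keys
    "day" (ISO date string), "ip" and "uri_path".
    """
    ips_by_day = defaultdict(set)
    for row in rows:
        if row["uri_path"].startswith(prefix):
            ips_by_day[row["day"]].add(row["ip"])
    # Keep only the counts; raw IPs are not carried into the aggregate.
    return {day: len(ips) for day, ips in sorted(ips_by_day.items())}
```

Only the per-day counts leave the function, which matches the goal of keeping aggregated copies rather than the raw request data.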
Notes and Questions
Things that came up during the completion of this task, questions to be answered and follow up tasks:
- Note