== Wikidata Analytics Request ==
> This task was generated using the [Wikidata Analytics](https://phabricator.wikimedia.org/project/profile/5408) request form. Please use the task template linked on our project page to create issues for the team. Thank you!
=== Purpose ===
> Please provide as much context as possible as well as what the produced insights or services will be used for.
{T370416}
=== Specific Results ===
> Please detail the specific results that the task should deliver.
We would like to continuously monitor (e.g. daily or weekly) the following metrics for WDQS:
- Number of SPARQL queries that only retrieve data about a single known entity (based on T370848)
- Number of all other SPARQL queries
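As a rough illustration of the split above, a classifier might count the distinct Wikidata item IDs a query references. This is only a placeholder heuristic; the actual classification rules come out of T370848 and may differ substantially:

```python
import re

# Matches Wikidata item IDs such as Q42 (heuristic, not the T370848 rules).
ENTITY_ID = re.compile(r"\bQ\d+\b")

def is_single_entity_query(sparql: str) -> bool:
    """Return True if the query references exactly one distinct item ID."""
    return len(set(ENTITY_ID.findall(sparql))) == 1
```

Every query would then fall into exactly one of the two buckets, so the two counts sum to the (sampled) total.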
=== Desired Outputs ===
> Please list the desired outputs of this task.
[] Airflow pipeline to monitor the above metric
[] Output as CSV to https://analytics.wikimedia.org/published/datasets/wmde/analytics/
=== Notes ===
- Exact counts are not required, so it is okay to work from a random sample (e.g. roughly 1 in 100 queries).
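One way to make the 1-in-100 sample reproducible across reruns is to hash a stable query identifier rather than draw random numbers. A minimal sketch, assuming such an identifier exists in the source table:

```python
import hashlib

def in_sample(query_id: str, rate: int = 100) -> bool:
    """Deterministically keep roughly 1 in `rate` queries.

    Hashing a stable query ID (an assumed field) instead of using
    rand() means reruns of the job select the same rows.
    """
    digest = hashlib.sha256(query_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % rate == 0
```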
=== Open Questions ===
- What is the frequency we need (e.g. daily, weekly)?
=== Deadline ===
> Please make the time sensitivity of this task clear with a date that it should be completed by. If there is no specific date, then the task will be triaged based on its priority.
DD.MM.YYYY
---
**Information below this point is filled out by the task assignee.**
== Assignee Planning ==
=== Sub Tasks ===
> A full breakdown of the steps to complete this task.
[x] Check what the frequency of the job should be with stakeholders
- Daily or weekly?
- Answer: Weekly is fine
- Note: Given the data volume, it might make sense to run this daily after all, since even a single day of queries is a lot to parse
[] Setup job queries based on results from T370848
[] Test job queries on Pyspark
[] Setup Airflow DAG to run jobs
- Query
- Export CSV
- Move CSV to published data directories
[] Before any further steps: Get approval from WMF for public data export via new Phab task
- Task id:
[] Test Airflow DAG on personal Airflow instance
[] Deploy new Airflow DAG
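The "Export CSV" and "Move CSV to published data directories" steps of the DAG could be sketched as plain functions. Column names and paths here are assumptions for illustration; in the real pipeline these would run as Airflow tasks against the published-datasets directory:

```python
import csv
import shutil
from pathlib import Path

def export_csv(rows, out_path: Path) -> Path:
    """Write the per-period counts to CSV (column names are assumptions)."""
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["week", "single_entity_queries", "other_queries"])
        writer.writerows(rows)
    return out_path

def publish(csv_path: Path, published_dir: Path) -> Path:
    """Move the finished CSV into the published-datasets directory."""
    published_dir.mkdir(parents=True, exist_ok=True)
    dest = published_dir / csv_path.name
    shutil.move(str(csv_path), str(dest))
    return dest
```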
=== Estimation ===
Estimate: 2-3 days
Actual:
=== Data ===
> The tables that will be referenced in this task.
- `discovery.processed_external_sparql_query`, as used in previous tasks
=== Notes ===
> Things that came up during the completion of this task, questions to be answered and follow up tasks.
- Note