
Data Platform - Public dashboard support
Open, Needs Triage, Public

Description

Problem statement

As a Wikimedia Deutschland Data Analytics team member, I would like to be able to leverage a standardized process to conveniently create publicly available dashboards from data that currently resides in HDFS so that insights can be presented to non-WMDE/WMF staff.

Context

WMDE would like to make our Wikidata REST API metrics available for the public, but the process to do this isn't something that has been standardized. These metrics are generated by an Airflow DAG that leverages jobs defined on GitLab.

Ideas brought up in the original Slack discussion were:

  • Leveraging Wikistats (high effort)
    • This would require creating a service and API via AQS 2
  • Pushing data to Prometheus such that it can be used in Grafana (strongly discouraged)
    • Prometheus only supports counters and timings
    • For counters, it assumes additivity – that is, a weekly count is day 1 count + ... + day 7 count
    • It's impossible to control the time of your data point (it's the current time at which you push the metric)
  • Adding the published datasets directories as a target of the DAG jobs where TSVs would be saved and then ingested via an open Turnilo instance (best solution to date)
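
A quick illustration of the Prometheus additivity caveat above: summing daily counters only gives a correct weekly figure for truly additive metrics, and breaks for distinct counts such as unique API clients. A minimal sketch with invented data:

```python
# Sets of client IDs seen on each day by a hypothetical API (invented data).
daily_clients = [
    {"a", "b"},        # day 1
    {"b", "c"},        # day 2
    {"a", "c", "d"},   # day 3
]

# Summing the daily counts double-counts clients active on multiple days...
sum_of_daily_counts = sum(len(day) for day in daily_clients)  # 7

# ...whereas the true weekly distinct count deduplicates across days.
weekly_distinct = len(set().union(*daily_clients))  # 4
```

This is why a counter-based Prometheus pipeline would silently over-report any non-additive weekly metric.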

General ideas

  • It would be great if the public dashboards were an instance of WMF long-term supported data visualization software
  • Ideally the public dashboards could be directly integrated into current data pipeline/Airflow based workflows
  • Data stakeholder/admin oversight of what is added to this system would be ideal, to protect against the inclusion of PII, regions on the Country and Territory Protection List, etc.
    • Maybe a specific admin-only database within HDFS could be the source that the public dashboards have access to?
      • Admins would be the only ones who could create tables within this database
      • This would prevent the public dashboards from presenting information that has not been actively checked for vulnerabilities
    • Oversight of the Protection List and updating the public dashboards would be necessary
      • Maybe jobs that generate the data could be source controlled within a single repo with strict merge rights?
      • This would ensure that rows with is_protected = True in canonical_data.countries are always filtered out
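
The filtering described in the last bullet could be made mechanical with a small guard applied to every published row. A minimal pure-Python sketch (the is_protected column name follows canonical_data.countries; the function name and sample data are hypothetical):

```python
def filter_protected(rows):
    """Drop every row whose country is flagged as protected.

    Mirrors the SQL predicate WHERE NOT is_protected joined against
    canonical_data.countries. Rows missing the flag are dropped too,
    so the filter fails closed rather than leaking unchecked data.
    """
    return [row for row in rows if row.get("is_protected", True) is False]

# Invented sample rows for illustration.
rows = [
    {"country": "AA", "is_protected": False, "edits": 120},  # kept
    {"country": "BB", "is_protected": True, "edits": 45},    # protected: dropped
    {"country": "CC", "edits": 10},                          # no flag: dropped
]
public_rows = filter_protected(rows)
```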

Event Timeline

AndrewTavis_WMDE updated the task description.

Thank you @AndrewTavis_WMDE for submitting this feature request. This will help us think through your use case when we begin strategy work for public data visualization enhancements.

Bringing a point from @Ottomata from the Data Engineering Collaboration Hangtime here:

In response to my question of what the software for this public dashboarding solution could be, his suggestion was perhaps Grafana with a data source other than Prometheus, which is designed for time series and isn't suitable for the aggregation needs here. Superset would be too powerful and not particularly accessible for the public, and Turnilo is similarly time-series focused.

Maybe grafana with cassandra or druid querying the same backends that AQS uses?

BUT! This is just a random idea and it could be a bad one! :)

Note from the Data Usage at WMF Interview that I just participated in: @Milimetric suggested that Dashiki could be a stopgap solution where HTML dashboards for metrics could be generated and stored in the published datasets folders. An example for this is all-sites-by-os. This could be something that we could use in the meantime, with the HTML potentially being created and exported within the Airflow process. We could then also host these HTML files.

Ottomata renamed this task from Public dashboard process to Data Platform - Public dashboard support. Dec 2 2024, 5:56 PM
Ottomata added a subscriber: MusikAnimal.


I have an idea that I would like to put forward regarding this public dashboarding capability.

Outline
  • We could support having a publish task in any DAG. This is where oversight and data governance would be enforced: only data that meets the non-PII and other filtering criteria would be permitted to be published.
  • Each of these tasks would take source data from HDFS (or other sources) and write it, using Spark and Iceberg, to tables that are served by the S3 interface of our Ceph cluster.
  • We would use a new metastore for this service, so it would not be the same as the current Hive metastore service and would not need to be protected by Kerberos.
  • We would also use the Alluxio file system as a memory-first cache for the S3 data files.
  • We would set up a Trino plugin in Grafana, with a catalog that uses this Alluxio-enabled Iceberg data store, backed only by approved public data files.
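
The publish task in the outline could be sketched as a step that emits Spark SQL creating an Iceberg table in the public catalog. To keep the example self-contained (no Spark cluster required), it only builds the statement; the catalog, database, and table names are all hypothetical:

```python
# Admin-controlled allow-list of databases approved for publication (hypothetical).
APPROVED_DATABASES = {"wmde_metrics"}

def build_publish_sql(source_table, target_db, target_table):
    """Return Spark SQL for a publish task, refusing non-approved targets."""
    if target_db not in APPROVED_DATABASES:
        raise ValueError(f"{target_db} is not an approved public database")
    target = f"public_iceberg.{target_db}.{target_table}"
    # CTAS into the Iceberg catalog, filtering protected rows at publish time.
    return (
        f"CREATE TABLE IF NOT EXISTS {target} USING iceberg AS "
        f"SELECT * FROM {source_table} WHERE NOT is_protected"
    )

sql = build_publish_sql("wmde.rest_api_metrics", "wmde_metrics", "rest_api_daily")
```

A real task would hand this statement to spark.sql(...) inside the DAG; the point is that the allow-list check runs before anything is written.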

I believe that this would meet the requirements of enabling a publicly available system for creating and sharing dashboards, backed by approved data sources.

Crucially, this system wouldn't need to use Kerberos authentication, and it wouldn't need to connect to the HDFS file system where we keep our PII data.
It also wouldn't need to depend on POSIX file system permissions for different levels of data access, since all accessible data would have been approved for public access.

Components Required
  1. The Trino Grafana Data Source Plugin to grafana.wikimedia.org to provide the dashboard environment for authors and viewers.
  2. A new Trino deployment to the dse-k8s cluster, as per the docs.
  3. A new metastore service:
  4. A Trino catalog using the Iceberg connector and the Alluxio file system.
  5. An Alluxio deployment on the dse-k8s cluster as per these guidelines.
    • There are plenty of alternative options here, including 1) using the existing Presto nodes and their local storage for the Alluxio file system and 2) migrating fully from Presto to Trino and re-using the Presto workers as dse-k8s-workers.
  6. Some publish tasks in one or more DAGs that use Spark to write to this file system.
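
For component 4, the Trino catalog definition might look roughly like the following properties fragment. This is a sketch only: the file name, hostnames, and the exact S3/Alluxio wiring are assumptions, not a tested deployment.

```properties
# etc/catalog/public_iceberg.properties (hypothetical catalog name)
connector.name=iceberg
iceberg.catalog.type=hive_metastore
# The new, non-Kerberized metastore service from component 3
hive.metastore.uri=thrift://public-metastore.example.svc:9083
# S3-compatible endpoint fronting the approved public data files
hive.s3.endpoint=https://ceph-s3.example.svc
hive.s3.path-style-access=true
```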

We could also investigate Flink with Alluxio, if we would like to deploy streaming jobs to publish data.

Future Options

This approach might also be useful for other datasets and publishing requirements, such as T204950: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users.


I'd be keen to hear any feedback on this idea. Note that we previously investigated the Alluxio file system extensively in T266641: [Data Platform] Test Alluxio as cache layer for Presto.
However, we were unable to proceed because we couldn't connect it to a Kerberos enabled hive metastore, since this is an enterprise feature, not available in the community edition.

The design outlined here bypasses this requirement, so I believe that it would be fully workable with the community edition.

Amazing, Ben! Details to discuss, but one requirement we might need to add is being accessible only to logged-in Cloud Services accounts and/or accessible only from Cloud Services networks.

I'd also love to explore the publishing mechanism. Spark -> Iceberg is probably the way to go, but there are details and also possible advantages we could gain by holistically taking a look at how we transfer data between systems, and seeing if we can unify some things. Perhaps not, but we should explore. :)

Thank you!!!

In some ways, this proposal sounds similar to T377362: EPIC: Trino/MinIO/Hive-Standalone-Metaserver/Dagster/Metabase/Superset Implementation. Too bad we can't all just use the same technologies cough cough @Jgreen ;)

Details to discuss, but one requirement we might need to add is being accessible only to logged-in Cloud Services accounts and/or accessible only from Cloud Services networks.

Right, yes. I've thought about this. I think that what I'm talking about is compatible with this requirement, but we will want to be clear on which bits we want restricted.
The ticket description above says:

...I would like to be able to leverage a standardized process to conveniently create publicly available dashboards from data that currently resides in HDFS so that insights can be presented to non-WMDE/WMF staff.

I have taken that to mean that we would like to have dashboards available without authentication, but that the queries that can be run against the data source are pre-configured by dashboard authors.
This would seem to fit in with Grafana's current model of unauthenticated read-only access, with write access and https://grafana-rw.wikimedia.org/explore limited to members of wmf or nda groups.

The Trino plugin for Grafana would proxy requests to the Trino coordinator(s), which would be in the production realm. The Alluxio file system endpoints and the S3 endpoints would also be in the production realm, so there would be no unauthenticated access to these. Everything would have to go through the Grafana plugin and dashboards, using the pre-configured queries.

If we wanted to make the Trino coordinator service and the other services available to certain projects in Horizon, I think that would be possible, too. That kind of production <-> cloud access to the Trino catalog was what I was implying by linking to T204950: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users.

First, I love that this use case is getting attention, this is great!

I want to caution early on, however, on the idea of using Iceberg tables as the backend here. We have had this discussion elsewhere (tried to find ticket but could not, it was for the experimentation platform and their dashboarding use case...), but to recap: Iceberg is meant for analytics, that is, for queries that would typically take >= 1 min to get results. It is not meant for dashboarding, where we typically want results instantly. I know the proposal here includes accelerating Iceberg with Alluxio caching + an always-on engine like Trino/Presto, but I would argue that we are trying to 'fix' Iceberg to fit the use case.

Druid or Cassandra or MariaDB would be a better fit. I do understand that elsewhere we want to get rid of Druid, and that it is stated on this ticket that Cassandra tables seem high effort. But both Druid and Cassandra force your tables to be designed for performance. That is a good thing. If we were to go with MariaDB, we lose that constraint, but we would be able to leverage existing "User databases" from Toolforge and we get a proper sub-second database with indexes. Using Toolforge will also come with its own challenges though, I must admit.

But what I want to point out is that the use case here is a platform for dashboarding, and hopefully the dashboards are very interactive and fast. It us much easier to shoot yourself in the foot with an Iceberg/Trino backend for such a use case than it would be with Druid/Cassandra/MariaDB, which are tools designed for very low query latency, giving us a better end user experience.