This epic includes all the work needed to develop and productionize the data pipeline for Commons Impact Metrics:
queries and Spark code, Airflow jobs, dumps, the public API, allow-list management, documentation, and applying insights from community feedback to the data model.
Step 0:
T358688: [Commons Impact Metrics] Understand feedback from Community and decide what changes to apply
T358695: [Commons Impact Metrics] Establish how we represent the allow-list
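For T358695, one possible representation (an assumption, not the decided format) is a plain text file of allowed Commons category names plus a small loader used by the pipeline jobs. The file name and helper below are hypothetical.

```python
# Hypothetical allow-list loader. Assumes the allow-list is kept as a plain
# text file with one Commons category name per line; this is only one of the
# representations T358695 might settle on.
from pathlib import Path


def load_allow_list(path: str) -> set[str]:
    """Return the set of allowed Commons category names, skipping blanks and comments."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    }


if __name__ == "__main__":
    allowed = load_allow_list("commons_impact_metrics_allow_list.txt")
    print(f"{len(allowed)} allowed categories")
```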
Step 1:
T358681: [Commons Impact Metrics] Productionize SparkSQL and Spark-Scala -> T358699: [Commons Impact Metrics] Create Airflow job that generates the datasets in Iceberg
T358679: [Commons Impact Metrics] Design API endpoints and Cassandra/Druid datasources
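As a rough illustration of what the productionized job in T358681/T358699 does, the sketch below writes an aggregated dataset into an Iceberg table. It uses PySpark rather than the actual SparkSQL/Spark-Scala code, and the catalog, database, table, and column names are placeholders, not the real schema.

```python
# Minimal PySpark sketch of writing a computed dataset into an Iceberg table.
# Assumes the Iceberg runtime jar is on the classpath and that a catalog named
# "analytics_iceberg" is configured; all names below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("commons_impact_metrics_example")
    .config("spark.sql.catalog.analytics_iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.analytics_iceberg.type", "hive")
    .getOrCreate()
)

# Placeholder aggregation: pageviews per allow-listed category and month.
monthly = spark.sql("""
    SELECT category, month, SUM(pageviews) AS pageviews
    FROM analytics_iceberg.wmf_example.commons_category_pageviews_source
    GROUP BY category, month
""")

(
    monthly.writeTo("analytics_iceberg.wmf_example.commons_impact_metrics_monthly")
    .overwritePartitions()
)
```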
Step 2:
T358718: [Commons Impact Metrics] Create a new AQS service with all the endpoints
T358707: [Commons Impact Metrics] Create Airflow job that formats and loads the data to Cassandra for AQS
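The DAG below sketches the shape of the loading job in T358707: a monthly run that formats the Iceberg datasets and loads them into Cassandra for AQS. WMF's production DAGs use their own operator wrappers; this sketch uses the stock SparkSubmitOperator from the Apache Spark provider, and every id and path is a placeholder.

```python
# Illustrative Airflow DAG for a monthly "format and load to Cassandra" job.
# Not the production DAG; dag_id, application path, and connection id are
# all placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="commons_impact_metrics_load_cassandra",
    schedule="@monthly",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    load_to_cassandra = SparkSubmitOperator(
        task_id="format_and_load_to_cassandra",
        application="hdfs:///example/artifacts/commons_impact_metrics_cassandra_loader.py",
        conn_id="spark_default",
        application_args=["--month", "{{ ds }}"],
    )
```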
Step 3:
Continue T358718: [Commons Impact Metrics] Create a new AQS service with all the endpoints -> T358715: [Commons Impact Metrics] Add test data in AQS's test environments to back up new AQS service
T358719: [Commons Impact Metrics] Backfill datasets in Iceberg and Cassandra/Druid
T358722: [Commons Impact Metrics] Create API documentation
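To make the documentation work in T358722 concrete, a client request to the new AQS service might look like the snippet below. The endpoint path, parameters, and response shape are purely illustrative; the real contract is whatever T358679 defines and T358722 documents.

```python
# Hypothetical example of querying the new AQS service. Only the base URL is
# real (the existing AQS REST base); the endpoint path and parameters are
# placeholders pending the actual API design.
import requests

BASE = "https://wikimedia.org/api/rest_v1/metrics"

# Placeholder endpoint: monthly metrics for an allow-listed Commons category.
url = f"{BASE}/commons-analytics/EXAMPLE-ENDPOINT/Example_category/monthly/20240101/20240201"

response = requests.get(url, headers={"User-Agent": "commons-impact-metrics-docs-example"})
response.raise_for_status()
print(response.json())
```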
Step 4:
T358701: [Commons Impact Metrics] Create Airflow job that generates the public dumps -> T358710: [Commons Impact Metrics] Make dumps accessible from analytics.wikimedia.org
T358720: [Commons Impact Metrics] Create documentation of the main pipeline
T358712: [Commons Impact Metrics] Implement necessary tools and process to maintain the allow-list
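For the dumps work in T358701/T358710, the sketch below shows one way a monthly dump could be produced: read a month of data from the Iceberg table and write it as a gzipped TSV file for publication under analytics.wikimedia.org. The table name, output path, and file layout are assumptions, not the decided design.

```python
# Sketch of a dump-generation step: export one month from the Iceberg table
# as a gzipped TSV. Assumes the same hypothetical Iceberg catalog as the
# earlier sketch; paths and names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("commons_impact_metrics_dumps_example").getOrCreate()

month = "2024-01"
monthly = (
    spark.table("analytics_iceberg.wmf_example.commons_impact_metrics_monthly")
    .where(f"month = '{month}'")
)

(
    monthly.coalesce(1)  # one file per monthly dump
    .write.mode("overwrite")
    .options(sep="\t", header=True, compression="gzip")
    .csv(f"hdfs:///example/dumps/commons_impact_metrics/{month}")
)
```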