
Create ETL pipelines for campaigns-product baseline metrics
Closed, Declined (Public)

Description

Data processing pipelines (Airflow) to be created for the following metrics:

Metric | Visualization type | Update frequency
Distinct participants registered in period | Line graph | Daily
Distinct participants registered in period with accounts created in the previous 30 days | Line graph | Daily
Distinct new events in period | Line graph | Daily
Distinct participants registered | Line graph | Monthly
Distinct organizers organizing events created in period | Line graph | Monthly
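
As a rough illustration of what the daily and monthly "distinct participants" metrics might compute, here is a minimal Python sketch. The input shape (pairs of user ID and registration date) and the function names are assumptions made for illustration only; the actual pipeline would read from the registration tables rather than from in-memory lists.

```python
from collections import defaultdict
from datetime import date


def distinct_participants_by_day(registrations):
    """Count distinct participants per registration day.

    `registrations` is an iterable of (user_id, registration_date) pairs.
    This input shape is hypothetical, not the real table schema.
    """
    users_by_day = defaultdict(set)
    for user_id, registered_on in registrations:
        users_by_day[registered_on].add(user_id)
    return {day: len(users) for day, users in sorted(users_by_day.items())}


def distinct_participants_by_month(registrations):
    """Count distinct participants per (year, month) period."""
    users_by_month = defaultdict(set)
    for user_id, registered_on in registrations:
        period = (registered_on.year, registered_on.month)
        users_by_month[period].add(user_id)
    return {period: len(users) for period, users in sorted(users_by_month.items())}


# Made-up sample rows: user 1 registers twice on the same day and once more
# on the following day; user 2 registers once.
sample = [
    (1, date(2024, 9, 1)),
    (1, date(2024, 9, 1)),
    (2, date(2024, 9, 1)),
    (1, date(2024, 9, 2)),
]
daily = distinct_participants_by_day(sample)
monthly = distinct_participants_by_month(sample)
```

One consequence worth keeping in mind: the monthly figure is not the sum of the daily figures, because a participant who registers on several days is counted once per month. This is presumably why the daily and monthly metrics are listed as separate rows.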

Notes:

  • Query: Distinct participants registered in period
  • Query: Distinct organizers that joined as organizers in period
  • Query: New events in period
  • Query: Number of new accounts created in period; MariaDB query is drafted
  • Write HQL files that create the daily and monthly tables
  • Update files to ensure that:
      • Tables that are aggregates are suffixed with the period over which they were aggregated
      • columns: bigint (64-bit integers) is the default for any integer columns
      • columns: wiki_id is used for internal references to a particular MediaWiki database
      • columns: month is always an integer in the range 1-12
      • columns: year and month are integer columns
  • Draft DAG files
  • Draft job logic and environment for querying MariaDB (wikishared and centralauth)
  • Write unit tests
  • Test files
  • End to end testing
  • MR into appropriate final locations
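
The table and column conventions in the notes above could be checked mechanically before the HQL files are merged. The following is a minimal sketch, not the project's actual tooling. The `_daily`/`_monthly` suffix spelling, the list of column names flagged as `wiki_id` aliases, the example table names, and the choice to accept any integer type for `year` and `month` are all assumptions.

```python
PERIOD_SUFFIXES = ("_daily", "_monthly")  # assumed spelling of the period suffix
INTEGER_TYPES = {"tinyint", "smallint", "int", "bigint"}
WIKI_ID_ALIASES = {"wiki", "wiki_db", "dbname"}  # hypothetical names to flag
TIME_COLUMNS = ("year", "month")


def check_table(table_name, columns):
    """Return a list of convention violations for an aggregate table.

    `columns` maps column name -> declared type (hypothetical input shape).
    """
    problems = []
    if not table_name.endswith(PERIOD_SUFFIXES):
        problems.append("aggregate table name lacks a period suffix")
    for name, col_type in columns.items():
        col_type = col_type.lower()
        if name in TIME_COLUMNS:
            # Assumption: any integer type is acceptable for year and month.
            if col_type not in INTEGER_TYPES:
                problems.append(f"{name} should be an integer column")
        elif col_type in INTEGER_TYPES and col_type != "bigint":
            problems.append(f"{name} should be bigint, not {col_type}")
        if name in WIKI_ID_ALIASES:
            problems.append(f"{name} should be named wiki_id")
    return problems


def is_valid_month(value):
    """True when `value` is an integer in the range 1-12."""
    return isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 12


# Hypothetical table definitions, used only to exercise the checks.
good = check_table(
    "participants_registered_monthly",
    {"wiki_id": "string", "year": "int", "month": "int", "participant_count": "bigint"},
)
bad = check_table(
    "participants_registered",
    {"wiki_db": "string", "month": "string", "participant_count": "int"},
)
```

Here `good` comes back empty, while `bad` reports the missing period suffix, the `wiki_db` column name, the non-integer `month`, and the `int` count column. A check of this kind could also serve as a starting point for the unit tests listed above.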

Event Timeline

Iflorez triaged this task as Medium priority. (Sep 10 2024, 10:21 PM)
Iflorez moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Iflorez updated the task description.
Iflorez moved this task from Backlog to Paused_campaigns on the User-Iflorez board.
Iflorez updated the task description.