
Create a plan for moving campaigns-product metrics data to airflow
Closed, Declined (Public)

Description

Help the campaigns team plan for moving to Airflow. See the slide deck presentation. Once data pipelines are on Airflow, the team can rely on Superset for data dashboards.

  • Identify the data categories available (data assets)
  • Identify data to potentially collect within the data assets
  • Test items in the potential collection list that have not previously been used in reporting/queries
    • Device data (T367840)
    • Mobile vs desktop (T359112)
    • Answers visualization (out of scope due to complexity)
    • Ambassador action output
  • Share the plan and work in progress
  • Cross-team collaboration & planning
    • Rebecca Maung & Arina
    • Community team(s)
  • Review planning on future metrics T365292
  • Request adding special event pages to pageview whitelist T368303
  • Consider
    • test data from T365292 --> may be something to consider in upcoming quarters
    • test data from T365407 --> data now available on demand at Pageviews Analysis
  • Finalize metric list
  • Submit LS3C request as needed
  • Review methods to connect MariaDB directly with Airflow with KC per T362612, T362615, etc.
    • KC links compilation, Airflow notes
  • Technical planning
    • Resources: Assess the resources required, such as data sources, computational requirements, and external systems that will interact with the pipeline.
    • Workflow: Determine the tasks that need to be automated, their dependencies, and the order in which they should be executed.
    • https://wikitech.wikimedia.org/wiki/Data_Platform/Dataset_creation
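For the workflow step (tasks, dependencies, execution order), a minimal sketch of the idea: record each task's upstream dependencies and derive a valid run order with a topological sort. The task names below are hypothetical placeholders, not existing Airflow task IDs, and this uses Python's standard-library `graphlib` rather than Airflow itself.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tasks, each mapped to the set of tasks it depends on.
# These names are illustrative only.
DEPENDENCIES = {
    "extract_registrations": set(),
    "extract_events": set(),
    "compute_metrics": {"extract_registrations", "extract_events"},
    "publish_table": {"compute_metrics"},
}


def execution_order(deps):
    """Return one valid execution order.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which would make the workflow impossible to schedule.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(execution_order(DEPENDENCIES))
```

In an Airflow DAG the same edges would be declared between task objects with the `>>` operator, and the scheduler would handle the ordering; writing the dependency map out first is simply a way to check it is acyclic and complete.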

WIP Planning Notes Document
Analytics/Systems/Superset
Superset

Event Timeline

Met with @ifried today and talked about the data assets. Here's the preliminary list of potential metrics we discussed tracking moving forward:

Registration Tool:

  • Distinct participants registered in period
  • Distinct organizers that joined as organizers in period
  • Number of new accounts created in period
  • New events in period
  • Mobile vs desktop [for engineering &/or design decisions] TESTING NEEDED
  • Devices [for QA purposes; may be satisfied as a preset notebook that QA engineer can run] TESTING NEEDED
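As a rough illustration of how the "in period" registration metrics could be computed once the data lands in a pipeline, here is a minimal sketch. The `Registration` record and its `platform` field are hypothetical stand-ins for whatever the Registration Tool tables actually expose, and the period is assumed to be half-open (start inclusive, end exclusive).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Registration:
    # Hypothetical record shape; field names are assumptions.
    user_id: int
    event_id: int
    registered_on: date
    platform: str  # assumed values: "mobile" or "desktop"


def in_period(day, start, end):
    """Half-open period check: start <= day < end."""
    return start <= day < end


def distinct_participants(regs, start, end):
    """Count users who registered in the period; a user registering
    for several events is counted once."""
    return len({r.user_id for r in regs if in_period(r.registered_on, start, end)})


def platform_split(regs, start, end):
    """Count registrations in the period by platform (mobile vs desktop)."""
    return dict(Counter(r.platform for r in regs if in_period(r.registered_on, start, end)))
```

Using a half-open period keeps consecutive reporting windows (e.g. month to month) from double-counting a registration that falls on a boundary date.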

Survey data:

  • Age, gender, confidence TESTING/COLAB NEEDED

Event creation data:

  • Event URL & page creator name TESTING NEEDED

Event invitations: T365292
Event list:

  • Pageviews
    • To the main page (pageviews) TESTING NEEDED
    • To event pages (referral source tracking) TESTING NEEDED
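For page-level pageview counts, one possible source (an assumption, not something this task specifies) is the public Wikimedia Pageviews REST API per-article endpoint. The sketch below only builds the request URL; the project and page title in the usage are hypothetical. Note that this endpoint returns view counts per page, not referral sources, so it would not cover the referral-tracking item.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, title, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL.

    start/end are YYYYMMDD strings for daily granularity.
    access can be "all-access", "desktop", "mobile-web", or "mobile-app",
    which is one way to split mobile vs desktop views.
    Spaces become underscores, and the title is percent-encoded so
    characters such as "/" or ":" do not break the path.
    """
    article = quote(title.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}"
```

For example, `per_article_url("meta.wikimedia.org", "Event:Example Event", "20240701", "20240731")` targets daily user views for a hypothetical event page in July 2024.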
Iflorez updated the task description.
Iflorez triaged this task as Medium priority. (Jul 16 2024, 7:16 PM)
Iflorez moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Iflorez updated the task description.

Met with @ifried yesterday and reviewed planning and status, including data assets. Here are the current metrics we'll track moving forward:

Registration Tool:

  • Distinct participants registered in period
  • Distinct organizers that joined as organizers in period
  • Number of new accounts created in period
  • New events in period

Potentially, data from the T365292 and T365407 queries, assuming there's time to set up those pipelines and that the data is helpful for the team (testing and dissecting the data is a necessary first step to determine the latter). When work on those two tickets is complete and the data is available, I will pause work on this ticket and shift to completing those two tasks as a priority.