Page MenuHomePhabricator

ETL pipeline for campaigns-product key metrics: Destination tables schemas
Closed, DeclinedPublic

Description

Queries & HQL files can be found on GitLab

Baseline data at: Campaign Metrics GSheets

Table Name: campaign_product_event_registration_activity_daily
schema:


`day_date` date COMMENT 'The day for which the metric is computed over.',
`n_participants_registered` int COMMENT 'Number of distinct participants registered in a day.',
`n_new_events` int COMMENT 'Number of new events registered in a day.',
`n_new_participants_registered` int COMMENT 'Number distinct participants registered in a day who created an account in the last 30 days.',

Table Name: campaign_product_event_registration_people_monthly
schema:

`month` int COMMENT 'The month for which the metric is computed over.',
`year` int COMMENT 'year for month over which the metric is computer over',
`n_organizers` int COMMENT 'Number of distinct organizers organizing new events created in month..',
`n_participants_registered` int COMMENT 'Number of distinct participants registered in a month.',

How Related Are the Metrics? Because they are frequently analyzed together and form a logical grouping, a single table makes more sense. And, at the current scale, performance and scalability are not concerns.

Notes:

  • Process reference ticket: T368352
  • Heads up, when T366765 is completed and wiki_db support is realized, they team will be interested to add a wiki_db field to this table and may proceed with a new request to add this field.
  • As to new accounts defining see T329382

Event Timeline

Iflorez renamed this task from ETL pipeline for campaigns-product key metrics table to ETL pipeline for campaigns-product key metrics: Destination table schema.Sep 13 2024, 9:42 PM
Iflorez renamed this task from ETL pipeline for campaigns-product key metrics: Destination table schema to ETL pipeline for campaigns-product key metrics: Destination tables schemas.Sep 17 2024, 9:24 PM
Iflorez updated the task description. (Show Details)

@ifried the organizer metric needs more refining:
We are clear that we want to count distinct organizers and we see the benefits of calculating this number over a month period:
n_organizers int COMMENT 'Number of distinct organizers in a month.',

What exactly within there should we count?
count of distinct organizers that had an active event in month? > for example, month x saw n events and those correspond to m count of organizers?

Iflorez updated the task description. (Show Details)

From my conversation with @ifried today:

n_organizers int COMMENT 'Number of distinct organizers organizing new events created in month.',

if there's time also:
n_new_organizers int COMMENT 'Number of distinct new organizers organizing new events created in month.',
we'll check in on progress at our 1:1 next week.