Page MenuHomePhabricator

[REQUEST] Add new Fundraising dimensions to druid.pageviews_daily & druid.pageviews_hourly
Open, LowPublic5 Estimated Story Points

Description

Name for main point of contact and contact preference
@JMando

What teams or departments is this for?
Fundraising Analytics

What are your goals? How will you use this data or analysis?

  • fiscal_year column for all the records in pageview hourly and daily tables
  • fr_campaign column which will be populated for the pageview records where the campaign was running.
    • For Example: pageviews from Nov 25 - Dec 31 from US, CA, IE, GB, NZ, AU will have a value like 'en6c Campaign' and will be limited to the language wiki we fundraised in for the campaign (like en.wikipedia for en6c).

What are the details of your request? Include relevant timelines or deadlines
Our team in FR Data has brought up for a while that due to permissions in the product instance of Superset, we are unable to add calculated metrics and columns to existing datasets such as druid.pageviews_daily and druid.pageviews_hourly. This capability would greatly improve the speed at which our team can accomplish a lot of our regular monitoring, such as monitoring and reporting on country-level pageview data by dimension

Is this request urgent or time sensitive?
No

Event Timeline

mpopov edited projects, added Product-Analytics (Kanban); removed Product-Analytics.

I have added this to my agenda with @kzimmerman to discuss, especially in the context of data stewardship.

mpopov added a subscriber: mpopov.
mpopov triaged this task as Low priority.Jul 5 2022, 5:06 PM
mpopov added subscribers: Mayakp.wiki, EChetty.

@kzimmerman will sync on this with @Mayakp.wiki

Hi @EYener, @EChetty and I spoke briefly about this and wanted to know the calculated metrics and columns that you would like to add to the existing pageview datasets.
We can decide the best approach once we get an idea of what FR is interested in. Thanks!

@JMando : could you perhaps provide the calculated metrics and columns that FR would like to add to the pageview datasets ?

I believe we wanted to be able to add fiscal year (FY1920, FY2021, FY2122, etc.) along with groupings for our fundraising campaigns using a combo of project and country_code (en6c, India, Japan, Bundle, etc.).

Fiscal Year example code (mysql syntax, but believe we can make something similar work too):

CASE
	WHEN MONTH(date) < 7 THEN CONCAT('FY', RIGHT(CAST(YEAR(date) - 1 AS CHAR), 2),
        RIGHT(CAST(YEAR(date) AS CHAR), 2))
    WHEN MONTH(date) >= 7 THEN CONCAT('FY', RIGHT(CAST(YEAR(date) AS CHAR), 2),
        RIGHT(CAST(YEAR(date) + 1 AS CHAR), 2))
    END AS fiscal_year

India, for example, would be:

project IN ('hi.wikipedia', 'en.wikipedia') AND country_code IN ('IN') THEN 'India Campaign'

Thanks @JMando .
To make sure I understand, something like

  1. fiscal_year column for all the records in both the tables
  2. fr_campaign column which will be populated for the pageview records where the campaign was running.
    • Example: pageviews from Nov 25 - Dec 31 from US, CA, IE, GB, NZ, AU will have a value like 'en6c Campaign'

That is right. With the addition that we would also want to limit to the language wiki we fundraised in for the campaign (like en.wikipedia for en6c).

Right! ok. I will discuss further with the team and get back to you.

Mayakp.wiki renamed this task from [REQUEST] Edit access: druid.pageviews_daily & druid.pageviews_hourly to [REQUEST] Add new Fundraising dimensions to druid.pageviews_daily & druid.pageviews_hourly.Sep 6 2022, 9:30 PM
Mayakp.wiki removed kzimmerman as the assignee of this task.
Mayakp.wiki moved this task from Triage to Tracking on the Product-Analytics board.
Mayakp.wiki added a project: Data-Engineering.
Mayakp.wiki removed a subscriber: EYener.

Per discussion in today's Board Refinement meeting moving this task to Tracking for Product Analytics as the scope of this task has been changed.
Assigning the task to data-engineering to fulfill the request.

EChetty set the point value for this task to 5.Nov 28 2022, 8:19 PM
EChetty moved this task from To be prioritised to Discussed & Estimated on the Data Pipelines board.