{wren} PV Aggregates
Closed, Resolved, Public

Description

Objective: Aggregate pageview counts into an intermediate state that is quick and easy to query (Intermediate Aggregates).
Key Result: Intermediate aggregates are used to generate data for the Pageview API and other useful cubes. Data is available starting in May.

Story: someone taking on an analyst role runs an SQL query to get pageview numbers for Executives, FR or Communications.

For example: "What were the monthly pageviews in France, excluding spiders?"
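As a rough illustration of the kind of query this story has in mind, here is a minimal sketch in Python that uses an in-memory SQLite database as a stand-in for the cluster's SQL engine. The table name `pageview_daily`, its columns (`day`, `country`, `agent_type`, `view_count`), and the sample rows are all hypothetical and exist only for this example; the real aggregate schema is not described in this task.

```python
import sqlite3

# Hypothetical schema (an assumption for illustration only):
# one row per (day, country, agent_type) holding a pre-aggregated pageview count.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pageview_daily (
        day TEXT,           -- ISO date, e.g. '2015-05-01'
        country TEXT,       -- country name
        agent_type TEXT,    -- 'user' or 'spider'
        view_count INTEGER  -- pageviews for that day/country/agent type
    )
    """
)

# Toy rows, made up purely so the query has something to run against.
conn.executemany(
    "INSERT INTO pageview_daily VALUES (?, ?, ?, ?)",
    [
        ("2015-05-01", "France", "user", 100),
        ("2015-05-01", "France", "spider", 40),
        ("2015-05-02", "France", "user", 150),
        ("2015-06-01", "France", "user", 200),
        ("2015-06-01", "Germany", "user", 300),
    ],
)

# "What were the monthly pageviews in France, excluding spiders?"
# Roll the daily rows up to months by grouping on the 'YYYY-MM' prefix of the date.
MONTHLY_FRANCE_NO_SPIDERS = """
    SELECT substr(day, 1, 7) AS month, SUM(view_count) AS pageviews
    FROM pageview_daily
    WHERE country = 'France' AND agent_type != 'spider'
    GROUP BY substr(day, 1, 7)
    ORDER BY month
"""


def monthly_pageviews(connection):
    """Return a list of (month, pageviews) tuples for France, spiders excluded."""
    return connection.execute(MONTHLY_FRANCE_NO_SPIDERS).fetchall()


rows = monthly_pageviews(conn)
for month, pageviews in rows:
    print(month, pageviews)
```

Because the daily rows are already aggregated, the monthly answer needs only a filter and a `GROUP BY` over a small table, which is the point of keeping an intermediate aggregate instead of querying raw request logs.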

Two datasets are produced:

Backfill Data for April
T96067: Compute pageviews aggregates daily and monthly from April {wren}
Tasks to set up Impala (the serving layer on the cluster that can be queried)
T96328: setup 'testing' dataset on hive for Impala {wren} [13 pts]
T96329: Install Impala on cluster {wren}
T96330: test performance of Impala {wren} [8 pts]
T96331: Productionize Impala {hawk}

Event Timeline

kevinator raised the priority of this task to Medium.
kevinator updated the task description.
kevinator added a subscriber: kevinator.
kevinator renamed this task from {epic} Analyst runs query to get aggregated pageview counts {crow} to {epic} Analyst runs query to get aggregated pageview counts {wren}. Apr 17 2015, 12:55 AM
kevinator renamed this task from {epic} Analyst runs query to get aggregated pageview counts {wren} to {wren} Intermediate PV Aggregates. Jun 12 2015, 6:36 AM
kevinator updated the task description.
kevinator renamed this task from {wren} Intermediate PV Aggregates to {wren} PV Aggregates. Jun 12 2015, 7:36 AM
kevinator updated the task description.

Removing task T96314 as a blocking task because it turns out we can use Hive for our needs, and it performs quite fast. Impala is not a blocker for this project.

kevinator claimed this task.