Compute pageviews aggregates daily and monthly from April {wren}
Closed, Duplicate · Public

Description

Aggregate on daily and monthly granularity.
Only consider pageviews (is_pageview flag)
Dimensions of aggregation are:

  • country
  • access_method
  • project
  • language
  • Spider or not

Event Timeline

JAllemandou claimed this task.
JAllemandou raised the priority of this task from to Needs Triage.
JAllemandou updated the task description.
JAllemandou added a subscriber: JAllemandou.
Restricted Application added a subscriber: Aklapper. · Apr 14 2015, 7:13 PM

Hi @JAllemandou. Please associate at least one project with this task, otherwise nobody can find this task when searching in the corresponding project(s). Thanks.

JAllemandou set Security to None.

Thanks for the reminder, Aklapper!
I usually add projects, and forgot this time :)

kevinator renamed this task from Compute pageviews aggregates daily and monthly from April to Compute pageviews aggregates daily and monthly from April {crow}. · Apr 16 2015, 12:23 AM

Kevin: here is an example of the data (flat file) we would get as a result: https://hue.wikimedia.org/filebrowser/view//user/joal/pageviews_aggregated_2015-04-13.tsv
It took 25 minutes to compute (one day of aggregation).

kevinator renamed this task from Compute pageviews aggregates daily and monthly from April {crow} to Compute pageviews aggregates daily and monthly from April {wren}. · Apr 17 2015, 12:54 AM
kevinator triaged this task as Normal priority. · Apr 17 2015, 12:58 AM
kevinator moved this task from Next Up to Tasked_Hidden on the Analytics-Kanban board.

Really wonderful that this is happening. I'm very interested in regularly reviewing pageview numbers by country and device. I'm wondering how I'll be able to access this information. Will there be a monthly report posted somewhere? Thanks for the info!

Checking back on my question about how to access this info. Any update?

@MeganHernandez_WMF This data is currently generated by a process that is not yet productionized.
Data from April 1st to May 25th is accessible using Hive, through Hue for instance (have you used those tools before?).
The table is under my personal database joal (this will change when in production):

col_name	data_type	comment
year                	int                 	year of pageviews   
month               	int                 	month of pageviews  
day                 	int                 	day of pageviews    
hour                	int                 	hour of pageviews   
project             	string              	Project name, computed out of requests hostname
access_method       	string              	Method used to access the pages, can be desktop, mobile web, or mobile app
agent_type          	string              	Agent accessing the pages, can be spider or user
country             	string              	Country name of the accessing agents (computed using maxmind GeoIP database)
country_code        	string              	Country iso code of the accessing agents (computed using maxmind GeoIP database)
count               	bigint              	number of pageviews

If you want daily pageviews per country for users only, here is the query:

SELECT
  year, month, day, country, SUM(count) as nb_pageviews
FROM
  joal.pageviews_hourly
WHERE agent_type = 'user'
GROUP BY
  year, month, day, country
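
For the monthly numbers by country and device asked about earlier in the thread, the same query can be adapted by dropping day from the grouping and adding access_method. The sketch below is only an illustration, not the production job: it runs the adapted query against an in-memory SQLite table standing in for joal.pageviews_hourly, with a subset of the columns listed above and made-up sample rows. The ORDER BY clause is an addition for readable output, and the SQLite setup is an assumption made so the example can run without Hive.

```python
import sqlite3

# Hypothetical stand-in for joal.pageviews_hourly: an in-memory SQLite table
# holding a subset of the columns described above. All rows are made-up samples.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pageviews_hourly (
        year INTEGER, month INTEGER, day INTEGER, hour INTEGER,
        access_method TEXT, agent_type TEXT, country TEXT, count INTEGER
    )
    """
)
sample_rows = [
    (2015, 4, 1, 0, "desktop", "user", "France", 10),
    (2015, 4, 1, 1, "desktop", "user", "France", 5),
    (2015, 4, 2, 0, "mobile web", "user", "France", 7),
    (2015, 4, 2, 0, "desktop", "spider", "France", 100),  # excluded: not a user
    (2015, 5, 1, 0, "desktop", "user", "Germany", 3),
]
conn.executemany(
    "INSERT INTO pageviews_hourly VALUES (?, ?, ?, ?, ?, ?, ?, ?)", sample_rows
)

# Monthly pageviews per country and access method, users only.
# Same shape as the daily query: day is dropped from SELECT and GROUP BY,
# and access_method is added to both.
MONTHLY_QUERY = """
SELECT
  year, month, country, access_method, SUM(count) AS nb_pageviews
FROM
  pageviews_hourly
WHERE agent_type = 'user'
GROUP BY
  year, month, country, access_method
ORDER BY
  year, month, country, access_method
"""

monthly = conn.execute(MONTHLY_QUERY).fetchall()
for row in monthly:
    print(row)
```

Against the real table in Hive, only the SELECT statement would be needed, with the table name written as joal.pageviews_hourly.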