
Create Daily & Monthly pageview dump with country data and Visualize on UI
Closed, Resolved · Public · 0 Story Points

Description

scope: deliver pageviews with geocoded data.

Need to suppress data where there is not enough traffic, similar to what Erik Z. does on stats.
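The task does not say which suppression rule is used on stats. The sketch below assumes the simplest version: a minimum-count cutoff below which a cell is dropped instead of being published with an exact small number. The key layout `(date, country, project)` and the `min_views` default are hypothetical, chosen only for illustration.

```python
from typing import Dict, Tuple

# Hypothetical cell key: (date, country, project). The real dimensions are not fixed here.
Key = Tuple[str, str, str]


def suppress_low_traffic(counts: Dict[Key, int], min_views: int = 1000) -> Dict[Key, int]:
    """Return only the cells whose page-view count reaches min_views.

    min_views is an assumed, illustrative threshold, not the value used on stats.
    """
    return {key: views for key, views in counts.items() if views >= min_views}


counts = {
    ("2015-02-25", "FR", "wikipedia"): 5000,
    ("2015-02-25", "XX", "wikipedia"): 12,  # too little traffic, gets dropped
}
print(suppress_low_traffic(counts, min_views=100))
```

Dropping whole cells means small countries disappear from the dump rather than appearing with counts that could identify individual readers.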

Event Timeline

kevinator raised the priority of this task to Needs Triage.
kevinator updated the task description.
kevinator added a subscriber: kevinator.
Restricted Application added a subscriber: Aklapper. Feb 25 2015, 7:06 PM

Privacy implications must be considered very carefully here. The current reports created by Erik Z. take great care to add fuzziness where it's needed.
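The comment does not describe how Erik Z.'s reports add fuzziness. One simple form, assumed here purely for illustration, is to publish counts rounded to a coarse bucket so that exact small differences between cells are hidden; adding random noise is another common option. The function name and the default `bucket` size are hypothetical.

```python
def fuzz_count(views: int, bucket: int = 100) -> int:
    """Round a non-negative count to the nearest multiple of bucket (halves round up).

    Deterministic rounding is an assumed stand-in for whatever fuzzing the
    existing stats reports actually apply.
    """
    if views < 0:
        raise ValueError("page-view counts cannot be negative")
    if bucket <= 0:
        raise ValueError("bucket must be positive")
    return ((views + bucket // 2) // bucket) * bucket


print(fuzz_count(149), fuzz_count(150), fuzz_count(1234567, bucket=1000))
```

Rounding is reproducible (the same input always publishes the same number), which avoids readers averaging away random noise across repeated releases; it pairs naturally with a minimum-traffic cutoff.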

Nuria added a subscriber: Nuria. Feb 25 2015, 7:24 PM

I think we need an example of what the report looks like. Are pageviews per page? Per project? Per language?

Yurik set Security to None. Feb 25 2015, 8:23 PM
Yurik added a subscriber: DFoy.
Yurik added a subscriber: Yurik.
Yurik added a comment. Feb 25 2015, 8:33 PM

Per T90499, for Zero, we would need a table with the following columns: language, subdomain (nothing|m|zero), site (wikipedia|...), country, count, bandwidth:

date        language subdomain site           page_views   content_size
2001-01-15  en       m         wikipedia.org  1000000      50000000
...

The last piece, content size, is the total user-downloaded traffic, including bits and multimedia. Geotagging those requests probably relates to T89177 (tagging all traffic with the zero= tag).
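The language, subdomain and site columns above would have to be derived from each request's host. A minimal sketch, assuming hosts of the form `<language>[.<m|zero>].<site>` (for example `en.m.wikipedia.org`); the `split_host` helper and `HostParts` type are hypothetical, and non-language hosts such as `commons.m.wikimedia.org` would simply put the project name in the language field.

```python
from typing import NamedTuple


class HostParts(NamedTuple):
    language: str
    subdomain: str  # "" for desktop, "m" for mobile, "zero" for Wikipedia Zero
    site: str


# Access-mode labels taken from the (nothing|m|zero) list in the comment above.
ACCESS_LABELS = {"m", "zero"}


def split_host(host: str) -> HostParts:
    """Split e.g. 'en.m.wikipedia.org' into ('en', 'm', 'wikipedia.org').

    Assumes <language>[.<m|zero>].<site>; anything shorter raises ValueError.
    """
    labels = host.lower().split(".")
    if len(labels) < 3:
        raise ValueError(f"unexpected host: {host!r}")
    language = labels[0]
    if labels[1] in ACCESS_LABELS:
        subdomain, rest = labels[1], labels[2:]
    else:
        subdomain, rest = "", labels[1:]
    if len(rest) < 2:  # need at least <name>.<tld> for the site
        raise ValueError(f"unexpected host: {host!r}")
    return HostParts(language, subdomain, ".".join(rest))


print(split_host("en.m.wikipedia.org"))
print(split_host("en.wikipedia.org"))
```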

kevinator triaged this task as Normal priority. Feb 26 2015, 4:35 PM
kevinator updated the task description.
Nuria added a comment. Feb 27 2015, 8:35 PM

I am not sure we will be able to provide content size. It doesn't seem like we would.

Please note that this format does not include country data; it includes "language", and that data is already available in the regular dumps:
date        language subdomain site           page_views
2001-01-15  en       m         wikipedia.org  1000000

Yurik added a comment. Feb 28 2015, 7:47 AM

@Nuria, why not? Content size should be fairly easy to obtain once tagging is enabled on all traffic: you simply run a summing GROUP BY query without filtering by is_page_view, and join the result with the page-view-filtered counting query. The syntax might need polishing, and the breakdown by subdomain etc. still needs to be added:

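-- page-view counts per (date, geo), outer-joined with total content size over all requests
SELECT COALESCE(counts.date, sizes.date) AS date,
       COALESCE(counts.geo, sizes.geo) AS geo, counts.cnt, sizes.size
FROM (SELECT date, geo, COUNT(*) AS cnt FROM quests WHERE is_page_view GROUP BY date, geo) counts
FULL OUTER JOIN
     (SELECT date, geo, SUM(content) AS size FROM quests GROUP BY date, geo) sizes
  ON counts.date = sizes.date AND counts.geo = sizes.geo;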
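The same idea in plain Python, as a sketch of what the query computes: page views are counted only for requests flagged as page views, bytes are summed over every request, and the two aggregates are combined by key. The request tuple layout and the `counts_and_sizes` name are hypothetical; unlike SQL, a missing count comes back as 0 rather than NULL.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical request record: (date, geo, is_page_view, content_bytes).
Request = Tuple[str, str, bool, int]


def counts_and_sizes(requests: Iterable[Request]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    counts: Dict[Tuple[str, str], int] = defaultdict(int)  # like the WHERE is_page_view subquery
    sizes: Dict[Tuple[str, str], int] = defaultdict(int)   # like the unfiltered SUM(content) subquery
    for date, geo, is_page_view, content in requests:
        if is_page_view:
            counts[(date, geo)] += 1
        sizes[(date, geo)] += content
    # Full outer join on (date, geo); a key with no page views gets a count of 0.
    keys = counts.keys() | sizes.keys()
    return {key: (counts.get(key, 0), sizes.get(key, 0)) for key in keys}


requests = [
    ("2015-02-25", "FR", True, 100),   # article page view
    ("2015-02-25", "FR", False, 900),  # e.g. an image fetched by that page
    ("2015-02-25", "DE", False, 50),
]
print(counts_and_sizes(requests))
```

Doing both aggregates in one pass over the data is the in-memory analogue of the join; at cluster scale, whether this is fast enough is exactly the performance question raised in the reply.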
Nuria added a comment. Mar 1 2015, 3:54 AM

Well, I did not know we had any plans to tag all request data; this is the first time I have heard about it. Once that is done, it should theoretically be possible to get content size, if these types of queries are performant enough.

Nuria moved this task from Wikistats Production to Dashiki on the Analytics board. Jun 12 2017, 4:06 PM
Nuria added a project: Analytics-Wikistats.
Nuria added a parent task: T130256: Wikistats 2.0..
Nuria moved this task from Dashiki to Backlog (Later) on the Analytics board. Jul 13 2017, 4:10 PM
Nuria renamed this task from Create Daily & Monthly pageview dump with country data to Create Daily & Monthly pageview dump with country data and Visualize on UI. Sep 8 2017, 9:19 PM
Nuria set the point value for this task to 0.
Nuria moved this task from Next Up to Parent Tasks on the Analytics-Kanban board.
Nuria edited projects, added Analytics; removed Analytics-Kanban. Mar 8 2018, 6:38 PM
Nuria moved this task from Backlog (Later) to Incoming on the Analytics board.
Nuria edited projects, added Analytics; removed Analytics-Kanban.
Nuria moved this task from Backlog (Later) to Incoming on the Analytics board.
Nuria edited projects, added Analytics-Kanban; removed Analytics.
Nuria moved this task from Next Up to Parent Tasks on the Analytics-Kanban board. Mar 21 2018, 5:40 AM
Nuria closed this task as Resolved. Jul 16 2018, 6:56 PM