Create intermediate dataset: pageview with actor information
Closed, Resolved (Public)

Description

Create an intermediate dataset containing pageview and is_redirect_to_pageview requests with actor information, to make computing unique data easier (so we do not have to read a whole month of webrequest).
This dataset should also be useful for other queries, as it would contain an actor-fingerprint id and most of the webrequest fields that are currently removed from pageview.
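
To make the idea concrete, here is a minimal Python sketch of the filtering and fingerprinting described above. It is only an illustration under assumptions, not the actual refinery job: the `WebRequest` record, the `is_pageview` flag, the `actor_fingerprint` helper, and the choice of hashing IP plus user agent are all hypothetical. Only `is_redirect_to_pageview` is named in this task.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class WebRequest:
    # Hypothetical, simplified stand-in for a webrequest row.
    ip: str
    user_agent: str
    uri_path: str
    is_pageview: bool
    is_redirect_to_pageview: bool


def actor_fingerprint(req: WebRequest) -> str:
    # Assumption: an actor is approximated by hashing IP and user agent.
    # The real job may use different or additional fields.
    raw = f"{req.ip}|{req.user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def to_pageview_actor(requests):
    # Keep only pageviews and redirects-to-pageview, and attach the
    # actor fingerprint so later queries need not re-read webrequest.
    return [
        {
            "actor_id": actor_fingerprint(r),
            "uri_path": r.uri_path,
            "is_pageview": r.is_pageview,
            "is_redirect_to_pageview": r.is_redirect_to_pageview,
        }
        for r in requests
        if r.is_pageview or r.is_redirect_to_pageview
    ]


def count_unique_actors(rows):
    # Uniques can then be computed from the smaller intermediate rows.
    return len({row["actor_id"] for row in rows})
```

With rows shaped like this, a uniques computation only has to scan the reduced dataset rather than the full webrequest data.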

What should we call it?

  • pageview_actor_hourly
  • webrequest_pageview_actor

Event Timeline

I'm struggling with the name: the table will contain pageview AND redirect-to-pageview requests, so it's not only pageview. Plus, it's a lot more similar to webrequest than to pageview...
Maybe webrequest_pageview_actor, with the table and job code stored close to webrequest instead of pageview?

Change 606127 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] [WIP] Add pageview_actor_hourly table and oozie job

https://gerrit.wikimedia.org/r/606127

fdans moved this task from Incoming to Data Quality on the Analytics board.

Change 606127 merged by Joal:
[analytics/refinery@master] Add pageview_actor_hourly table and oozie job

https://gerrit.wikimedia.org/r/606127

Change 607719 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Correct pageview_actor_hourly bug

https://gerrit.wikimedia.org/r/607719

Change 607719 merged by Joal:
[analytics/refinery@master] Correct pageview_actor_hourly bug

https://gerrit.wikimedia.org/r/607719