
Create intermediate dataset: pageview with actor information
Closed, Resolved, Public

Description

Create an intermediate dataset, pageview + is_redirect_to_pageview with actor information, for easier computation of unique data (so that we do not have to read a whole month of webrequest).
This dataset should also be useful for other queries, as it would contain an actor-fingerprint id and most of the webrequest fields that are currently removed from pageview.
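
To make the idea concrete, here is a minimal Python sketch of the kind of row-level transformation described above. It is not the actual refinery job: the fingerprint recipe, the `FINGERPRINT_FIELDS` tuple, the `actor_fingerprint` output column, and the helper names are all assumptions made for illustration. Only `is_redirect_to_pageview` is named in this task; `is_pageview` and the other input fields are assumed.

```python
import hashlib
from typing import Iterable, Iterator

# Hypothetical fields used to build the fingerprint (an assumption, not the real recipe).
FINGERPRINT_FIELDS = ("ip", "user_agent", "accept_language")


def actor_fingerprint(row: dict) -> str:
    """Hash a few request attributes into an opaque actor id."""
    raw = "\x1f".join(str(row.get(field, "")) for field in FINGERPRINT_FIELDS)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def to_pageview_actor(rows: Iterable[dict]) -> Iterator[dict]:
    """Keep pageviews and redirects-to-pageview, adding an actor fingerprint."""
    for row in rows:
        if row.get("is_pageview") or row.get("is_redirect_to_pageview"):
            # Keep every original field, unlike a trimmed-down pageview table.
            yield {**row, "actor_fingerprint": actor_fingerprint(row)}


def unique_actors(rows: Iterable[dict]) -> int:
    """Count distinct actors in already-filtered rows."""
    return len({row["actor_fingerprint"] for row in rows})


sample = [
    {"ip": "10.0.0.1", "user_agent": "A", "accept_language": "en",
     "is_pageview": True, "is_redirect_to_pageview": False},
    {"ip": "10.0.0.1", "user_agent": "A", "accept_language": "en",
     "is_pageview": False, "is_redirect_to_pageview": True},
    {"ip": "10.0.0.2", "user_agent": "B", "accept_language": "fr",
     "is_pageview": False, "is_redirect_to_pageview": False},
]

kept = list(to_pageview_actor(sample))
print(len(kept), unique_actors(kept))  # prints "2 1"
```

Under these assumptions, unique counts can be computed from the much smaller filtered rows instead of rescanning all of webrequest.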

What should we call it?

  • pageview_actor_hourly
  • webrequest_pageview_actor

Event Timeline

Nuria created this task. Jun 15 2020, 6:17 PM
Nuria updated the task description. Jun 15 2020, 6:19 PM
JAllemandou updated the task description. Jun 17 2020, 6:29 AM
JAllemandou updated the task description. Jun 17 2020, 8:30 AM

I'm struggling with the name: the table will contain pageview AND redirect-to-pageview, so it's not only pageview. Plus, it's a lot more similar to webrequest than to pageview...
Maybe webrequest_pageview_actor, with the table and job stuff stored close to webrequest instead of pageview?

Change 606127 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] [WIP] Add pageview_actor_hourly table and oozie job

https://gerrit.wikimedia.org/r/606127

fdans triaged this task as High priority. Jun 18 2020, 3:56 PM
fdans moved this task from Incoming to Data Quality on the Analytics board.

Change 606127 merged by Joal:
[analytics/refinery@master] Add pageview_actor_hourly table and oozie job

https://gerrit.wikimedia.org/r/606127

Change 607719 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Correct pageview_actor_hourly bug

https://gerrit.wikimedia.org/r/607719

Change 607719 merged by Joal:
[analytics/refinery@master] Correct pageview_actor_hourly bug

https://gerrit.wikimedia.org/r/607719

JAllemandou set Final Story Points to 3. Jun 30 2020, 9:51 AM
Nuria closed this task as Resolved. Jul 6 2020, 10:03 PM