
[Migration] Pageview - Learning
Closed, Resolved · Public · 9 Estimated Story Points

Description

Migrate the Oozie jobs for Pageview - Learning.
This encompasses four Oozie jobs:

  • oozie/learning/features/actor/hourly
  • oozie/learning/features/actor/rollup/hourly
  • oozie/learning/predictions/actor/hourly
  • oozie/pageview/actor

Also, the terms learning, features and predictions were chosen when we expected new "ML"-style production jobs to land on the cluster. That has not happened so far. Do we want to rename them?

This task needs to be broken down into three subtasks:

  • Make the UDFs work with Spark (multithreaded execution)
  • Update the HQL files for the jobs to make them Spark-compatible
  • Migrate the jobs to Airflow using the updated UDFs and HQL files
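
To illustrate the first subtask: Spark runs several tasks as threads inside one executor process, so a UDF that keeps mutable helper state in a shared (static) field can be corrupted by concurrent calls. The task does not say what the actual problem in these UDFs was, so the following is only a minimal sketch of the general pattern, assuming the issue is shared mutable state. It is written in Python for brevity (the real UDFs run on the JVM, where `ThreadLocal` plays the equivalent role), and `ScratchBufferParser`, `normalize_path_udf` and `run_concurrently` are hypothetical names, not refinery code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ScratchBufferParser:
    """Hypothetical stateful helper (not refinery code).

    It reuses a mutable scratch buffer between calls, so a single
    instance must not be shared across threads.
    """

    def __init__(self):
        self._buffer = []

    def normalize(self, value: str) -> str:
        # Reset and refill the shared scratch buffer: this is the part
        # that would race if two threads used the same instance.
        self._buffer.clear()
        for part in value.split("/"):
            if part:
                self._buffer.append(part.lower())
        return "/".join(self._buffer)


# One parser per thread instead of one parser per process.
_local = threading.local()


def _parser() -> ScratchBufferParser:
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = ScratchBufferParser()
        _local.parser = parser
    return parser


def normalize_path_udf(value: str) -> str:
    """UDF-style entry point that is safe to call from many threads."""
    return _parser().normalize(value)


def run_concurrently(values, workers=8):
    """Simulate several tasks calling the UDF from threads in one process."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(normalize_path_udf, values))
```

The design choice here is to give each thread its own helper instance rather than guarding one shared instance with a lock, which keeps the UDF call path free of contention at the cost of one helper object per thread.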

Event Timeline

JArguello-WMF set the point value for this task to 9.
JAllemandou renamed this task from [Migration] Learning to [Migration] Pageview - Learning. Dec 20 2022, 11:00 AM
JAllemandou updated the task description.
JAllemandou updated the task description.

Change 884359 had a related patch set uploaded (by Joal; author: Joal):

[analytics/refinery@master] Add hql/webrequest/actor folder and scripts

https://gerrit.wikimedia.org/r/884359

Change 887786 had a related patch set uploaded (by Joal; author: Joal):

[operations/puppet@production] Update analytics data purge for webrequest_actor

https://gerrit.wikimedia.org/r/887786

Change 884359 merged by Joal:

[analytics/refinery@master] Add webrequest and pageview actor scripts

https://gerrit.wikimedia.org/r/884359

Change 887786 merged by Nicolas Fraison:

[operations/puppet@production] Update analytics data purge for webrequest_actor

https://gerrit.wikimedia.org/r/887786

Change 888228 had a related patch set uploaded (by Joal; author: Joal):

[operations/puppet@production] Remove previously absent timers from analytics data_purge

https://gerrit.wikimedia.org/r/888228

Change 888228 merged by Nicolas Fraison:

[operations/puppet@production] Remove previously absent timers from analytics data_purge

https://gerrit.wikimedia.org/r/888228