Run ETL for wmf_raw.ActionApi into wmf.action_* aggregate tables
Open, Normal, Public

Description

In T116065: Design aggregate tables to drive Action API reports, I designed a series of dimensional rollup tables to make reporting on various Action API metrics easier. Since April 2016 I have been running an ETL process on stat1002, using cron, bash, Python, and Hive, that populates tables for that schema in the bd808 database. This workflow needs to be converted to use standard Analytics tools and moved to the wmf database.

Note: now running on stat1005 due to stat1002 decomm.

bd808 created this task. Jun 8 2016, 3:56 PM
Restricted Application added a project: User-bd808. Jun 8 2016, 3:56 PM
Restricted Application added subscribers: Zppix, Aklapper.
bd808 added a comment. Jun 8 2016, 5:41 PM

The scripts I have been using are available at https://github.com/bd808/action-api-analytics

Moving to radar for now, but when you prioritize and define this, we can help code the Oozie jobs that would get this done. Just let us know.

Milimetric moved this task from Incoming to Radar on the Analytics board. Jul 7 2016, 5:38 PM
bd808 added a comment. Jul 7 2016, 5:42 PM

Moving to radar for now, but when you prioritize and define this, we can help code the Oozie jobs that would get this done. Just let us know.

I can make time to work on it whenever someone has time to help me. I can probably knock out most of it with just a few pointers to wiki pages and/or gerrit commits that show something similar being done. The tasks that get run are really just /usr/bin/hive with the proper year, month, day, and hour values supplied as CLI args.
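As a rough illustration of what "just /usr/bin/hive with date values as CLI args" could look like, here is a minimal Python sketch that builds one command line per hour. It is not the code from the linked repository: the script name `example_rollup.hql` and the variable names `year`, `month`, `day`, and `hour` are assumptions made for the example, and it assumes the values are passed with Hive's `--hivevar` option.

```python
from datetime import datetime, timedelta


def hive_command(script, hour):
    """Build the argv for one hourly Hive run.

    `script` is a hypothetical .hql file that reads ${year}, ${month},
    ${day} and ${hour}; those variable names are assumptions.
    """
    return [
        "/usr/bin/hive",
        "-f", script,
        "--hivevar", f"year={hour.year}",
        "--hivevar", f"month={hour.month}",
        "--hivevar", f"day={hour.day}",
        "--hivevar", f"hour={hour.hour}",
    ]


def hourly_commands(script, start, end):
    """Yield one Hive command per hour in the half-open range [start, end)."""
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        yield hive_command(script, current)
        current += timedelta(hours=1)


if __name__ == "__main__":
    # Print the commands instead of running them; a cron wrapper could
    # pass each argv to subprocess.run() instead.
    for argv in hourly_commands(
        "example_rollup.hql",
        datetime(2016, 6, 8, 0),
        datetime(2016, 6, 8, 3),
    ):
        print(" ".join(argv))
```

An Oozie coordinator would take over the job of the `hourly_commands` loop, supplying the same four values as workflow parameters for each hourly run.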

Oh, sweet, then it should be pretty straightforward. Grab me on IRC or in a hangout whenever we're both free. As far as examples go, all of our Oozie code is here (but I will help you navigate it):

https://github.com/wikimedia/analytics-refinery/tree/master/oozie

And all of our Oozie documentation is here (warning: XML is present in concentrations that may burn your eyes):

https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Oozie

bd808 triaged this task as Normal priority. Jul 22 2016, 5:05 PM
bd808 added a comment. Sep 30 2016, 1:15 AM

@Tgr do you have the time and energy to take this task over and finally finish our team goal from Q3 2015/16?

bd808 reassigned this task from bd808 to Tgr. Sep 30 2016, 10:16 PM

I think I have successfully ~~conned~~ convinced @Tgr to take this on when he gets some time.

Same offer to help with Oozie applies to @Tgr, of course, and bonus: I'm now a lot better at setting up Oozie jobs.

Change 331100 had a related patch set uploaded (by Gergő Tisza):
[WIP] Add Oozie jobs for wmf_raw.ApiAction -> wmf.action_*

https://gerrit.wikimedia.org/r/331100

@Tgr I'm aiming to review this by the end of the week; please ping me if I slip up.