
Build out Hadoop job to calculate average page views over time for CirrusSearch scoring purposes
Closed, Resolved · Public

Event Timeline

EBernhardson raised the priority of this task from to Needs Triage.
EBernhardson updated the task description.
EBernhardson added a project: CirrusSearch.
EBernhardson subscribed.
Restricted Application added a subscriber: Aklapper.
Deskana subscribed.

I started building this as a Hive query, but after experimenting with it I don't think it will be flexible enough. Oozie can kick off Spark jobs, so I'm working that out now. Spark jobs can be written with the Java, Scala, or Python bindings. I'm not sure which to use, but the docs seem to lean towards Scala, so I will work with that initially.

Actually, since Stas started his part in Python, it makes the most sense to continue and do all our Spark work in Python unless a specific need arises to diverge.

Change 256167 had a related patch set uploaded (by EBernhardson):
Initial popularity score calculator

https://gerrit.wikimedia.org/r/256167

Change 256167 merged by Smalyshev:
Initial popularity score calculator

https://gerrit.wikimedia.org/r/256167