
Eng uses Mahout installed on Hadoop cluster
Closed, Resolved, Public

Description

Ellery is working on the "Read More" feature and needs Mahout on the cluster to generate recommendations.

Event Timeline

kevinator assigned this task to Ottomata.
kevinator raised the priority of this task to High.
kevinator updated the task description.
kevinator changed Security from none to None.
kevinator subscribed.

I'm not sure how to interpret priorities. For context, this week I am fine working with Mahout locally and playing with toy data sets. But it would be great to take a crack at building recommendations using all of English Wikipedia next week, which won't be possible on my machine.

Using all revisions, or just current page text?

For now I'm just using current links and (url, referer) pairs from wmf_raw.webrequest.

I will never need the revisions.

Ottomata mentioned this in Unknown Object (Diffusion Commit). Dec 11 2014, 4:06 PM

Ok, I have zero experience with Mahout, but from what I can tell, it is just an executable that needs to be installed on client nodes, i.e. stat1002.

DONE! Let me know if there is more! :)

I will puppetize this for CDH 5.2 when it happens.

Pretty sure this is resolved! Feel free to reopen if there is more (that was too easy!).

Thanks, Andrew! That was speedy. I just ran my first Mahout job on Hadoop.