Create Hadoop Job to load data into cassandra [34 pts] {slug}
Closed, Resolved · Public

Description

Cassandra is the current storage backend for our pageview API, so we need to load data into it.
Since the data is already stored in Hadoop, let's load Cassandra directly from Hadoop.

Event Timeline

JAllemandou claimed this task.
JAllemandou raised the priority of this task from to Normal.
JAllemandou updated the task description.
JAllemandou added a project: Analytics-Kanban.
JAllemandou added a subscriber: JAllemandou.
Restricted Application added a subscriber: Aklapper. · Aug 6 2015, 11:36 AM
Milimetric renamed this task from Create Hadoop Job to load data into cassandra [?pts] {slug} to Create Hadoop Job to load data into cassandra [21?? pts] {slug}. Aug 10 2015, 5:18 PM
Milimetric set Security to None.

I have tried out https://github.com/spotify/hdfs2cass.
It is not directly usable for two reasons:

  • It doesn't handle username/password authentication for the Cassandra cluster.
  • It doesn't handle inserting into tables with a compound partition key.

I'll stick with the hand-written job I have submitted, but we could patch their code base later to add the two features described above.
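
To make the second limitation concrete: in CQL, a compound partition key is written as a parenthesised group inside the PRIMARY KEY clause, and a loader has to bind every column of that group on each insert. The sketch below is only an illustration in Python, not the job submitted for this task; the table name `pageviews_per_article` and the columns `project`, `article`, `day`, and `views` are hypothetical, as are the two helper functions.

```python
def primary_key_clause(partition_key, clustering_key=()):
    """Render a CQL PRIMARY KEY clause.

    A compound partition key (more than one column) is wrapped in its
    own parentheses; a single-column partition key is not.
    """
    if not partition_key:
        raise ValueError("at least one partition key column is required")
    if len(partition_key) == 1:
        pk = partition_key[0]
    else:
        pk = "(" + ", ".join(partition_key) + ")"
    return "PRIMARY KEY (" + ", ".join([pk] + list(clustering_key)) + ")"


def insert_statement(table, columns):
    """Render a parameterised CQL INSERT with one '?' marker per column."""
    markers = ", ".join("?" for _ in columns)
    return "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), markers
    )


# Hypothetical schema: (project, article) together form the partition key,
# and day is a clustering column.
pk = primary_key_clause(["project", "article"], ["day"])
stmt = insert_statement(
    "pageviews_per_article", ["project", "article", "day", "views"]
)
print(pk)
print(stmt)
```

A tool that assumes a single partition key column would produce the unwrapped form of the clause and would route rows using only one of the two columns, which is why such tables need explicit support.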

Change 236224 had a related patch set uploaded (by Joal):
[WIP] Add cassandra load job for pageview API

https://gerrit.wikimedia.org/r/236224

Milimetric renamed this task from Create Hadoop Job to load data into cassandra [21?? pts] {slug} to Create Hadoop Job to load data into cassandra [21 pts] {slug}. Sep 14 2015, 4:11 PM
ggellerman renamed this task from Create Hadoop Job to load data into cassandra [21 pts] {slug} to Create Hadoop Job to load data into cassandra [34 pts] {slug}. Oct 14 2015, 4:15 PM
kevinator closed this task as Resolved. Oct 15 2015, 4:04 PM
kevinator added a subscriber: kevinator.

Change 236224 merged by Joal:
Add cassandra load job for pageview API

https://gerrit.wikimedia.org/r/236224