
User history in hadoop
Closed, Duplicate · Public · 0 Estimated Story Points

Description

Parent task for the extraction, transformation, and loading (ETL) of user history data from MediaWiki into Hadoop.

Event Timeline

mforns renamed this task from "Edit data schemas for anaylitcs" to "Create edit data schemas for anaylitcs". May 12 2016, 4:44 PM
mforns renamed this task from "Create edit data schemas for anaylitcs" to "Create edit data hadoop/druid schemas for anaylitcs".

Build a schema that helps us do analytics. How do we represent that data in Druid? And, before that, how do we represent it in Hadoop?

This task is about schema design. It is a first stab, and we might need to revisit the schema later.

We are going to treat it as a spike and devote one week of one person's time to it.

1.1 First, the team needs to internally define the schemas that will be used to calculate metrics. These are not event-based schemas, but the data flowing into them comes from EventBus event-based data.

1.2 How is this data represented in Hadoop? Are the analytics schemas tables, or something else?

1.3 How is this data represented in Druid? (We need to know how Druid handles slowly changing dimensions; see subtask.)
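To illustrate what "user history" as a slowly changing dimension could look like, here is a minimal Python sketch of a type-2 slowly changing dimension built from a stream of events. It assumes a hypothetical event shape (`UserEvent` with `user_id`, `user_name`, `ts`) that only covers user creation and renames; these names are made up for this example and are not the actual EventBus or MediaWiki schemas. Each change closes the previous row and opens a new one, so every row records the interval during which a given user name was valid.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class UserEvent:
    """Hypothetical event: user `user_id` had name `user_name` starting at `ts`."""
    user_id: int
    user_name: str
    ts: int  # assumed: seconds since epoch


@dataclass
class UserHistoryRow:
    """One version of a user's state, valid in the interval [start_ts, end_ts)."""
    user_id: int
    user_name: str
    start_ts: int
    end_ts: Optional[int]  # None means this version is still current


def build_user_history(events: Iterable[UserEvent]) -> List[UserHistoryRow]:
    """Fold create/rename events into type-2 slowly changing dimension rows."""
    rows: List[UserHistoryRow] = []
    open_row: Dict[int, int] = {}  # user_id -> index of that user's current row
    for ev in sorted(events, key=lambda e: (e.ts, e.user_id)):
        idx = open_row.get(ev.user_id)
        if idx is not None:
            if rows[idx].user_name == ev.user_name:
                continue  # tracked attribute unchanged; keep the current row open
            rows[idx].end_ts = ev.ts  # close the previous version at the change time
        rows.append(UserHistoryRow(ev.user_id, ev.user_name, ev.ts, None))
        open_row[ev.user_id] = len(rows) - 1
    return rows
```

For example, `build_user_history([UserEvent(1, "Alice", 100), UserEvent(2, "Bob", 150), UserEvent(1, "Alicia", 300)])` yields three rows: Alice valid from 100 to 300, Bob open from 150, and Alicia open from 300. Keeping both bounds on each row makes point-in-time queries simple filters, at the cost of rewriting the previous row on every change.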

There are three entities: Page, User and Revision.
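One way the three entities could be combined for analytics is to denormalize each revision with the page title and user name that were valid at the revision's timestamp (a point-in-time lookup). The sketch below assumes hypothetical, simplified records (`Revision`, `Version`) and that each entity's history is a list of versions sorted by start time; it is an illustration, not the schema this task settled on.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Revision:
    """Hypothetical revision record referencing a page and a user by ID."""
    rev_id: int
    page_id: int
    user_id: int
    ts: int


@dataclass(frozen=True)
class Version:
    """Hypothetical history entry: `value` is valid from `start_ts` until the next entry."""
    start_ts: int
    value: str


def value_as_of(versions: List[Version], ts: int) -> Optional[str]:
    """Return the value valid at `ts`; `versions` must be sorted by start_ts."""
    starts = [v.start_ts for v in versions]
    i = bisect_right(starts, ts) - 1  # last version starting at or before ts
    return versions[i].value if i >= 0 else None


def denormalize(
    revisions: List[Revision],
    page_titles: Dict[int, List[Version]],
    user_names: Dict[int, List[Version]],
) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Attach the page title and user name that were current at each revision."""
    out = []
    for rev in revisions:
        title = value_as_of(page_titles.get(rev.page_id, []), rev.ts)
        name = value_as_of(user_names.get(rev.user_id, []), rev.ts)
        out.append((rev.rev_id, title, name))
    return out
```

Storing the denormalized result trades storage for query simplicity: each revision row already carries the historical names, so metrics do not need to repeat the time-range join.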

Nuria set the point value for this task to 13. May 12 2016, 5:19 PM
mforns renamed this task from "Create edit data hadoop/druid schemas for anaylitcs" to "User history in hadoop". Jun 28 2016, 3:59 PM
mforns triaged this task as Medium priority.
mforns updated the task description.
mforns changed the point value for this task from 13 to 0.