User history in hadoop
Closed, Duplicate · Public · 0 Story Points

Description

Parent task for the extraction, transformation, and loading (ETL) of user history data from MediaWiki into Hadoop.

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper. May 9 2016, 5:36 PM
mforns renamed this task from Edit data schemas for anaylitcs to Create edit data schemas for anaylitcs. May 12 2016, 4:44 PM
mforns renamed this task from Create edit data schemas for anaylitcs to Create edit data hadoop/druid schemas for anaylitcs.
Nuria added a subscriber: Nuria. May 12 2016, 4:45 PM

Build a schema that would help us do analytics. How do we represent that data in Druid? And, before that, how do we represent it in Hadoop?

Nuria added a comment. Edited May 12 2016, 5:01 PM

This task is about schema design. It is a first stab; we might need to revisit this schema later.

We are going to treat it as a spike and devote one week of one person's time.

1.1 First, the team needs to internally define the schemas that will be used to calculate metrics. These are not event-based schemas, but the data flowing into them comes from the event-based EventBus data inflow.

1.2 How is this data represented in Hadoop? Are the analytics schemas tables, or something else?

1.3 How is this data represented in Druid? (We need to know how Druid handles slowly changing dimensions. See subtask.)
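
To make the slowly changing dimension question in 1.3 concrete, here is a minimal Python sketch of one common way to model it: each change to a user produces a new row with a validity interval. The field names (`user_id`, `user_name`, `start`, `end`) and the `state_at` helper are hypothetical and chosen only for illustration; they are not the schema this task produced, and nothing here reflects how Druid or Hadoop actually store the data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class UserState:
    # Hypothetical fields; the real schema is not defined in this task.
    user_id: int
    user_name: str
    start: datetime          # when this state became valid (inclusive)
    end: Optional[datetime]  # when it stopped being valid (exclusive); None means current


def state_at(history: List[UserState], user_id: int, when: datetime) -> Optional[UserState]:
    """Return the row that was valid for user_id at the given time, if any."""
    for row in history:
        if row.user_id != user_id:
            continue
        if row.start <= when and (row.end is None or when < row.end):
            return row
    return None


# Example history: user 1 was renamed on 2016-03-01.
history = [
    UserState(1, "OldName", datetime(2015, 1, 1), datetime(2016, 3, 1)),
    UserState(1, "NewName", datetime(2016, 3, 1), None),
]
```

With this shape, a metric computed for a past date can look up the user attributes that applied at that time, rather than the current ones.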

Nuria added a comment. May 12 2016, 5:18 PM

There are three entities: Page, User and Revision.

Nuria set the point value for this task to 13. May 12 2016, 5:19 PM
mforns claimed this task. May 18 2016, 10:27 AM
mforns moved this task from Next Up to In Progress on the Analytics-Kanban board.
mforns renamed this task from Create edit data hadoop/druid schemas for anaylitcs to User history in hadoop. Jun 28 2016, 3:59 PM
mforns triaged this task as Normal priority.
mforns updated the task description.
mforns changed the point value for this task from 13 to 0.
mforns moved this task from In Progress to Paused on the Analytics-Kanban board. Jun 28 2016, 4:02 PM