The fixed-point algorithm that reconstructs user history struggles with users that have a very long history (> 300 changes). For several reasons, the Scala-Spark program gets slower with each iteration of the fixed-point algorithm. The vast majority (> 99.99%) of users have a history of fewer than 100 events, but if we could solve this issue, we could generate the history for *all* users. We can take advantage of the fact that the long tail contains only a small amount of data, which can be processed on a single machine without RDDs and the repartitioning, checkpointing, etc. that come with them.
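A minimal sketch of the split described above, using plain Scala collections rather than Spark so it stays self-contained. The `UserEvent` type, the `LongTailSplit` object, and the threshold default are hypothetical; the real event schema and the actual reconstruction step are not given in this task, so `processLocally` is only a stand-in.

```scala
// Hypothetical event type; the real schema is not given in this task.
final case class UserEvent(userId: Long, timestamp: Long, change: String)

object LongTailSplit {
  // Assumed cut-off, taken from the "> 300 changes" figure in the description.
  val LongHistoryThreshold = 300

  /** Splits per-user histories into (regular, longTail) by number of events. */
  def split(
      events: Seq[UserEvent],
      threshold: Int = LongHistoryThreshold
  ): (Map[Long, Seq[UserEvent]], Map[Long, Seq[UserEvent]]) = {
    val byUser = events.groupBy(_.userId)
    val (longTail, regular) =
      byUser.partition { case (_, history) => history.size > threshold }
    (regular, longTail)
  }

  /** Stand-in for the single-machine step: orders one user's events in memory. */
  def processLocally(history: Seq[UserEvent]): Seq[UserEvent] =
    history.sortBy(_.timestamp)
}
```

In a Spark job, the same size predicate could presumably be applied to the grouped data so that only the long tail is brought to a single machine, while the regular users continue through the existing distributed path.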
Description
Event Timeline
Restricted Application added subscribers: Zppix, Aklapper. · 2016-07-08 13:32:19 (UTC+0)
JAllemandou edited projects, added Analytics-Kanban; removed Analytics. · 2016-07-08 13:42:00 (UTC+0)
JAllemandou moved this task from In Progress to In Code Review on the Analytics-Kanban board. · 2016-07-19 16:12:25 (UTC+0)
JAllemandou moved this task from In Code Review to Done on the Analytics-Kanban board. · 2016-07-29 16:12:11 (UTC+0)