
User History: Solve the fixed-point algorithm's long tail problem
Closed, Resolved | Public | 5 Estimated Story Points

Description

The fixed-point algorithm that reconstructs the user history has trouble with users that have a very long history (> 300 changes). For several reasons, the scala-spark program gets slower with each iteration of the fixed-point algorithm. The vast majority (over 99.99%) of users have a history of fewer than 100 events, but if we could solve this issue, we could generate the history for *all* users. We can take advantage of the fact that the long tail contains only a small amount of data, which can be processed on a single machine without RDDs and the repartitioning, checkpointing, etc. that come with them.
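
A rough sketch of the idea, in plain Scala collections rather than Spark, may help make the proposal concrete. Everything named here is an assumption made for illustration: the `UserEvent` type, the `splitByHistoryLength` and `fixedPoint` helpers, and the use of 300 as the cutoff (borrowed from the figure above) are hypothetical and are not taken from the actual job. The reconstruction step itself is not described in this task, so it is left as a parameter.

```scala
// Hypothetical event type: the real job's schema is not shown in this task.
final case class UserEvent(userId: Long, timestamp: Long)

object LongTailSketch {
  // Assumed cutoff, taken from the "> 300 changes" figure in the description.
  val LongHistoryThreshold = 300

  /** Group events per user and split them into (short histories, long tail). */
  def splitByHistoryLength(
      events: Seq[UserEvent],
      threshold: Int = LongHistoryThreshold
  ): (Map[Long, Seq[UserEvent]], Map[Long, Seq[UserEvent]]) = {
    val byUser = events.groupBy(_.userId)
    val (longTail, short) =
      byUser.partition { case (_, userEvents) => userEvents.size > threshold }
    (short, longTail)
  }

  /** Generic in-memory fixed-point loop: apply `step` until the state stops
    * changing, or until `maxIterations` steps have been applied.
    * No RDDs are involved, so there is no repartitioning or checkpointing.
    */
  @annotation.tailrec
  def fixedPoint[A](state: A, maxIterations: Int = 10000)(step: A => A): A = {
    val next = step(state)
    if (next == state || maxIterations <= 1) next
    else fixedPoint(next, maxIterations - 1)(step)
  }
}
```

Under these assumptions, the short histories would keep going through the existing distributed job, while the long-tail map would be brought to a single machine and passed to `fixedPoint` together with whatever function performs one reconstruction iteration.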