
Research spike: load enwiki data into Druid to study lookup table performance
Closed, Duplicate | Public | 21 Story Points

Description

Research spike: we are interested in lookup table performance and in re-loading lookup tables with newly computed values. The challenge with the enwiki dataset is its size: we estimate billions of values in the lookup tables.

Questions:

  • Can we update lookup tables quickly enough (between the loading of each segment in Druid)?
  • Can we query Druid while joining to one or more lookup tables?

If not, look into ClickHouse: https://clickhouse.yandex/

The desired outcome is a strategy for dealing with these properties that change in the future. A valid strategy would be: don't do it, and just compute the values live every time they are needed.
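
To make the second question concrete, here is a minimal sketch of what a query that resolves a dimension through a lookup at query time might look like. It only builds the JSON body of a Druid native topN query using a dimension spec of type "lookup", and sends nothing to a cluster. The datasource name (enwiki_edits), the dimension (page_id), the lookup name (page_title_lookup), and the helper function are all hypothetical and chosen for illustration; the lookup itself is assumed to already be registered in Druid.

```python
import json


def build_lookup_topn(datasource, dimension, lookup_name, interval, threshold=10):
    """Build a Druid native topN query body that maps `dimension`
    through a registered lookup at query time.

    All argument values used below are hypothetical examples.
    """
    return {
        "queryType": "topN",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "all",
        # Dimension spec that replaces each raw value with its
        # value from the named lookup table.
        "dimension": {
            "type": "lookup",
            "dimension": dimension,
            "outputName": dimension + "_resolved",
            "name": lookup_name,
        },
        "metric": "events",
        "threshold": threshold,
        "aggregations": [{"type": "count", "name": "events"}],
    }


query = build_lookup_topn(
    "enwiki_edits",           # hypothetical datasource
    "page_id",                # hypothetical dimension
    "page_title_lookup",      # hypothetical registered lookup
    "2016-07-01/2016-08-01",  # ISO 8601 interval
)
print(json.dumps(query, indent=2))
```

Timing a query like this against a lookup with billions of entries, and repeating it after the lookup has been re-loaded, would be one way to approach both questions.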

Event Timeline

Nuria created this task. Jul 27 2016, 7:15 PM
Milimetric renamed this task from "Research spike: load enwiki data into Druid to study whether we need lookup tables" to "Research spike: load enwiki data into Druid to study lookup table performance". Jul 28 2016, 5:28 PM
Milimetric updated the task description.
Milimetric set the point value for this task to 21.
Milimetric moved this task from Incoming to Dashiki on the Analytics board.
Milimetric moved this task from Dashiki to Backlog (Later) on the Analytics board. Sep 15 2016, 4:40 PM