
Research spike: load enwiki data into Druid to study lookup table performance
Closed, Duplicate | Public | 21 Estimated Story Points

Description

Research spike: we are interested in lookup table performance and in re-loading lookup tables with newly computed values. The challenge with the enwiki dataset is its size: we estimate billions of values in the lookup tables.
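To make the size concern concrete, here is a rough back-of-envelope sketch. Every number in it (entry count, key and value sizes, per-entry overhead) is an illustrative assumption, not a measurement from enwiki or from Druid, and the helper name `lookup_footprint_gib` is made up for this example.

```python
def lookup_footprint_gib(entries: int, key_bytes: int, value_bytes: int,
                         overhead_bytes: int) -> float:
    """Rough in-memory size of a key -> value lookup table, in GiB."""
    per_entry = key_bytes + value_bytes + overhead_bytes
    return entries * per_entry / 2**30


# Assumed figures only: one billion entries, 8-byte keys, 32-byte values,
# and 48 bytes of per-entry map overhead. Replace with measured values.
estimate = lookup_footprint_gib(
    entries=1_000_000_000,
    key_bytes=8,
    value_bytes=32,
    overhead_bytes=48,
)
print(f"{estimate:.1f} GiB per copy of the lookup table")
```

If an estimate like this turns out to be in the right range, each node holding the lookup would need that much memory per table, which is part of what the spike should confirm or rule out.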

Questions:

  • Can we update lookup tables quickly enough (between the loading of each segment in Druid)?
  • Can we query Druid while joining to one or more lookup tables?

If not, look into ClickHouse: https://clickhouse.yandex/
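For the second question, a test query might look like the sketch below. It assumes the Druid version under test supports the `lookup` dimension spec that refers to a registered lookup by name. The datasource `enwiki_edits`, the dimension `page_id`, and the lookup name `page_title` are hypothetical placeholders, not names from an existing cluster.

```python
import json


def lookup_groupby_query(datasource: str, dimension: str,
                         lookup_name: str, interval: str) -> dict:
    """Build a Druid groupBy query that resolves a dimension through a lookup."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": [interval],
        "dimensions": [
            {
                # "lookup" dimension spec: maps each dimension value
                # through the registered lookup given in "name".
                "type": "lookup",
                "dimension": dimension,
                "outputName": f"{dimension}_resolved",
                "name": lookup_name,
            }
        ],
        "aggregations": [{"type": "count", "name": "rows"}],
    }


# Hypothetical names, for illustration only.
query = lookup_groupby_query(
    datasource="enwiki_edits",
    dimension="page_id",
    lookup_name="page_title",
    interval="2016-07-01/2016-07-02",
)
print(json.dumps(query, indent=2))
```

The resulting JSON would be POSTed to a broker's query endpoint. Timing the same query with and without the lookup dimension spec would give a first measure of the join cost.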

The desired outcome is a strategy for dealing with these properties whose values change in the future. One valid strategy would be: don't store them at all, and just compute them live every time they are needed.

Event Timeline

Milimetric renamed this task from "Research spike: load enwiki data into Druid to study whether we need lookup tables" to "Research spike: load enwiki data into Druid to study lookup table performance". Jul 28 2016, 5:28 PM
Milimetric updated the task description.
Milimetric set the point value for this task to 21.