Research spike: investigate lookup table performance and re-loading lookup tables with newly computed values. The challenge with the enwiki dataset is size: we estimate billions of entries in the lookup tables.
Questions:
- can we update lookup tables quickly enough (i.e., between the loading of successive segments in Druid)?
- can we query Druid while joining against one or more lookup tables?
If not, look into: https://clickhouse.yandex/
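As a starting point for the update question: Druid exposes a coordinator HTTP API for registering and updating lookups, so the spike could benchmark how long a push of N entries takes. A minimal sketch, assuming a coordinator at `localhost:8081`; the tier `__default` is Druid's default lookup tier, while the lookup name `page_props` and the example entries are placeholders for illustration:

```python
import json
import urllib.request

COORDINATOR = "http://localhost:8081"  # assumed coordinator address

def build_lookup_update(tier, name, mapping, version):
    """Build the URL and payload for updating one map-based lookup.

    Druid requires the `version` string to increase on every update so
    that nodes can tell a new snapshot from the one they already hold.
    """
    url = f"{COORDINATOR}/druid/coordinator/v1/lookups/config/{tier}/{name}"
    payload = {
        "version": version,
        "lookupExtractorFactory": {"type": "map", "map": mapping},
    }
    return url, payload

def push_lookup_update(tier, name, mapping, version):
    """POST the update to the coordinator (network call; sketch only)."""
    url, payload = build_lookup_update(tier, name, mapping, version)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)
```

Note that a simple `map` lookup is held fully in memory on every node, which is unlikely to work at billions of entries; the externally backed lookup variants (e.g. `cachedNamespace`) would be the ones to benchmark for that scale.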
Desired outcome is a strategy for dealing with properties that change over time. One valid strategy would be: don't store them at all and compute them live every time they are needed.
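The compute-it-live fallback could be as simple as memoizing the derivation at query time instead of maintaining a lookup table. A sketch, where `compute_property` is a hypothetical stand-in for whatever value the lookup table would have precomputed:

```python
from functools import lru_cache

@lru_cache(maxsize=1_000_000)  # bound memory; tune to the hot working set
def compute_property(page_id: int) -> int:
    # Hypothetical stand-in for the real derivation that a lookup
    # table would otherwise precompute and store per key.
    return page_id * 31 % 97
```

This trades storage and reload complexity for query-time CPU; whether that is acceptable depends on how skewed the key distribution is, since a hot cache only helps if a small set of keys dominates the queries.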