To better understand the resource consumption of vector search, we would like to run some tests:
- increasing QPS: after warmup, sweep from 5 to 100 QPS over 20-30 min while measuring CPU/mem/disk/GC percentiles
- constant QPS: after warmup, hold QPS steady over 20-30 min while increasing parallelism and measuring CPU/mem/disk/GC percentiles
- index size: after warmup, hold QPS steady over 20-30 min against indexes of different sizes while measuring CPU/mem/disk/GC percentiles
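The first scenario could be driven by an open-loop load generator that steps through QPS levels and records per-step latency percentiles. The sketch below is a minimal, hypothetical harness: `query_fn`, the step duration, and the nearest-rank percentile helper are all placeholders, and a real run would call the search endpoint and scrape CPU/mem/disk/GC from host metrics instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def percentile(samples, p):
    """Nearest-rank percentile of a list of latencies."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, int(round(p / 100 * len(s))) - 1))
    return s[k]

def timed(fn):
    """Run fn once and return its wall-clock latency in seconds."""
    t0 = time.monotonic()
    fn()
    return time.monotonic() - t0

def run_step(query_fn, qps, duration_s, pool):
    """Open-loop step: submit query_fn at a fixed rate, independent of
    response times, then gather the observed latencies."""
    interval = 1.0 / qps
    futures = []
    t_end = time.monotonic() + duration_s
    while time.monotonic() < t_end:
        t0 = time.monotonic()
        futures.append(pool.submit(timed, query_fn))
        sleep = interval - (time.monotonic() - t0)
        if sleep > 0:
            time.sleep(sleep)
    return [f.result() for f in futures]

def sweep(query_fn, qps_levels, step_s, workers=32):
    """Step through QPS levels and record p50/p99 latency per level."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for qps in qps_levels:
            lat = run_step(query_fn, qps, step_s, pool)
            results[qps] = {"p50": percentile(lat, 50),
                            "p99": percentile(lat, 99)}
    return results
```

Open-loop generation matters here: a closed-loop client backs off when the server slows down, which hides exactly the knee points we want to see.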
The assumption is that the p99 curves will reveal knee points.
To be discussed:
- We need a pool of queries, possibly bucketed by complexity. Should we source them from the logs? Alternatively, we can reuse the ~1.4k queries sampled by Research for the Golden Set.
- If possible, we should also monitor result quality, though that may be hard without a golden set ready to use.
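If we do bucket by complexity, one crude proxy is token count. This is a sketch under that assumption; the bucket edges, labels, and the equal-per-bucket sampling are all hypothetical choices to be tuned against the actual log or Golden Set queries.

```python
import random
from collections import defaultdict

def bucket_by_complexity(queries, edges=(3, 8)):
    """Bucket query strings by whitespace token count,
    a crude stand-in for query complexity."""
    buckets = defaultdict(list)
    for q in queries:
        n = len(q.split())
        label = ("short" if n <= edges[0]
                 else "medium" if n <= edges[1]
                 else "long")
        buckets[label].append(q)
    return dict(buckets)

def sample_pool(buckets, per_bucket, seed=42):
    """Draw up to per_bucket queries from each bucket so the
    load test covers all complexity classes evenly."""
    rng = random.Random(seed)
    return {label: rng.sample(qs, min(per_bucket, len(qs)))
            for label, qs in buckets.items()}
```

Sampling the same number of queries per bucket keeps the mix stable across test runs, so percentile differences between runs reflect the variable under test rather than a shifted query mix.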