
[Vector Search] Estimate resource consumption at scale
Open, Needs Triage · Public · 8 Estimated Story Points

Description

To better understand the resource consumption of vector search, we would like to run some tests:

  • increasing QPS (after warmup), sweeping from 5 to 100 QPS over 20-30 min while measuring CPU/mem/disk/GC percentiles
  • constant QPS (after warmup) over 20-30 min with increasing parallelism, while measuring CPU/mem/disk/GC percentiles
  • constant QPS (after warmup) over 20-30 min against different index sizes, while measuring CPU/mem/disk/GC percentiles

The assumption is that the p99 latencies will reveal knee points.
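A minimal sketch of such a sweep, assuming a hypothetical `run_query` client standing in for the real knn search call; durations are kept very short here, whereas the real steps would run 20-30 min each. A knee point shows up where p99 starts growing much faster than p50.

```python
import random
import statistics
import time

def run_query(q: str) -> None:
    """Placeholder for the real knn search call (hypothetical)."""
    time.sleep(random.uniform(0.001, 0.003))

def measure_step(queries, qps: int, duration_s: float) -> dict:
    """Fire queries at a fixed rate and report latency percentiles in ms."""
    latencies = []
    interval = 1.0 / qps
    deadline = time.monotonic() + duration_s
    i = 0
    while time.monotonic() < deadline:
        start = time.monotonic()
        run_query(queries[i % len(queries)])
        latencies.append((time.monotonic() - start) * 1000)
        i += 1
        time.sleep(max(0.0, interval - (time.monotonic() - start)))
    pct = statistics.quantiles(latencies, n=100)  # 99 cut points
    return {"qps": qps, "p50": pct[49], "p99": pct[98]}

# Toy sweep; the proposed test would step from 5 to 100 QPS.
results = [measure_step(["q1", "q2"], qps, duration_s=0.3)
           for qps in (20, 50, 100)]
```

The same loop works for the parallelism and index-size variants by holding QPS constant and varying the other axis.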

To be discussed:

  • We need a pool of queries, maybe bucketed by complexity. Should we source them from the logs? Alternatively, we can reuse the ~1.4k queries sampled by Research for the Golden Set.
  • If possible we should also monitor the quality of results, but that may be hard without a golden set ready to use.

Details

Event Timeline

pfischer updated the task description.

First tests with the full frwiki semantic search dataset showed high latency and significant ceph IO at ~4GB/sec. This appears to be a problem with readahead on the ceph-backed storage system: it defaults to 8MB, which is far too much for the random-access pattern of knn search.

We've evaluated a significantly reduced readahead of 64kb and found reasonable performance, but so far only with single-shard queries. Our existing k8s deployments don't give us a way to set readahead from the deployment side; so far we've relied on @bking manually applying the settings from the host machine to support this evaluation. In T418776 dpe-sre is looking into how best to support this change.

Approximate results from single-shard testing. Note that 1:1 means the entire index fits in memory and skips ceph; deploying that for the full test would require over 1TB of memory from dpe-sre-k8s and should be avoided if possible.

description       | req/sec | p50 ms | p99 ms | index:memory
1 user, 8mb ra    | 15      | 60     | 280    | 1:1
1 user, 8mb ra    | 4.5     | 240    | 440    | 2:1
1 user, 64kb ra   | 8       | 130    | 340    | 2:1

description       | req/sec | p50 ms | p99 ms | index:memory
10 users, 8mb ra  | 35      | 280    | 480    | 1:1
10 users, 8mb ra  | 14      | 700    | 1200   | 2:1
10 users, 64kb ra | 32      | 280    | 500    | 2:1

Change #1249382 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/deployment-charts@master] semanticsearch: Increase heap by 1gb

https://gerrit.wikimedia.org/r/1249382

The index/memory ratio has been a bit vague; to be more concrete:

index size | disk cache size | generalized ratio | actual ratio
13g        | 17g             | 1:1               | 0.75:1
27g        | 17g             | 2:1               | 1.6:1
40g        | 17g             | 3:1               | 2.4:1
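The arithmetic behind the table can be sketched as follows: the generalized N:1 label rounds the index-to-cache ratio up to the next integer, while the actual ratio is the raw quotient (the table's actual-ratio column is hand-rounded, so the values below only match approximately).

```python
import math

DISK_CACHE_GB = 17  # disk cache size held fixed in these tests

def index_memory_ratios(index_gb: int) -> tuple[int, float]:
    """Return (N for the generalized N:1 label, actual index:cache ratio)."""
    generalized = math.ceil(index_gb / DISK_CACHE_GB)
    actual = index_gb / DISK_CACHE_GB
    return generalized, actual

for size_gb in (13, 27, 40):
    n, actual = index_memory_ratios(size_gb)
    print(f"{size_gb}g index -> {n}:1 generalized, {actual:.2f}:1 actual")
```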

Further evaluation. This tests disabling readahead completely, along with running less memory for the same indexes.

This demonstrates that completely removing readahead is a bad idea in this context. Not captured in the table: warmup takes significantly longer (~10 min to stabilize vs ~2 min), and cross-node index transfers are also perhaps 2x-4x slower than with a minimal readahead (not measured, but observed). This test went up to 14 concurrent users as I was trying to see if it could reach the full 35 qps available from the cpus, but with 3:1 memory and no readahead it wasn't getting there.

description       | req/sec | p50 ms | p99 ms | index:memory
1 user, 64kb ra   | 8       | 130    | 340    | 2:1
1 user, no ra     | 4.5     | 180    | 420    | 2:1
1 user, no ra     | 3.5     | 260    | 500    | 3:1

description       | req/sec | p50 ms | p99 ms | index:memory
10 users, 64kb ra | 32      | 280    | 500    | 2:1
10 users, no ra   | 35      | 270    | 450    | 2:1
10 users, no ra   | 28      | 350    | 600    | 3:1

description       | req/sec | p50 ms | p99 ms | index:memory
14 users, no ra   | 35      | 380    | 530    | 2:1
14 users, no ra   | 30      | 450    | 750    | 3:1

We then updated to opensearch 3.5.0. This was not expected to change performance, but it seems to have brought some significant improvements. An alternate explanation is that something in the embeddings predictions changed; we don't have the visibility to be sure. This led to hitting circuit breakers for heap memory, which reject requests instead of queueing them.

Instances were updated with +25% heap memory (5g vs 4g) and that seems to have gone away:

description       | req/sec | p50 ms | p99 ms | index:memory
1 user, 64kb ra   | 9       | 110    | 220    | 3:1
10 users, 64kb ra | 70      | 150    | 210    | 3:1

This is looking to be in a pretty good space now; the reduced memory usage is promising for the full rollout.

As for resource requirements: from David's initial review we have an expected size, with 3 copies of each shard, of 1073 gb. From this testing I think we can manage with around 450g of disk cache; it seems reasonable to round that up to 512g to leave space for growth. The exact amount of heap is uncertain, but the current 5g will most likely work, and 6g should be more than sufficient. We are expecting to use 15 shards for enwiki, so 16 nodes seems like a viable minimum. We could potentially go with 24 or even 32 nodes, but I would lean towards fewer nodes if possible.

nodes | heap | disk cache | per node memory | cluster memory
16    | 6g   | 32g        | 38g             | 608g
24    | 6g   | 21g        | 27g             | 648g
32    | 5g   | 16g        | 21g             | 672g
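The memory table above is straightforward to reproduce: per-node memory is heap plus disk cache, and cluster memory is that times the node count (the heap/cache splits are the ones chosen in this comment).

```python
# (nodes, heap_gb, disk_cache_gb) options under consideration
options = [
    (16, 6, 32),
    (24, 6, 21),
    (32, 5, 16),
]

for nodes, heap, cache in options:
    per_node = heap + cache          # memory required on each node
    cluster = nodes * per_node       # total memory across the cluster
    print(f"{nodes} nodes: {per_node}g per node, {cluster}g cluster")
```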

For each set of nodes we would need varying disk per instance. We estimate the total size at 1073g, and we need another 700g of space for reindexing, for 1773g of raw need. Opensearch requires us to keep the disk < 70% full. Rounding the raw need up to 1900g to leave a bit of room, the 70% rule gives 1900 / 0.7 ≈ 2715g required.

nodes | per node disk | cluster disk
16    | 170g          | 2720g
24    | 113g          | 2712g
32    | 85g           | 2720g

Change #1249382 abandoned by Ebernhardson:

[operations/deployment-charts@master] semanticsearch: Increase heap by 1gb

Reason:

pod sizing has been separately addressed

https://gerrit.wikimedia.org/r/1249382