Prior to moving to production, we should benchmark (and document) the performance of the session storage service under mixed workloads. The results should help validate the implementation, provide a baseline for any subsequent optimizations, and inform future capacity planning.
NOTE: [[ https://grafana.wikimedia.org/dashboard/db/redis?orgId=1&from=now-24h&to=now&var-datasource=codfw%20prometheus%2Fops&var-job=redis_sessions | Current estimations ]] put production request rates (per-datacenter) at ~30k reads/sec, ~100 writes/sec, and total data set size at ~3G (uncompressed).
----
## Environment
* Cluster
  - sessionstore1001
  - sessionstore1002
  - sessionstore1003
  - sessionstore2001
  - sessionstore2002
  - sessionstore2003
Each machine has dual Intel Xeon Silver 4110 2.1 GHz CPUs (8C, 16T), 64G RAM, two 128G SSDs, and a 1 Gbit NIC.
NOTE: While this is the //production// cluster, it is otherwise not being utilized at the time of testing.
Every node runs a single instance of Cassandra 3.11.2 (a 6-node cluster). All Cassandra data (commitlog, SSTables, etc.) shares a RAID-1 array.
Kask is run from sessionstore1001 on port 8080.
`wrk` is executed from sessionstore1002.
## Results
| Threads | Concurrency | Size (k/v) | Ratio (r/w) | Throughput | 50p latency | 99p latency | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 8 | 1024 | 8/16 | 100:1 | 50365/s | 22.78ms | 45.43ms | 0 |
| 8 | 2048 | 8/16 | 100:1 | 71899/s | 37.94ms | 245.18ms | 620 (0.001%) |
| 8 | 1024 | 32/128 | 100:1 | | | | |
| 8 | 2048 | 32/128 | 100:1 | | | | |
| 8 | 1024 | 32/2048 | 100:1 | | | | |
| 8 | 2048 | 32/2048 | 100:1 | | | | |
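To relate the completed rows to the production estimates quoted in the note at the top, the combined throughput can be split by the read/write ratio and expressed as a multiple of the estimated production rates. The Python sketch below is only illustrative. It assumes that "100:1" means 100 reads for every write and that the Throughput column counts reads and writes together. The function and constant names are hypothetical and are not part of Kask or `wrk`.

```python
from dataclasses import dataclass
from typing import Tuple

# Production estimates quoted in the note above (per datacenter).
PROD_READS_PER_SEC = 30_000
PROD_WRITES_PER_SEC = 100


@dataclass
class Split:
    reads_per_sec: float
    writes_per_sec: float


def split_throughput(total_per_sec: float, reads: int, writes: int) -> Split:
    """Divide a combined request rate according to a reads:writes ratio."""
    parts = reads + writes
    return Split(total_per_sec * reads / parts, total_per_sec * writes / parts)


def headroom(split: Split) -> Tuple[float, float]:
    """Return (read, write) rates as multiples of the production estimates."""
    return (split.reads_per_sec / PROD_READS_PER_SEC,
            split.writes_per_sec / PROD_WRITES_PER_SEC)


if __name__ == "__main__":
    # Measured totals from the two completed rows (8/16 sizes, 100:1 ratio).
    for concurrency, total in [(1024, 50365), (2048, 71899)]:
        s = split_throughput(total, reads=100, writes=1)
        r, w = headroom(s)
        print(f"concurrency={concurrency}: "
              f"{s.reads_per_sec:.0f} reads/s ({r:.1f}x production), "
              f"{s.writes_per_sec:.0f} writes/s ({w:.1f}x production)")
```

Under those assumptions, the 1024-connection row works out to roughly 49.9k reads/s and 500 writes/s. Note that the tested 100:1 mix is more write-heavy than the estimated production mix of about 300:1.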