
Prior to moving to production, we should benchmark (and document) the performance of the session storage service under mixed workloads. This information should prove invaluable in validating the implementation, providing a baseline for any subsequent optimizations, and informing future capacity planning.

NOTE: [[ https://grafana.wikimedia.org/dashboard/db/redis?orgId=1&from=now-24h&to=now&var-datasource=codfw%20prometheus%2Fops&var-job=redis_sessions | Current estimates ]] put production request rates (per-datacenter) at ~30k reads/sec, ~100 writes/sec, and total data set size at ~3G (uncompressed).

----

## Environment

* Cluster
  - sessionstore1001
  - sessionstore1002
  - sessionstore1003
  - sessionstore2001
  - sessionstore2002
  - sessionstore2003

Each machine is a dual Intel Xeon Silver 4110 2.1G (8C, 16T) w/ 64G RAM, 2 @ 128G SSDs, and 1 gbit NIC.

NOTE: While this is the //production// cluster, it is otherwise not being utilized at the time of testing.

Every node runs a single instance of Cassandra 3.11.2 (6 node cluster). All Cassandra data (commitlog, sstables, etc) shares a RAID-1.

Kask is run from a screen session on sessionstore1001, port 8080. `wrk` is executed from sessionstore1002. [[ https://phabricator.wikimedia.org/P8434 | A Lua script ]] is used to create a randomized mixed workload from a pregenerated JSON-formatted file:

```lang=shell-session
$ wrk --latency -t8 -c2048 -d10m -s multi-request-json.lua http://sessionstore1001.eqiad.wmnet:8080
...
```

## Results

| Threads | Concurrency | Size (k/v) | Ratio (r/w) | Throughput | 50p latency | 99p latency | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 8 | 1024 | 8/16 | 100:1 | 52610/s | 20.76ms | 39.01ms | 0 |
| 8 | 2048 | 8/16 | 100:1 | 71899/s | 37.94ms | 245.18ms | 620 (0.001%) |
| 8 | 1024 | 32/128 | 100:1 | 52343/s | 21.75ms | 40.50ms | 0 |
| 8 | 2048 | 32/128 | 100:1 | 67877/s | 38.33ms | 228.67ms | 2160 (0.005%) |
| 8 | 1024 | 32/2048 | 100:1 | | | | |
| 8 | 2048 | 32/2048 | 100:1 | | | | |
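
As a rough cross-check of the Results table, the sketch below recomputes the error percentages for the two completed 2048-connection runs and compares their throughput with the estimated production rate. It rests on two assumptions not stated in the text: that every run lasted 10 minutes (the `-d10m` in the sample command), and that the percentages are errors divided by total requests. The function names are illustrative only.

```python
# Cross-check of the Results table; all figures are copied from the text.

RUN_SECONDS = 10 * 60  # assumption: every run used -d10m, as in the sample command

# Estimated production rate per datacenter: ~30k reads/s plus ~100 writes/s
ESTIMATED_RATE = 30_000 + 100

# (concurrency, size k/v, throughput in req/s, error count)
runs = [
    (2048, "8/16", 71899, 620),
    (2048, "32/128", 67877, 2160),
]


def error_percent(throughput, errors, seconds=RUN_SECONDS):
    """Errors as a percentage of all requests issued during the run."""
    total_requests = throughput * seconds
    return 100.0 * errors / total_requests


def headroom(throughput, estimated=ESTIMATED_RATE):
    """Measured throughput as a multiple of the estimated production rate."""
    return throughput / estimated


for concurrency, size, throughput, errors in runs:
    print(
        f"c={concurrency} size={size}: "
        f"{error_percent(throughput, errors):.4f}% errors, "
        f"{headroom(throughput):.2f}x estimated rate"
    )
```

Under those assumptions the recomputed percentages round to the values reported in the table. The headroom figure is only indicative, since the benchmark uses a 100:1 read/write mix while the production estimate works out to roughly 300:1.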