In T352885#9955405, @Krinkle wrote (5 July 2024): We discussed this in the MwEng-SvcOps meeting (13 June 2024). Extstore was enabled on a few hosts in the DC. Some stats differed, but we weren't able to come up with a testing strategy to prove or disprove a benefit observable from the MW application in this state, since keys are sharded across all hosts and traffic generally involves many different keys.
We noticed that on the given hosts, contrary to my own expectations, there is a continuous non-zero trickle of evictions of tiny keys (e.g. < 200 bytes). This is surprising, because during research for T278392 and T336004, @tstarling and I found that these slabs were not under pressure, and empirical testing showed that these values reliably persisted for at least a minute in practice. This is worrying, because we've since migrated to mcrouter-primary-dc for short-lived auth/nonce tokens, Rdbms-ChronologyProtector positions, and rate limiter counters.
The good news, and the reason we noticed, is that with extstore enabled, much more space is given to the tiny value "slab 4", and there is a flat line of zero undue evictions since.
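For anyone wanting to reproduce the per-slab eviction observation, here is a minimal sketch that compares two samples of memcached's `stats items` output. It assumes the raw text has already been captured from a host; the function names and the sample numbers in the usage note are hypothetical and not taken from our dashboards.

```python
from collections import defaultdict


def parse_stats_items(raw: str) -> dict[int, dict[str, int]]:
    """Group `STAT items:<slab>:<field> <value>` lines by slab class."""
    slabs: dict[int, dict[str, int]] = defaultdict(dict)
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "STAT":
            continue  # skips the trailing END line and anything unexpected
        key = parts[1].split(":")
        if len(key) != 3 or key[0] != "items":
            continue
        try:
            slabs[int(key[1])][key[2]] = int(parts[2])
        except ValueError:
            continue  # ignore non-integer slab ids or values
    return dict(slabs)


def eviction_delta(before: dict, after: dict) -> dict[int, int]:
    """Evictions per slab class between two samples.

    Counters reset when memcached restarts, so a negative delta
    means the two samples are not comparable.
    """
    return {
        slab: fields.get("evicted", 0) - before.get(slab, {}).get("evicted", 0)
        for slab, fields in after.items()
    }
```

Sampling the same host twice, a minute apart, and calling `eviction_delta(parse_stats_items(first), parse_stats_items(second))` gives the number of evictions per slab class in that window. A non-zero value for the tiny-value slab class would correspond to the trickle described above.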
Aside from this little happy accident, we found no per-host stats that are reason for concern. I agreed with Effie that rolling it out fully to the primary DC, with the secondary as an A/B-esque control, would be a fine next step. We expect the worst outcome to be "no effect", and hope that there is enough similarity between the DCs' traffic, yet enough separation in our data collection, to notice an improvement here (e.g. via MW statsd->prometheus stats that we have per-DC, using WANCache as a way to measure cache hit ratio on meaningful keys; as well as Apache-level latency numbers such as the Appserver RED dashboards).
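One way the A/B-esque comparison could be evaluated is a simple difference-in-differences on the cache hit ratio: take the change in the primary DC and subtract the change in the secondary DC over the same period, so that traffic shifts common to both DCs cancel out. This is only a sketch under the assumption that per-DC hit and miss counts are available for a window before and after the rollout; the function names are hypothetical and the numbers in the usage note are made up.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache."""
    total = hits + misses
    if total == 0:
        raise ValueError("no lookups in this window")
    return hits / total


def diff_in_diff(
    primary_before: tuple[int, int],
    primary_after: tuple[int, int],
    secondary_before: tuple[int, int],
    secondary_after: tuple[int, int],
) -> float:
    """Change in hit ratio in the primary DC minus the change in the control DC.

    Each argument is a (hits, misses) pair for one DC and one time window.
    """
    primary_change = hit_ratio(*primary_after) - hit_ratio(*primary_before)
    secondary_change = hit_ratio(*secondary_after) - hit_ratio(*secondary_before)
    return primary_change - secondary_change
```

For example, `diff_in_diff((900, 100), (950, 50), (800, 200), (810, 190))` attributes roughly four percentage points of hit-ratio improvement to the change, after discounting the one point by which the control DC also moved. A result near zero would match the expected worst outcome of "no effect".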
In T352885#9986523, @jijiki wrote (16 Jul 2024): @Krinkle Extstore is fully enabled on eqiad.