While investigating the parent task and all the new things added in future versions of memcached, I also took a look at the evictions that we are currently getting in our slabs. For example, let's pick mc1019:
[[ https://grafana.wikimedia.org/d/000000317/memcache-slabs?panelId=30&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=memcached&var-instance=mc1019&var-slab=All | Grafana ]]
Most of the evictions seem to be for slabs 18 and 31. Let's pick 31 and check two graphs:
There are really a lot of items stored (very small ones, the slab size is ~400B), but it is also clear that the evictions are happening because the slab has run out of space. This is also true for slab 18, and probably for others. In future versions of memcached a dedicated thread periodically cleans up expired (and similar) keys to help free some slab space, but sadly in our version everything is very static (we could in theory move 1 MB pages from one slab to another manually, but I wouldn't trust that functionality).
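To cross-check the Grafana panels directly on a host, the per-slab eviction counters can also be read from the output of memcached's `stats items` command, which prints lines of the form `STAT items:<slab>:<name> <value>`. Below is a minimal sketch that ranks slabs by their `evicted` counter. The function names and the sample numbers are made up for illustration and are not taken from mc1019.

```python
import re
from collections import defaultdict

# Matches lines such as "STAT items:31:evicted 123456"
STAT_RE = re.compile(r"^STAT items:(\d+):(\w+) (\d+)$")


def parse_stats_items(raw):
    """Turn raw `stats items` output into {slab_id: {stat_name: value}}."""
    stats = defaultdict(dict)
    for line in raw.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            slab, name, value = match.groups()
            stats[int(slab)][name] = int(value)
    return dict(stats)


def top_evicting_slabs(raw, n=5):
    """Return up to n tuples (slab_id, evicted, number_of_items), worst first."""
    stats = parse_stats_items(raw)
    ranked = sorted(
        stats.items(),
        key=lambda kv: kv[1].get("evicted", 0),
        reverse=True,
    )
    return [(slab, s.get("evicted", 0), s.get("number", 0)) for slab, s in ranked[:n]]


# Hypothetical sample output, NOT real data from our hosts.
SAMPLE = """\
STAT items:18:number 1200000
STAT items:18:evicted 500000
STAT items:31:number 9000000
STAT items:31:evicted 750000
STAT items:40:number 1000
STAT items:40:evicted 0
END
"""

if __name__ == "__main__":
    for slab, evicted, number in top_evicting_slabs(SAMPLE, n=2):
        print(f"slab {slab}: evicted={evicted} items={number}")
```

The raw text could come from something like `echo "stats items" | nc localhost 11211`, assuming the instance listens on the default port.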
We are currently limiting memcached's memory usage with `-m 89088`, but as far as I can see from the mc10XX host graphs there is a ton of free RAM (not even used for page cache) that we could in theory dedicate to memcached as an experiment, to see if adding, say, +10G worth of slabs could reduce evictions and possibly improve the get hit ratio.
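For reference, `-m` is expressed in megabytes, so the arithmetic for the experiment can be sketched as follows. This is only a back-of-the-envelope calculation: it assumes that "+10G" means 10 GiB and that slab pages have the default 1 MB size, and the helper name is made up.

```python
MB_PER_GIB = 1024
PAGE_SIZE_MB = 1  # assumption: default memcached slab page size


def bumped_memory_limit_mb(current_mb, extra_gib):
    """Return the new value for -m after adding extra_gib GiB."""
    return current_mb + extra_gib * MB_PER_GIB


current_mb = 89088  # current -m value
new_mb = bumped_memory_limit_mb(current_mb, 10)
extra_pages = (new_mb - current_mb) // PAGE_SIZE_MB

print(f"current limit: {current_mb / MB_PER_GIB:.0f} GiB")
print(f"new -m value:  {new_mb}")
print(f"extra pages available to slabs: {extra_pages}")
```

With these assumptions the current limit is 87 GiB, and the experiment would mean running with `-m 99328`, i.e. 10240 extra pages that memcached could assign to the slabs that need them.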