
Consider raising Memcached MWObject cache memory size limit
Closed, Declined (Public)

Description

While investigating the parent task and all the new things added in newer versions of memcached, I also briefly reviewed the evictions that we are currently getting in our slabs. For example, let's pick mc1019:

Grafana

Most of the evictions seem to be for slabs 18 and 31. Let's pick 31 and check two graphs:


Grafana

Grafana

A lot of items are stored (very small ones, the slab's chunk size is ~400B), but it is also clear that the evictions are happening because the slab runs out of space. This is also true for slab 18, and probably others. In newer versions of memcached a dedicated thread periodically cleans up expired keys (to help free some slab space), but sadly in our version everything is very static (we could in theory manually move 1MB pages from one slab to another, but I wouldn't trust that functionality).

We are currently limiting memcached's memory usage with -m 89088, but as far as I can see from the mc10XX host graphs there is a ton of free RAM (not even used for page cache) that we could in theory dedicate to memcached as an experiment, to see whether adding, say, +10G worth of slabs could reduce evictions and possibly improve the get hit ratio.
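As a rough sketch of the arithmetic behind the proposal: memcached's -m flag takes the limit in megabytes, so -m 89088 corresponds to 87G and a +10G experiment would mean a value around 99328. The helper name below is hypothetical, and the exact value used in the puppet change is not stated in this task, so treat the result as illustrative only.

```python
# memcached's -m flag expresses the memory limit in megabytes.
CURRENT_LIMIT_MB = 89088  # value quoted in the task description
EXTRA_GB = 10             # size of the proposed experiment


def raised_limit_mb(current_mb: int, extra_gb: int) -> int:
    """Return the -m value after adding extra_gb (1 GB counted as 1024 MB)."""
    return current_mb + extra_gb * 1024


new_limit = raised_limit_mb(CURRENT_LIMIT_MB, EXTRA_GB)
print(f"current: -m {CURRENT_LIMIT_MB} ({CURRENT_LIMIT_MB / 1024:.0f} GB)")
print(f"raised:  -m {new_limit} ({new_limit / 1024:.0f} GB)")
```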

Event Timeline

elukey triaged this task as Normal priority. Mar 6 2019, 8:17 AM
elukey created this task.
Restricted Application added a subscriber: Aklapper. Mar 6 2019, 8:17 AM
elukey removed aaron as the assignee of this task. Mar 6 2019, 8:18 AM
elukey updated the task description.

Change 495731 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Raise memcached dedicated memory on mc1019

https://gerrit.wikimedia.org/r/495731

Change 495731 merged by Elukey:
[operations/puppet@production] Raise memcached dedicated memory on mc1019

https://gerrit.wikimedia.org/r/495731

Mentioned in SAL (#wikimedia-operations) [2019-03-12T08:57:23Z] <elukey> restart memcached on mc1019 to apply new settings - T217731

elukey moved this task from Backlog to In Progress on the User-Elukey board. Mar 12 2019, 3:12 PM
elukey claimed this task. Mar 13 2019, 7:34 AM
elukey added a comment. Edited Mar 13 2019, 7:41 AM

Very interesting results for mc1019 after a day of metrics.

I can clearly see evictions flat at zero; they usually start happening after a shard restart once the memcached slabs reach the maximum allocatable memory. In this case, the extra 10G has had a positive effect. Items stored for mc1019 are now more than they were before the restart, and growing. The hit ratio, though, is still a bit below what it was before the restart (at this time, 0.9352 vs 0.9482) but steady and slowly increasing. I hope to see it cross the 0.95 mark eventually, but that might take days. I expected a higher hit ratio sooner with evictions flat at zero, but they were probably not as impactful as I thought.

This confirms what is written above, but at the slab level. It seems that the get hit rate improved for some slabs, but overall the level seems to be what it was before the restart.

Let's see in a couple of days if anything changed :)
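For reference, the get hit ratio quoted above is normally derived from the get_hits and get_misses counters that memcached exposes via its stats command. The sketch below assumes that definition; the counter values are hypothetical and were chosen only to reproduce the two ratios mentioned in the comment. Since the real counters are cumulative from process start, a dashboard would typically compute the ratio from deltas over a time window.

```python
def get_hit_ratio(get_hits: int, get_misses: int) -> float:
    """Fraction of GET requests served from cache; 0.0 when there was no traffic."""
    total = get_hits + get_misses
    return get_hits / total if total else 0.0


# Hypothetical counter deltas, picked to match the ratios quoted in the comment.
after_restart = get_hit_ratio(get_hits=9352, get_misses=648)
before_restart = get_hit_ratio(get_hits=9482, get_misses=518)
print(f"after restart: {after_restart:.4f}, before restart: {before_restart:.4f}")
```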

I was wrong: evictions started happening, even if at a lower pace. The extra 10G of space allowed mc1019 to store 53M objects rather than 46.8M, lowering evictions by 100 ops/s, more or less. Interestingly, the reclaim rate (which should be the expired objects cleaned up to free space in the slab) grew at the same time, so I suppose that keeping more things in the LRU eventually translates into having more expired items to reclaim. Not sure if this is the right way to read these graphs, will think more about it over the next few days :)

The get hit ratio didn't improve as expected, so I'd say that our eviction rate is not as problematic as I thought. I checked the objects on slab 17 (the biggest offender in terms of evictions) and they are all small objects expiring after a few hours (mostly page-content-model and page-restrictions); it seems that allowing more of them in the slab did not bring positive effects overall.

I would like to leave the shard running as it is for a couple more days to spot any other improvement, but so far I don't see a big reason to update the rest of the shards when the experiment is done.

aaron added a comment. Edited Mar 13 2019, 2:49 PM

Can the (extra) space be dedicated more towards the larger slabs, where we have more problems AFAIK?

elukey added a comment. Edited Mar 14 2019, 8:30 AM

Can the (extra) space be dedicated more towards the larger slabs, where we have more problems AFAIK?

I think that the only tunable that we could use, in this version of memcached, is the slab reassign functionality (which we already have activated via -o slab_reassign). As far as I know the feature allows you to move 1MB pages from one slab to another, to fine tune where needed. I am not sure how reliable this functionality is in our version, so I'd be reluctant to try it in prod (maybe in deployment-prep or codfw, could be a good test). In recent versions of memcached this setting evolved into the slab automover, which is basically a conservative heuristic to automatically reassign 1MB pages from a slab with zero evictions to a slab with a lot of evictions.
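To make the idea concrete, here is a minimal sketch of what a manual version of that heuristic could look like. It is not memcached's actual automover logic: the selection rule, the min_pages threshold and the sample numbers are all assumptions for illustration. The only piece taken from the memcached protocol documentation is the text command `slabs reassign <source class> <dest class>`. Note also that the evicted counter is cumulative, so a real implementation would compare deltas between two samples rather than absolute values.

```python
from typing import Dict, Optional, Tuple

SlabStats = Dict[int, Dict[str, int]]  # slab class id -> {"evicted": ..., "total_pages": ...}


def pick_reassignment(stats: SlabStats, min_pages: int = 2) -> Optional[Tuple[int, int]]:
    """Pick (source, destination) slab classes for one page move.

    Source: the class with zero evictions that holds the most pages.
    Destination: the class with the highest eviction count.
    Returns None when no sensible move exists.
    """
    donors = [s for s, v in stats.items() if v["evicted"] == 0 and v["total_pages"] > min_pages]
    takers = [s for s, v in stats.items() if v["evicted"] > 0]
    if not donors or not takers:
        return None
    src = max(donors, key=lambda s: stats[s]["total_pages"])
    dst = max(takers, key=lambda s: stats[s]["evicted"])
    return src, dst


def reassign_command(src: int, dst: int) -> bytes:
    """Build the memcached text-protocol command that moves one page from src to dst."""
    return f"slabs reassign {src} {dst}\r\n".encode("ascii")


# Hypothetical numbers, not taken from any real shard.
sample: SlabStats = {
    17: {"evicted": 500, "total_pages": 100},
    40: {"evicted": 0, "total_pages": 300},
    187: {"evicted": 0, "total_pages": 50},
}
move = pick_reassignment(sample)
if move is not None:
    print(reassign_command(*move))
```

Sending the resulting bytes to a memcached instance would trigger the move, so this is something to try on a test host first, in line with the caution expressed above.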

I created two graphs in https://grafana.wikimedia.org/d/000000317/memcache-slabs to check how much memory we waste:

https://grafana.wikimedia.org/d/000000317/memcache-slabs?panelId=66&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=memcached&var-instance=mc1019&var-slab=All&from=now-7d&to=now
https://grafana.wikimedia.org/d/000000317/memcache-slabs?panelId=65&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=memcached&var-instance=mc1019&var-slab=All&from=now-3h&to=now

The math used to calculate the space allocated but not used was taken from https://github.com/memcached/memcached/blob/master/doc/protocol.txt#L1031-L1035
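A minimal sketch of that calculation, assuming the referenced passage is the `stats slabs` note stating that (total_chunks * chunk_size) - mem_requested is the memory wasted in a slab class. The parsing helper and the sample output are hypothetical; the real Grafana panels are built on Prometheus metrics rather than on raw protocol output.

```python
from collections import defaultdict
from typing import Dict


def parse_slab_stats(raw: str) -> Dict[int, Dict[str, int]]:
    """Parse 'stats slabs' output into {slab_id: {stat_name: value}}."""
    slabs: Dict[int, Dict[str, int]] = defaultdict(dict)
    for line in raw.splitlines():
        parts = line.split()
        # Per-slab lines look like "STAT <id>:<name> <value>"; skip END and global stats.
        if len(parts) != 3 or parts[0] != "STAT" or ":" not in parts[1]:
            continue
        slab_id, name = parts[1].split(":", 1)
        slabs[int(slab_id)][name] = int(parts[2])
    return dict(slabs)


def wasted_bytes(slab: Dict[str, int]) -> int:
    """Bytes allocated to the slab class but not requested by stored items."""
    return slab["total_chunks"] * slab["chunk_size"] - slab["mem_requested"]


# Hypothetical output for a single slab class.
SAMPLE = """STAT 31:chunk_size 400
STAT 31:total_chunks 1000
STAT 31:mem_requested 350000
STAT active_slabs 1
END"""

for slab_id, slab in parse_slab_stats(SAMPLE).items():
    print(f"slab {slab_id}: {wasted_bytes(slab)} bytes allocated but not used")
```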

It is very interesting to see that:

  • most of the shards have ~2.5G of memory allocated in some slab but not used
  • on mc1019, after bumping the total memory available, the total wasted went up by ~400MB too.

I keep thinking that memcached 1.5.x will give us a ton of relief and improvements. There are two things that in my opinion will benefit us:

  1. LRU Maintainer, which cleans up expired items periodically (as a separate thread). Today I took a look, for example, at mc1022's slab 187 (1MB slab class!) since it was showing the oldest items in the shard. The majority of items were prepared-edits that had expired days before but were not yet reclaimed, since no new requests for that slab class asked for memory (for example, new SETs). This is a problem that will automatically go away with 1.5.x.
  2. Automatic slab reassign, which coupled with the feature explained above could be an automatic way to move memory around where needed.
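The difference described in the first point can be illustrated with a toy model. This is only a sketch under simplifying assumptions (a single slab class, a plain dict instead of a real LRU, hypothetical class and method names) and it does not mirror memcached's internals. It shows that with lazy reclaim, expired items keep holding memory until a new SET needs room, while a periodic cleanup pass frees them regardless of traffic.

```python
from typing import Dict, Tuple


class ToySlab:
    """Toy model of one slab class with a fixed number of chunks."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: Dict[str, Tuple[bytes, float]] = {}  # key -> (value, expiry time)

    def set(self, key: str, value: bytes, ttl: float, now: float) -> None:
        """Store an item; room is only made when the slab is full (lazy path)."""
        if key not in self.items and len(self.items) >= self.capacity:
            self._make_room(now)
        self.items[key] = (value, now + ttl)

    def _make_room(self, now: float) -> str:
        """Reclaim one expired item if any exists, otherwise evict the oldest one."""
        expired = next((k for k, (_, exp) in self.items.items() if exp <= now), None)
        if expired is not None:
            del self.items[expired]
            return "reclaimed"
        oldest = next(iter(self.items))  # dicts preserve insertion order
        del self.items[oldest]
        return "evicted"

    def cleanup(self, now: float) -> int:
        """Periodic path: drop every expired item and return how many were freed."""
        expired = [k for k, (_, exp) in self.items.items() if exp <= now]
        for k in expired:
            del self.items[k]
        return len(expired)


slab = ToySlab(capacity=2)
slab.set("a", b"x", ttl=10, now=0)
slab.set("b", b"y", ttl=10, now=0)
# Long after expiry, with no new SETs, both chunks are still occupied.
print(len(slab.items))
# A periodic cleanup pass frees them without waiting for traffic.
print(slab.cleanup(now=100), len(slab.items))
```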

I have re-read https://github.com/memcached/memcached/blob/master/doc/protocol.txt and came up with some new graphs, all added to https://grafana.wikimedia.org/d/000000317/memcache-slabs.

The ~2.5G of wasted memory mentioned before is, IIUC, not memory that could be used by other slabs, but memory allocated and not used due to the gap between the stored item size and the fixed chunk size (for example, bytes not usable because an item is smaller than its chunk). So this memory cannot be used or re-allocated; it just gives us a good metric to understand how much overhead we have in our slab distribution. It is also useful as a comparison when we test new memcached versions.

One useful metric to know how many slabs/bytes could be reallocated by the slab reassign functionality is:
https://grafana.wikimedia.org/d/000000317/memcache-slabs?orgId=1&from=now-7d&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=memcached&var-instance=mc1019&var-slab=All&panelId=66&fullscreen

It seems that some slabs could offer some 1MB pages to others, since they are not using them. Not that we have to do it manually of course, but doing some tests will surely give us some ideas about the impact of the slab automover functionality in newer memcached versions.
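One way to estimate how many pages a slab class could give away is to divide its free chunks by the number of chunks that fit in a page, using the free_chunks and chunks_per_page fields from `stats slabs`. This is an assumption about how such a metric could be computed, not necessarily the formula behind the panel linked above, and the sample values are hypothetical. The result is an upper bound, because free chunks can be scattered across pages that still hold live items.

```python
from typing import Dict


def spare_pages(slab: Dict[str, int]) -> int:
    """Upper bound on the number of 1MB pages a slab class is not using."""
    return slab["free_chunks"] // slab["chunks_per_page"]


# Hypothetical per-slab stats, not taken from any real shard.
slabs = {
    31: {"chunks_per_page": 2621, "free_chunks": 5000},   # ~400B chunks
    187: {"chunks_per_page": 1, "free_chunks": 120},       # 1MB chunks
}

for slab_id, slab in slabs.items():
    print(f"slab {slab_id}: up to {spare_pages(slab)} spare pages")
```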

I have taken a snapshot of the past week's eviction data to see how things changed after the raise in memcached available memory:

Considerations:

  • evictions (items evicted before expiry) did decrease
  • expired unfetched for slab 17/18 rose a lot, but I am not sure if it is a consequence of the restart (so empty memory filled with new data) or other factors. I will keep monitoring it for a few more days.
elukey moved this task from In Progress to Stalled on the User-Elukey board. Mar 29 2019, 9:54 AM
elukey closed this task as Declined. Jun 14 2019, 10:02 AM

This is probably not worth pursuing for the moment. The config for mc1019 has been removed and its memory limits will be restored upon the next scheduled round of reboots.