
Revisit use of swap and related kernel settings
Open, Medium, Public

Description

We currently use a mix of setups without swap (caches, mw servers) and with swap (rest of the fleet).

We should take a deep dive into whether

  • swap makes sense for the servers currently using it
  • we should avoid swap partitions in favour of swap files (if we retain swap at all)
  • our swap-related sysctl/sysfs settings need review
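As a starting point for that review, the current state of a host can be read from `/proc/swaps` and `/proc/sys/vm/swappiness`. The following is a minimal sketch, not an existing fleet tool: the `SwapArea` type and the `parse_proc_swaps` and `summarize` helpers are hypothetical names introduced here, and the parser assumes the usual five-column `/proc/swaps` layout (Filename, Type, Size, Used, Priority).

```python
from typing import List, NamedTuple


class SwapArea(NamedTuple):
    path: str
    kind: str       # "partition" or "file", as reported by the kernel
    size_kib: int
    used_kib: int
    priority: int


def parse_proc_swaps(text: str) -> List[SwapArea]:
    """Parse the contents of /proc/swaps (a header row, then one row per swap area)."""
    areas = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 5:
            continue  # ignore blank or unexpected rows
        path, kind, size, used, prio = fields
        areas.append(SwapArea(path, kind, int(size), int(used), int(prio)))
    return areas


def summarize(swaps_text: str, swappiness_text: str) -> dict:
    """Summarize swap areas and the vm.swappiness value for one host."""
    areas = parse_proc_swaps(swaps_text)
    return {
        "swappiness": int(swappiness_text.strip()),
        "total_swap_kib": sum(a.size_kib for a in areas),
        "partitions": [a.path for a in areas if a.kind == "partition"],
        "files": [a.path for a in areas if a.kind == "file"],
    }
```

On a Linux host, the two inputs would come from `open("/proc/swaps").read()` and `open("/proc/sys/vm/swappiness").read()`; a host with no swap yields empty `partitions` and `files` lists.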

Event Timeline

Restricted Application added a subscriber: Aklapper. · Oct 21 2020, 11:59 AM
Marostegui triaged this task as Medium priority. · Oct 21 2020, 11:59 AM
BBlack added a subscriber: BBlack. · Oct 21 2020, 12:05 PM

Recording from IRC for posterity:

11:07 < bblack> so I was checking out https://gerrit.wikimedia.org/r/c/operations/puppet/+/633704 (which is one of the partman cleanup commits, this one 
                affecting our cacheproxy disk layout), and I've fallen back down the rabbithole of swap space considerations
11:07 < bblack> because the new defaults basically take the partman defaults without any specifics, which is apparently going to create a 1G swap partition 
                (at least, until some future change of defaults?)
11:08 < bblack> in the recent past we didn't have swap partitions on the cache boxes
11:08 < bblack> (and of course, modules/base for better or worse has vm.swappiness = 0, along with some other sometimes-questionable tunables)
11:10 < bblack> I get all the arguments for why disabling swap is probably a dumb idea in most common scenarios
11:11 < bblack> but, I think, if we're taking that angle as a reason to configure swap, we'd probably also relax that swappiness=0 setting as well to make it
                more useful, and perhaps configure something other than a fixed 1G value for wildly varying workloads and phys memory sizes, too
11:11 < bblack> the current setup (yes, do swap, but at a small fixed size with swappiness=0) seems to be in some less-ideal intersection of competing ideas
11:12 < bblack> but in any case, rewinding to the cacheproxy case in particular
11:14 < bblack> these boxes have: 384GB of RAM, the bulk of which is hopefully a fast ram cache for http objects, and a 1.6TB super-fast nvme that's used as 
                an http object disk cache as well (it's the backing for the earlier ram cache), and then a mere ~300G of standard-issue (slower) SSDs for this 
                rootfs stuff
11:16 < bblack> we really don't have any hope of a substantially-useful swap config (where e.g. some mostly-idle or inefficient meta-daemons related to 
                monitoring or something might swap out significant RAM that matters to us), the disks we'd potentially use for swap are actually-smaller than 
                the real RAM, and we're very sensitive to the fact that we'd rather risk OOM than have the kernel make a dumb 
11:16 < bblack> decision on swap (e.g. algorithmically make the mistake of swapping out some critical varnishd cache memory and then need to swap it back in
                during a cache hit for users)
11:17 < bblack> but I don't think our new regime of standard configs allows for a swapless setup?
11:23 < bblack> or: we could make some noswap variants so that it's semi-standardized?
11:24 < bblack> I don't want to derail standardization efforts, and I feel like allowing exceptions can turn into a lot of exceptions over time
11:24 < bblack> really we should revisit the swap question in general, but I think that goes well out of the partman-cleanup scope
11:25 < bblack> (but affects partman, too)
11:30 < bblack> for future sorting out of swap questions in general: we could also make the argument to just never configure swap *partitions*, and have some 
                base/standard puppetization create/manage swap *files* on the rootfs instead, which makes tuning and runtime changes simpler, etc.  I doubt 
                there's a perf diff we'd care about in that particular case.
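To make the swap-file idea from the log concrete, here is a rough sketch of the steps such management would have to perform. It is not the proposed puppetization: `swapfile_commands` and `render` are hypothetical helpers, the `/swapfile` path and the size are placeholders, and the function only builds the command list without running anything. It uses `dd` rather than a sparse allocation on the assumption that swap files containing holes are a problem for `swapon`.

```python
import shlex
from typing import List


def swapfile_commands(path: str, size_mib: int) -> List[List[str]]:
    """Return the commands that would create and enable a swap file (nothing is run)."""
    if size_mib <= 0:
        raise ValueError("size_mib must be positive")
    return [
        # Write real zeroed blocks so the file has no holes.
        ["dd", "if=/dev/zero", f"of={path}", "bs=1M", f"count={size_mib}"],
        # Swap contents can include sensitive memory; restrict access to root.
        ["chmod", "600", path],
        # Write the swap signature, then activate the area.
        ["mkswap", path],
        ["swapon", path],
    ]


def render(commands: List[List[str]]) -> str:
    """Render the command list as shell-quoted lines for review."""
    return "\n".join(shlex.join(c) for c in commands)
```

Resizing at runtime would then be a matter of `swapoff`, regenerating the file at the new size, and `swapon` again, which is the tuning simplicity the log refers to; no repartitioning is involved.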
Kormat added a subscriber: Kormat. · Oct 21 2020, 12:11 PM