Change Details

In the past, the majority of our partman recipes didn't call for swap space. My (@robh) understanding is that, since we order our systems with sufficient memory for their allocated roles, we shouldn't need to fall back to swap anyhow. Over time, with more contributors to our partman recipes, we now have a mix of systems with and without swap. In most cases there doesn't seem to be much reasoning behind the choice, simply that the overall recipe of RAIDed filesystems worked. (I could be wrong on this, hence this task.)

Is there a reason any of our systems should be installed/partitioned with swap? Shouldn't we standardize on this where we can, so we don't have such a large mix of partman recipes?

In addition to the swap space question, there is also a large divergence in the use of LVM. My prior understanding, from past discussions within the ops team, is that we should put all of a system's partitions in one large LVM volume group and allocate only 80% of the disk's capacity. The remaining 20% was to be included in the volume group but left free, so that space could be allocated in an emergency when a partition fills up. The other reason (that I recall) was snapshots, but those are not taken across the cluster (nor do they need to be, since we have other backup solutions in place for most misc systems; LVM snapshots are used when needed on specific service clusters).

Should the standard still be to use LVM and leave 20% unallocated for emergency growth?

I realize that we may have some specific clusters that do not follow the proposed standards above, but they should most likely still attempt to apply similar standards. Thoughts?

== Filippo's proposition

Address the common case of "stateless" or otherwise misc hosts where a bunch of disks (>1) are presented as-is to the OS and we do Linux RAID on top.

In [I36b50e054](https://gerrit.wikimedia.org/r/c/operations/puppet/+/553363) this is achieved with GPT partitioning and a single VG with LVs created for `/`, `/srv` and `swap` with "comfortable" sizes. The idea is that we can tweak LVs as needed post-provisioning via Puppet, like in [Ia50bb1591](https://gerrit.wikimedia.org/r/c/operations/puppet/+/554044) (or ad-hoc / manually, in emergencies), the most common case I'd imagine being extending the filesystems online. The configuration is split between partitioning/common options and the RAID/block device configuration, so they can be combined as required.
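
To make the "allocate 80%, keep 20% free" rule from the description concrete, here is a minimal Python sketch of the arithmetic. It is only an illustration: the function name `plan_volume_group`, the VG size, and the LV sizes are hypothetical and are not taken from the partman recipes or from either Gerrit change.

```python
def plan_volume_group(vg_size_gib, lv_requests_gib, reserve_percent=20):
    """Check a set of LV sizes against a VG that must keep some space free.

    vg_size_gib     -- total size of the volume group, in GiB (hypothetical)
    lv_requests_gib -- mapping of LV name/mount point to requested size in GiB
    reserve_percent -- share of the VG to leave unallocated (20 in the task)

    Returns (allocatable_gib, free_gib): the most we may hand out to LVs,
    and what stays free in the VG once the requested LVs are created.
    """
    if not 0 <= reserve_percent < 100:
        raise ValueError("reserve_percent must be in the range [0, 100)")

    # Integer arithmetic keeps the result exact for whole-GiB sizes.
    allocatable_gib = vg_size_gib * (100 - reserve_percent) // 100
    requested_gib = sum(lv_requests_gib.values())

    if requested_gib > allocatable_gib:
        raise ValueError(
            f"LVs request {requested_gib} GiB, but only {allocatable_gib} GiB "
            f"may be allocated with a {reserve_percent}% reserve"
        )

    return allocatable_gib, vg_size_gib - requested_gib


if __name__ == "__main__":
    # Hypothetical sizes for the three LVs named in the proposition.
    lvs = {"/": 50, "/srv": 700, "swap": 8}
    allocatable, free = plan_volume_group(1000, lvs)
    print(f"allocatable: {allocatable} GiB, left free in VG: {free} GiB")
```

Any space the LVs do not request stays free in the VG on top of the reserve, which is the headroom that would be used when extending a filesystem online.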