Today I reimaged restbase1016 as part of T212418 (Memory error on restbase1016). There are some things we should improve post-reimage:
- /dev/sd*4 filesystems are formatted but not present in /etc/fstab
- cassandra instances try to start on the first puppet run
- the default cassandra process keeps running after the first puppet run, even though we explicitly mark the cassandra service as stopped
For the first item, ideally partman would take care of that; failing that, we can use puppet.
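If we go the puppet route, the logic amounts to "for every `/dev/sd*4` partition not already listed in `/etc/fstab`, add an entry". Below is a minimal Python sketch of that check, only to make the intent concrete. The `/srv/<devname>` mount-point convention, the `ext4` filesystem type and the `defaults,noatime` options are all hypothetical placeholders, not what restbase hosts actually use.

```python
def missing_fstab_entries(devices, fstab_text, fstype="ext4", options="defaults,noatime"):
    """Return fstab lines for devices that fstab_text does not already mount.

    Assumptions (hypothetical): each partition is mounted at /srv/<devname>,
    and fstype/options are placeholders to be replaced with the real values.
    """
    # Collect the first field (device spec) of every non-comment fstab line.
    present = set()
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        present.add(line.split()[0])

    entries = []
    for dev in sorted(devices):
        if dev in present:
            continue  # already configured, nothing to add
        name = dev.rsplit("/", 1)[-1]  # e.g. "/dev/sdb4" -> "sdb4"
        # fstab fields: device, mount point, type, options, dump, fsck pass
        entries.append(f"{dev}\t/srv/{name}\t{fstype}\t{options}\t0\t2")
    return entries
```

On a host this would be fed `glob.glob("/dev/sd*4")` and the contents of `/etc/fstab`. In puppet itself the same idea would more naturally be expressed as `mount` resources (ideally keyed by filesystem UUID rather than device name).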
For the second item: on a newly imaged host, puppet will start all cassandra instances, which will then try to bootstrap (and all but one will eventually fail). I think we should avoid that and instead selectively enable which instance(s) to start post-provisioning. We used to mask cassandra instances, but that's no longer working as intended (T211027). I think the next best thing would be to use a flag file and add [ConditionPathExists](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#ConditionArchitecture=) to the cassandra systemd units: by default the file isn't there, and operators enabling / bootstrapping cassandra would touch the file to enable the unit.
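A rough sketch of what the flag-file approach could look like, written as a Python helper that renders one systemd drop-in per instance. The unit names (`cassandra-<instance>.service`), the drop-in file name `bootstrap-flag.conf` and the flag directory `/etc/cassandra-instances.d` are hypothetical; in practice puppet would manage these files.

```python
def condition_dropin(flag_path):
    """Drop-in body: systemd skips starting the unit unless flag_path exists."""
    return "[Unit]\nConditionPathExists=" + flag_path + "\n"


def dropins_for_instances(instances, flag_dir="/etc/cassandra-instances.d"):
    """Map each (hypothetical) drop-in path to its contents.

    Assumed naming: unit cassandra-<instance>.service, flag file
    <flag_dir>/<instance>.enabled. Neither is an existing convention.
    """
    result = {}
    for inst in instances:
        unit = f"cassandra-{inst}.service"
        path = f"/etc/systemd/system/{unit}.d/bootstrap-flag.conf"
        result[path] = condition_dropin(f"{flag_dir}/{inst}.enabled")
    return result
```

With these drop-ins in place, an unsatisfied condition makes systemd skip the start rather than fail the unit. An operator bootstrapping, say, instance `a` would `touch /etc/cassandra-instances.d/a.enabled` (hypothetical path) and then start `cassandra-a.service`. Because the condition is re-evaluated on every start attempt, removing the flag file disables the instance again without masking it.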