T382220 indicates that cloudgw1002 may be having hardware issues, and @aborrero suggests that we replace it.
We have several servers, already racked for a cancelled dev experiment, that could take over this workload without ordering more hardware.
Existing cloudgw servers:
cloudgw1001: C8; 8 cores, 32GB RAM, 2x 10GB ports, 2x240 ssd, purchased March 2021
cloudgw1002: D5; 8 cores, 32GB RAM, 2x 10GB ports, 2x240 ssd, purchased March 2021
Unused cloud-dev servers:
cloudnet1007-dev: E4; 2x12 cores, 64GB RAM, 2x 10GB ports, 4x960 ssd, purchased August 2023
cloudnet1008-dev: F4; 2x12 cores, 64GB RAM, 2x 10GB ports, 4x960 ssd, purchased August 2023
cloudcontrol1008-dev: D5; 2x12 cores, 64GB RAM, 2x 10GB ports, 4x960 ssd, purchased August 2023
cloudcontrol1009-dev: E4; 2x12 cores, 64GB RAM, 2x 10GB ports, 4x960 ssd, purchased August 2023
cloudcontrol1010-dev: F4; 2x12 cores, 64GB RAM, 2x 10GB ports, 4x960 ssd, purchased August 2023
I propose that we replace both cloudgw100[12] servers (one at a time, of course) with the cloudnet100[78]-dev boxes, renamed.
There are a couple of caveats:
- Does it matter that the replacement servers are in different racks? Pinging @aborrero and @cmooney for an answer
- Is renaming servers in place in a datacenter so awful that we should never ever do it? Pinging @RobH for an answer