TL;DR - The esams upload cluster is overprovisioned beyond the headroom we normally account for, because we were conservative about unknowns when purchasing hardware for the TLS rollout. We can therefore shift good hardware down the line and retire some of our oldest and worst machines in the process. This simplifies the current cluster hardware layout, leaves us with fewer unique hardware configs to deal with, and makes further budgeting and reasoning simpler.
Hardware notes:
cp3003-14 - 12x Older spec, earliest warranty batch, larger SSDs
cp3015-18 - 4x as above, but smaller SSDs
cp3019-22 - 4x as above, but no SSDs and *much* smaller RAM
cp3030-49 - 20x newest spec (purchased in the TLS rollout era)
Current layout:
| Cluster | Machines |
|----------|-------------|
| Text | 4x newest (cp30[34][01]) + 12x older (cp3003-14) |
| Upload | 16x newest (cp30[34][2-9]) (over-provisioned during TLS rollout) |
| Mobile | 4x older (cp3015-18, smaller SSDs than cp3003-14) |
| Misc | 4x older non-SSD (cp3019-22) |
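
The bracket notation in the table (for example `cp30[34][2-9]`) is shorthand for a set of host names. As a rough sketch of how such a pattern expands, assuming simple shell-style character classes only, the following helper can be used to sanity-check the machine counts. `expand_hosts` is a hypothetical name for illustration, not an existing tool.

```python
import itertools
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand a bracket pattern such as 'cp30[34][2-9]' into host names.

    Assumption: only literal text and [...] character classes are supported,
    where 'a-b' inside a class means an inclusive character range.
    """
    # Split into literal chunks and [...] classes (the classes are kept
    # because the regex group is capturing).
    parts = re.split(r"(\[[^\]]+\])", pattern)
    choices = []
    for part in parts:
        if part.startswith("[") and part.endswith("]"):
            body = part[1:-1]
            chars = []
            i = 0
            while i < len(body):
                if i + 2 < len(body) and body[i + 1] == "-":
                    start, end = ord(body[i]), ord(body[i + 2])
                    chars.extend(chr(c) for c in range(start, end + 1))
                    i += 3
                else:
                    chars.append(body[i])
                    i += 1
            choices.append(chars)
        else:
            # Literal text (possibly empty) contributes a single choice.
            choices.append([part])
    return ["".join(combo) for combo in itertools.product(*choices)]


text_newest = expand_hosts("cp30[34][01]")
upload_newest = expand_hosts("cp30[34][2-9]")
print(len(text_newest), len(upload_newest))
```

Together the two patterns should cover all 20 newest-spec machines exactly once, which is what the counts in the table assume.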
Proposed:
| Cluster | Machines | Notes |
|----------|-------------|----|
| Text | 8x newest (cp30[34][0123]) | Loses 12x older w/ bigger SSD, gains 4x newest from upload |
| Upload | 12x newest (cp30[34][4-9]) | Loses 4x newest (not needed) |
| Mobile | 4x older (cp3003-6) | Loses 4x older w/ smaller SSD, gains 4x older w/ bigger SSD from text |
| Misc | 4x older (cp3007-10) | Loses 4x older w/o SSD, gains 4x older w/ bigger SSD from text |
| Reclaim/Spare/Decom | cp3011-22 | The 12 worst machines can be decommissioned or kept as spares (4x bigger SSD, 4x smaller SSD, 4x no SSD) |
In this new state, only 8x of the older-warranty machines would remain in service. They will need 1:1 replacement with new-spec hardware when we decide to replace them. They are all identical on RAM/SSD, so we're down to 2x active hardware configs total in esams. They also all sit in the Mobile and Misc clusters, while Text and Upload hold all the newer hardware with more warranty left.
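
The drop in active hardware configs can be checked with a short sketch. The spec labels are paraphrased from the hardware notes; the `spec_of` helper and the integer host numbering are illustrative assumptions, not part of any existing tooling.

```python
def spec_of(n: int) -> str:
    """Map a cp host number to its hardware spec, per the hardware notes."""
    if 3003 <= n <= 3014:
        return "older, larger SSD"
    if 3015 <= n <= 3018:
        return "older, smaller SSD"
    if 3019 <= n <= 3022:
        return "older, no SSD, much smaller RAM"
    if 3030 <= n <= 3049:
        return "newest"
    raise ValueError(f"unknown host cp{n}")


# Hosts in active service today: everything listed in the current layout.
current_active = list(range(3003, 3023)) + list(range(3030, 3050))
# Hosts in active service after the move: cp3003-10 plus all newest-spec hosts.
proposed_active = list(range(3003, 3011)) + list(range(3030, 3050))

current_specs = {spec_of(n) for n in current_active}
proposed_specs = {spec_of(n) for n in proposed_active}
print(len(current_specs), "configs now,", len(proposed_specs), "after the move")
```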
Steps to get from Here to There:
[ ] 1. Move cp30[34][23] from cache_upload to cache_text
[ ] 2. Remove cp3011-14 from cache_text (decom/reclaim/spare)
[ ] 3. Move cp3003-6 from cache_text to cache_mobile
[ ] 4. Remove cp3015-18 from cache_mobile (decom/reclaim/spare)
[ ] 5. Move cp3007-10 from cache_text to cache_misc
[ ] 6. Remove cp3019-22 from cache_misc (decom/reclaim/spare)
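
As a sanity check, the six steps can be replayed against the current layout to confirm they end in the proposed layout. This is only a minimal sketch: the cluster names come from the steps above, while the `spare` bucket, the `hosts` and `move` helpers, and the dict-of-sets model are assumptions made for illustration and do not reflect how the clusters are actually configured.

```python
def hosts(first: int, last: int) -> set[str]:
    """Return the host names cp<first>..cp<last>, inclusive."""
    return {f"cp{n}" for n in range(first, last + 1)}


# Current layout, taken from the "Current layout" table.
state = {
    "cache_text": hosts(3030, 3031) | hosts(3040, 3041) | hosts(3003, 3014),
    "cache_upload": hosts(3032, 3039) | hosts(3042, 3049),
    "cache_mobile": hosts(3015, 3018),
    "cache_misc": hosts(3019, 3022),
    "spare": set(),  # hypothetical bucket for decom/reclaim/spare machines
}


def move(machines: set[str], src: str, dst: str) -> None:
    """Move machines between clusters, refusing if any are not in src."""
    missing = machines - state[src]
    if missing:
        raise ValueError(f"{sorted(missing)} not in {src}")
    state[src] = state[src] - machines
    state[dst] = state[dst] | machines


# Steps 1-6, in the order listed above.
steps = [
    (hosts(3032, 3033) | hosts(3042, 3043), "cache_upload", "cache_text"),
    (hosts(3011, 3014), "cache_text", "spare"),
    (hosts(3003, 3006), "cache_text", "cache_mobile"),
    (hosts(3015, 3018), "cache_mobile", "spare"),
    (hosts(3007, 3010), "cache_text", "cache_misc"),
    (hosts(3019, 3022), "cache_misc", "spare"),
]
for machines, src, dst in steps:
    move(machines, src, dst)

for cluster, members in state.items():
    print(cluster, len(members))
```

In this model, the order of the steps means each cluster receives its new machines before the old ones are removed, so no cluster ever drops below its proposed size.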