cloudsw1-c8-eqiad and cloudsw1-d5-eqiad are running JunOS 18.4R2-S4.10.
Opening this task to track upgrading them to JunOS 20+ to bring them into line with the other cloudsw devices (which are on 20.2 and 20.4).
The plan is to upgrade the switches one at a time. The 'cloudsw2' devices in each of these racks are daisy-chained from the respective cloudsw1 device in the same rack, so when we upgrade a cloudsw1, all hosts in that rack will be offline for the duration of the work. Connectivity to hosts in other racks should remain up throughout.
In total the upgrade of each device should take in the region of 20-30 minutes, during which all hosts in the rack will suffer a complete network outage. So we should do it in a maintenance window, and depool, prep or otherwise do whatever is required to minimize the impact. We should also make sure the active cloudnet and cloudgw hosts are manually failed over in advance.
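For reference, a minimal sketch of how the version check and upgrade of one switch could be driven with Junos PyEZ. The hostname, image filename and SSH credentials here are placeholders, not the actual image or procedure we will use:

```python
# Hedged sketch: pre-check and upgrade of one cloudsw via Junos PyEZ.
# Hostname, image filename and authentication are illustrative assumptions.
from jnpr.junos import Device
from jnpr.junos.utils.sw import SW

SWITCH = "cloudsw1-c8-eqiad"          # assumption: resolvable management name
IMAGE = "junos-20.x-image.tgz"        # placeholder filename for the target 20.x image


def progress(dev, report):
    """Print install progress as PyEZ reports it."""
    print(f"{dev.hostname}: {report}")


with Device(host=SWITCH, user="admin") as dev:   # auth via SSH keys/agent assumed
    print("Current version:", dev.facts["version"])

    sw = SW(dev)
    # Copy the image to the switch and install it; validate=True runs the
    # package compatibility check before committing to the install.
    result = sw.install(package=IMAGE, validate=True, progress=progress)
    ok = result[0] if isinstance(result, tuple) else result  # newer PyEZ returns (status, msg)
    if not ok:
        raise RuntimeError(f"install failed: {result}")

    # All hosts in the rack lose the network from here until the switch is back up.
    sw.reboot()

# After the switch comes back, reconnect and confirm dev.facts["version"] shows 20.x.
```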
The hosts that will be affected are as follows:
Rack C8 (also including hosts in row B which connect via this switch):
cloudbackup1003
cloudcephmon1001
cloudcephmon1003
cloudcephosd1006
cloudcephosd1007
cloudcephosd1008
cloudcephosd1009
cloudcephosd1016
cloudcephosd1017
cloudcephosd1018
cloudcephosd1021
cloudcephosd1022
cloudgw1001
cloudnet1005
cloudlb1001
cloudvirt1025
cloudvirt1026
cloudvirt1027
cloudvirt1031
cloudvirt1032
cloudvirt1033
cloudvirt1034
cloudvirt1035
cloudvirt-wdqs1001
cloudvirt-wdqs1002
cloudvirt-wdqs1003
Rack D5:
cloudbackup1004
cloudcephmon1002 - no action needed (HA)
cloudcephosd1011 - to drain - ready
cloudcephosd1012 - to drain - ready
cloudcephosd1013 - to drain - ready
cloudcephosd1014 - to drain - ready
cloudcephosd1015 - to drain - ready
cloudcephosd1019 - to drain
cloudcephosd1020 - to drain
cloudcephosd1023 - to drain
cloudcephosd1024 - to drain
cloudgw1002 - no action needed (HA)
cloudnet1006 - no action needed (HA)
cloudlb1002
cloudvirt1028
cloudvirt1029
cloudvirt1030
cloudvirt1036
cloudvirt1037
cloudvirt1038
cloudvirt1039
cloudvirt1040
cloudvirt1041
cloudvirt1042
cloudvirt1043
cloudvirt1044
cloudvirt1045
cloudvirt1046
cloudvirt1047
cloudvirtlocal1001
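For the cloudcephosd hosts marked "to drain" above, a rough sketch of the drain step using the plain ceph CLI is below. The real procedure may use dedicated tooling instead, and the OSD ids per host are placeholders that would need to be looked up with `ceph osd tree` first:

```python
# Hedged sketch: drain the OSDs on one cloudcephosd host ahead of the outage
# so their data is rebalanced onto OSDs in other racks. OSD ids are placeholders.
import subprocess


def ceph(*args):
    """Run a ceph command and return its stdout."""
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout


def drain_host(osd_ids):
    # Refuse to proceed if stopping these OSDs would make placement groups unavailable
    # (ok-to-stop exits non-zero in that case, so check=True raises).
    subprocess.run(["ceph", "osd", "ok-to-stop", *[str(i) for i in osd_ids]], check=True)
    for osd in osd_ids:
        # Mark the OSD out; Ceph will backfill its data onto the remaining OSDs.
        ceph("osd", "out", f"osd.{osd}")
    # Watch cluster health until backfill completes before starting the switch work.
    print(ceph("status"))


# Example invocation (ids are not the real mapping for cloudcephosd1011):
drain_host([11, 12, 13])
```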
cloudvirts
We need to move the VMs running on these cloudvirts to hypervisors in other racks, but we can't move all of them. We should move only the sensitive ones; the rest should come back on their own once the network is restored. A rough migration sketch follows the list below.
List of VMs to move to a different rack:
TBD
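A minimal sketch of how the selected VMs could be live-migrated off a cloudvirt with openstacksdk. The cloud name, hypervisor hostname and VM names are placeholder assumptions; the real allow-list is the TBD list above:

```python
# Hedged sketch: live-migrate the "sensitive" VMs off one affected cloudvirt.
# Cloud name, hypervisor hostname and VM names below are placeholders.
import openstack

SOURCE_HYPERVISOR = "cloudvirt1025.eqiad.wmnet"   # assumption: canonical hypervisor name
SENSITIVE_VMS = {"example-vm-1", "example-vm-2"}  # placeholder for the TBD list

conn = openstack.connect(cloud="eqiad1")          # assumption: clouds.yaml entry name

# List servers on the source hypervisor (requires an admin-scoped token).
for server in conn.compute.servers(all_projects=True, host=SOURCE_HYPERVISOR):
    if server.name not in SENSITIVE_VMS:
        continue  # everything else rides out the outage and comes back afterwards
    # Let the scheduler pick a target host (host=None); block_migration="auto"
    # lets Nova detect shared vs. local storage.
    conn.compute.live_migrate_server(server, host=None, block_migration="auto")
    print(f"requested live migration of {server.name} off {SOURCE_HYPERVISOR}")
```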