Similar to T148506.
This is about row B only:
[x] Rack and cable the switches according to diagram (blocked on T187118) [Chris] {F11996449}
[x] Connect mgmt/serial [Chris]
[x] Check via serial that switches work, ports are configured as down [Arzhel]
[x] Stack the switch, upgrade JunOS, initial switch configuration [Arzhel]
[x] Add to DNS [Arzhel]
[x] Add to LibreNMS & Rancid [Arzhel]
[x] Switch ports configuration to match asw-b (+login announcement) [Arzhel]
[x] Solve snowflakes [Chris/Arzhel]
```
WAS xe-3/1/0 description "labnet1001 eth5" MOVED TO: xe-2/0/22
WAS xe-3/1/2 description "labnet1001 eth4" MOVED TO: xe-2/0/24
WAS ge-3/0/33 description "labnet1001 eth0" MOVED TO: ge-2/0/23
WAS xe-4/1/0 description "labnet1002 eth3" MOVED TO: xe-4/0/45
WAS xe-4/1/2 description "labnet1002 eth4" MOVED TO: xe-4/0/44
```
[x] Pre-populate FPC2, FPC4, and FPC7 (QFX) with copper SFPs matching the current production servers in racks 2, 4, and 7 [Chris]
[x] Add to Icinga [Arzhel]
**Thursday 22nd, noon Eastern (4pm UTC), 3h (for all 3 rows)**
[x] Verify cr2-eqiad is VRRP master
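A quick way to check mastership from the Junos CLI (a sketch; the expected states assume cr2 is currently master for the row B groups):
```
cr2-eqiad> show vrrp summary    <- row B groups should show "master" here
cr1-eqiad> show vrrp summary    <- same groups should show "backup" here
```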
[x] Disable interfaces from cr1-eqiad to asw-b
[x] Move cr1 router uplinks from asw-b to asw2-b (and document cable IDs if different) [Chris/Arzhel]
```
xe-2/0/44 -> cr1-eqiad:xe-3/0/1
xe-2/0/45 -> cr1-eqiad:xe-4/0/1
xe-7/0/44 -> cr1-eqiad:xe-4/1/1
xe-7/0/45 -> cr1-eqiad:xe-3/1/1
```
[x] Connect asw2-b to asw-b with 2x10G (and document cable IDs if different) [Chris]
```
xe-2/0/43 -> asw-b-eqiad:xe-2/1/0
xe-7/0/43 -> asw-b-eqiad:xe-7/1/0
```
[x] Verify traffic is properly flowing through asw2-b
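Possible Junos checks for this (a sketch; interface names taken from the uplink list above):
```
asw2-b> show interfaces xe-2/0/44 statistics    <- input/output rates should be non-zero
asw2-b> monitor interface traffic               <- live rates across all interfaces
```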
[x] Update interfaces descriptions on cr1
**Before maintenance**
[] Failover hosts TBD?
**In maintenance window April 10th (3pm UTC, 11am EDT, 8am PDT), 4h.**
[] Downtime switch/hosts in Icinga
[] Failover VRRP master to cr1
[] Verify traffic is properly flowing through cr1/asw2
[] Disable interface between cr2 and asw-b-eqiad:ae2
[] Move servers from asw-b to asw2-b [Chris]
Servers for which I wasn't able to identify an owner:
```
iron <- experimental bastion, no special needs
```
@elukey
```
aqs1008
```
@ArielGlenn
```
dumpsdata1001
```
@Dzahn
```
phab1001
```
@ssastry
(ruthenium is the Parsoid test server, so it should be OK)
```
ruthenium
```
@fgiunchedi
These need to be depooled from LVS one at a time and then re-pooled
```
thumbor1001
thumbor1002
```
@Eevans / @mobrovac
Cassandra instances there need to be drained before switching the server off, cf. https://wikitech.wikimedia.org/wiki/Cassandra
```
restbase-dev1005
```
Decommissioned
```
promethium
bast1001
californium
silver
```
@elukey
```
druid1005
analytics1046
analytics1047
analytics1048
analytics1049
analytics1050
analytics1051
analytics1061
analytics1062
analytics1063
analytics1072
analytics1073
kafka1002
kafka-jumbo1003
notebook1003
mc1024
mc1025
mc1026
mc1027
```
DBs: @jcrespo and @Marostegui are already aware
```
db1051 -> scheduled for decommissioning T195484 host down, can be ignored probably
db1052
db1072 -> misc master, special care needed see T183585#4427995
db1073 -> misc master special care needed see T183585#4427995
db1076
db1077
db1083
db1084
db1085
db1086
db1098
db1099
db1104
db1112
db1113
dbproxy1004 -> passive
dbproxy1005 -> passive
dbproxy1006 -> passive
es1013
es1014
```
@Gehel
elastic* should not be an issue T187962#4238825
```
elastic1028
elastic1036
elastic1037
elastic1038
elastic1039
elastic1046
elastic1047
elastic1049
elastic1050
logstash1005
maps1002
wdqs1007
```
@akosiaris
relevant: "poolcounter1001: Turns out we did not really need it after all. The sites survived the downtime." T187962#4241998
```
poolcounter1002
kubernetes1002
kubestage1002
ores1003
ores1004
puppetmaster1001
rhodium
wtp1031
wtp1032
wtp1033
wtp1034
wtp1035
```
Traffic: @Vgutierrez, @ema, and me
lvs: disable puppet/pybal on each host, and make sure everything is healthy before proceeding to the next host
rdns?
```
lvs1001:eth1
lvs1002:eth1
lvs1003:eth1
lvs1004
lvs1005
lvs1006
chromium
ripe atlas <- can ignore
```
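The per-host LVS step above might look like this (a sketch; script and service names as commonly used on these hosts, to be confirmed):
```
sudo disable-puppet "row B switch move"    <- so puppet doesn't restart pybal
sudo systemctl stop pybal                  <- BGP routes withdraw; confirm the peer LVS took over before moving the host
```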
@fgiunchedi
"ms-be* to be moved one at a time, just a clean poweroff is enough, no depooling needed."
```
ms-be1016
ms-be1017
ms-be1018
ms-be1020
ms-be1022
ms-be1023
ms-be1031
ms-be1032
ms-be1034
```
@fgiunchedi
Needs to be depooled from LVS, cleanly shut down, and then repooled
```
prometheus1004
```
@joe / @elukey
```
mw1284
mw1285
mw1286
mw1287
mw1288
mw1289
mw1290
mw1293
mw1294
mw1295
mw1296
mw1297
mw1298
mw1299
mw1300
mw1301
mw1302
mw1303
mw1304
mw1305
mw1306
mw1313
mw1314
mw1315
mw1316
mw1317
mw1318
```
@Joe (on vacation during the window)
relevant:
"conf1002 is in row C (etcd connections will be interrupted, we know it can cause issues)." T187962#4241998
```
conf1005
rdb1004
```
Cloud: @chasemp (on vacation during the window), @Andrew
```
labcontrol1004
labnodepool1001
labnodepool1002
labpuppetmaster1001
labvirt1001 eth0
labvirt1001 eth1
labvirt1002 eth0
labvirt1002 eth1
labvirt1003 eth0
labvirt1003 eth1
labvirt1004 eth0
labvirt1004 eth1
labvirt1005 eth0
labvirt1005 eth1
labvirt1006 eth0
labvirt1006 eth1
labvirt1010 eth0
labvirt1010 eth1
labvirt1011 eth0
labvirt1011 eth1
labvirt1012 eth0
labvirt1012 eth1
labvirt1013 eth0
labvirt1013 eth1
labvirt1014 eth0
labvirt1014 eth1
labvirt1015 eth0
labvirt1015 eth1
labvirt1016 eth0
labvirt1016 eth1
labvirt1017
labvirt1017 eth1
labvirt1018
labvirt1018 eth1
labweb1001
virt1010 eth0
virt1010 eth1
virt1011 eth0
virt1011 eth1
virt1012 eth0
virt1012 eth1
```
[] Move cr2 router uplinks from asw-b to asw2-b (and document cable IDs if different) [Chris/Arzhel]
```
xe-2/0/46 -> cr2-eqiad:xe-3/0/1
xe-2/0/47 -> cr2-eqiad:xe-4/0/1
xe-7/0/46 -> cr2-eqiad:xe-4/1/1
xe-7/0/47 -> cr2-eqiad:xe-3/1/1
```
[] Re-enable cr2 interfaces
[] Move VRRP master back to cr2
[] Verify no more traffic on asw-b<->asw2-b link [Arzhel]
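Sketch of the verify/disable steps on asw2-b (interface names from the cabling list above):
```
show interfaces xe-2/0/43 statistics    <- traffic rates should be ~0
show interfaces xe-7/0/43 statistics
configure
set interfaces xe-2/0/43 disable
set interfaces xe-7/0/43 disable
commit
```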
[] Disable asw-b<->asw2-b link [Arzhel]
[] Verify all servers are healthy, monitoring happy
**After maintenance window**
[] Update interfaces descriptions on cr2
[] Cleanup config, monitoring, DNS, etc.
[] Wipe & unrack asw-b