In T196000: move/setup/install labtestnet2003(WMF6469) we got labtestnet2003 set up as the direct replacement for labtestnet2001, which is aging ({T193081}).
This task is for moving functionality from labtestnet2001 to labtestnet2003.
| Status | Assigned | Task |
|---|---|---|
| Resolved | aborrero | T196752 Replace labtestnet2001 with labtestnet2003 and decomission labtestnet2001 |
| Resolved | ayounsi | T199779 Update core routers routing for labtest Cloud VPS deployment |
Change 446059 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloud vps: disable labtestnet2001 and replace it with labtestnet2003
Change 446059 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud vps: disable labtestnet2001 and replace it with labtestnet2003
labtestnet2001 is now empty. Notes: https://etherpad.wikimedia.org/p/labtestnet2001
The active host is now labtestnet2002, while labtestnet2003 is standby.
Change 446069 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloud vps: labtest: missing allowed connection
Change 446069 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud vps: labtest: missing allowed connection
Change 446255 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] Revert "cloud vps: labtest: missing allowed connection"
Change 446255 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] Revert "cloud vps: labtest: missing allowed connection"
Change 446274 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloud vps: labtestn: allow more connections from labtest
Change 446274 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud vps: labtestn: allow more connections from labtest
OK, this is the status of the labtest cluster as far as I know (https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Deployments#Labtest_deployment):
The labtestnet2001 server currently has the spare role and is intended to be decommissioned soon.
The labtestnet2002 server is currently assigned the net active role.
The labtestnet2003 server, which is new, is assigned the net standby role.
The net nodes have 2 NICs connected to the switches:
It turns out that only labtestnet2001 actually has this configuration deployed (switch ports, cabling, etc.).
It seems we won't have anyone in the codfw datacenter until 2 Aug to connect the NICs and switch ports, so we might consider rolling back.
Alternatively, we could try configuring a native VLAN plus a trunk on eth0, although that would be a hack.
```
aborrero@labtestnet2002:~ $ sudo ethtool eth1 | grep Link
	Link detected: no
aborrero@labtestnet2002:~ $ sudo ip a | grep -e eth1 -e br2102
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
6: br2102: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    inet 10.196.16.1/24 brd 10.196.16.255 scope global br2102
7: eth1.2102@eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br2102 state LOWERLAYERDOWN group default
```

```
aborrero@labtestnet2003:~ $ sudo ethtool eth1 | grep Link
	Link detected: no
aborrero@labtestnet2003:~ $ sudo ip a | grep -e eth1 -e br2102
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
```
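The two hosts fail differently: on labtestnet2002 eth1 is up but has no carrier (`NO-CARRIER`), while on labtestnet2003 eth1 has not even been brought up (no `UP` flag). A minimal sketch for telling these apart quickly, assuming the standard iproute2 `ip a` / `ip link` output layout shown above; `link_report` is a hypothetical helper, not an existing tool:

```shell
# Hypothetical helper (not part of iproute2): summarise link state from `ip a` / `ip link` output.
# Prints one line per interface: <name> <operstate> <admin-down|no-carrier|ok>
# Usage: ip a | link_report
link_report() {
  awk '
    # Interface header lines look like "3: eth1: <FLAGS> mtu ... state DOWN ..."
    /^[0-9]+: / {
      iface = $2
      sub(/[@:].*$/, "", iface)          # "eth1.2102@eth1:" -> "eth1.2102"
      flags = $3
      gsub(/[<>]/, "", flags)
      n = split(flags, f, ",")
      up = 0; nocarrier = 0
      for (i = 1; i <= n; i++) {
        if (f[i] == "UP") up = 1           # exact match, so LOWER_UP does not count
        if (f[i] == "NO-CARRIER") nocarrier = 1
      }
      state = "?"
      for (i = 4; i < NF; i++) if ($i == "state") state = $(i + 1)
      if (!up)            status = "admin-down"   # interface was never set up
      else if (nocarrier) status = "no-carrier"   # up, but no link to a switch port
      else                status = "ok"
      print iface, state, status
    }
  '
}
```

Fed the labtestnet2002 output above, this would report `no-carrier` for eth1, br2102 and eth1.2102; fed the labtestnet2003 output, it would report `eth1 DOWN admin-down`. Either way the fix needs cabling in codfw, but the distinction shows whether the host side was configured.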
Not having this cluster running properly could affect our eqiad1 development, since @Andrew is working on migration scripts.
In any case, I would like to coordinate before taking further steps.
I'm temporarily reverting this change, pending us having someone in codfw do the needed cable work.
Change 446562 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvps: labtest: only have one active net node at a time
Change 446562 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvps: labtest: only have one active net node at a time
Change 450959 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloud vps: disable labtestnet2001 and replace it with labtestnet2003
Change 450959 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud vps: disable labtestnet2001 and replace it with labtestnet2003
I did this operation again today; notes: https://etherpad.wikimedia.org/p/labtestnet2001-again
@ayounsi, could you please change the routing again, as in T199779?
To be clear, do you mean this specific change, T199779#4430882?
Change the static route 10.196.16.0/21 from labtestnet2001 to labtestnet2002?
@ayounsi, yes, I think that's what he means, except we're switching to 2003, not 2002.
Synced up over IRC; change pushed to cr1-codfw and cr2-codfw:
```
[edit routing-options static route 10.196.16.0/21]
-    next-hop 10.192.20.5;
+    next-hop 10.192.20.9;
```
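For reference, a rough sketch of Junos configuration-mode commands that would produce a diff like the one above, wrapped in a hypothetical shell helper `static_route_swap` that only prints them (it does not connect to the routers). The prefix and next-hop addresses are taken from the diff; that the change was actually entered this way is an assumption. In Junos, running `show | compare` after these commands and before `commit` should show the `-`/`+` next-hop lines.

```shell
# Hypothetical helper: print the Junos configuration-mode commands that move a static
# route from one next-hop to another. It only prints; nothing is sent to a router.
static_route_swap() {
  local prefix="$1" old_nh="$2" new_nh="$3"
  printf 'delete routing-options static route %s next-hop %s\n' "$prefix" "$old_nh"
  printf 'set routing-options static route %s next-hop %s\n' "$prefix" "$new_nh"
}

# The change from this task: 10.196.16.0/21 moves from next-hop 10.192.20.5 to 10.192.20.9
static_route_swap 10.196.16.0/21 10.192.20.5 10.192.20.9
```

The same commands would need to be applied on both cr1-codfw and cr2-codfw.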
I've confirmed that
So as far as I'm concerned, this switchover is done. We should now open a ticket to decommission labtestnet2001.