Description
Details
Status | Assigned | Task
---|---|---
Resolved | None | T209460 CloudVPS: network architecture
Resolved | aborrero | T270704 cloud: introduce new edge network architecture for eqiad1 and codfw1dev
Resolved | aborrero | T272963 cloudgw: develop HA setup
Event Timeline
Change 663799 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw2002-dev: give it proper puppet role
Change 663799 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw2002-dev: give it proper puppet role
Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:
cloudgw2002-dev.codfw.wmnet
The log can be found in /var/log/wmf-auto-reimage/202102121057_aborrero_28819_cloudgw2002-dev_codfw_wmnet.log.
Completed auto-reimage of hosts:
['cloudgw2002-dev.codfw.wmnet']
and were ALL successful.
Change 663801 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] keepalived: add support for custom template
Change 663801 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] keepalived: add support for custom template
Change 663823 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: introduce HA by using keepalived/VRRP
Change 664241 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: move common hiera into proper file
Change 664241 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: move common hiera into proper file
Change 663823 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: introduce HA by using keepalived/VRRP
Mentioned in SAL (#wikimedia-cloud) [2021-02-15T15:45:29Z] <arturo> [codfw1dev] connect virtual router cloudinstances2b-gw to vlan cloud-gw-transport-codfw (185.15.57.10) (T272963)
Mentioned in SAL (#wikimedia-cloud) [2021-02-15T15:45:54Z] <arturo> [codfw1dev] drop subnet definition for cloud-instances-transport1-b-codfw (T272963)
Change 664255 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] Revert "cloud: hiera: add vlan 2120 back into the neutron bridge"
Change 664255 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] Revert "cloud: hiera: add vlan 2120 back into the neutron bridge"
Change 664256 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] Revert "cloud: hiera: connect cloudnet servers back to vlan 2120"
Change 664257 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] Revert "cloud: hiera: enable back neutron hacks in codfw1dev"
Change 664256 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] Revert "cloud: hiera: connect cloudnet servers back to vlan 2120"
Change 664257 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] Revert "cloud: hiera: enable back neutron hacks in codfw1dev"
Change 664307 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: stop setting up VIP addresses that are now handle via keepalived/VRRP
Change 664307 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: stop setting up VIP addresses that are now handle via keepalived/VRRP
Mentioned in SAL (#wikimedia-cloud) [2021-02-15T16:25:24Z] <arturo> [codfw1dev] rebooting all cloudgw200x-dev / cloudnet200x-dev servers (T272963)
Change 664311 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: switch data place interface config modes to manual
Change 664311 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: switch data place interface config modes to manual
Change 664317 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: interfaces: relax check on routing setup by using 'onlink'
Change 664317 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: interfaces: relax check on routing setup by using 'onlink'
Change 664521 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] conntrackd: also install the conntrack tool
Change 664521 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] conntrackd: also install the conntrack tool
Change 664538 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] openstack: cloudgw: allow incoming conntrackd TCP connection
Change 664538 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] openstack: cloudgw: allow incoming conntrackd TCP connection
Change 664549 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: use address per interface in the cloud-instance-transport subnet
Change 664549 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: use address per interface in the cloud-instance-transport subnet
Change 664603 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: let keepalived track static routes
Change 664603 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: let keepalived track additional static routes
Change 664785 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: set up conntrack sysctl parameters
Change 664785 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: set up conntrack sysctl parameters
Change 664789 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: keepalived: use nopreempt option
Change 664789 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: keepalived: use nopreempt option
Change 664800 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: start conntrackd before keepalived
Change 664800 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: refresh conntrackd service dependencies
This is in very good shape. I tested several failover scenarios:
- manually stopping keepalived on the primary VRRP node
- rebooting the primary VRRP node
- flapping (backup -> primary -> backup -> primary)
How I tested this:
- ssh tools-codfw1dev-k8s-worker-1.tools-codfw1dev.codfw1dev.wikimedia.cloud
- aborrero@tools-codfw1dev-k8s-worker-1:~$ wget https://network-tests.toolforge.org/files/1GB.bin -O /dev/null
- ssh cloudgw2001-dev.codfw.wmnet --> reboot it if it is the primary
- ssh cloudgw2002-dev.codfw.wmnet --> if it is the new primary, watch traffic flowing through it
- confirm that the wget download keeps flowing despite several failovers
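The check in the last step was done by eye. If someone wants to repeat it with numbers, a minimal sketch along these lines could flag the moments where the download stopped making progress. Everything here is an assumption and not part of the original test: the `find_stalls` helper is hypothetical, the samples are assumed to be `(seconds, cumulative bytes)` pairs collected by whatever means is convenient, and the 3 second threshold is arbitrary.

```python
from typing import List, Tuple

# One sample: (seconds since the download started, cumulative bytes received).
# This format is an assumption; the original test only watched wget by eye.
Sample = Tuple[float, int]


def find_stalls(samples: List[Sample], max_gap_s: float = 3.0) -> List[Tuple[float, float]]:
    """Return (start, end) intervals where no new bytes arrived for more than max_gap_s."""
    stalls: List[Tuple[float, float]] = []
    if not samples:
        return stalls

    last_progress_t, last_bytes = samples[0]
    for t, received in samples[1:]:
        if received > last_bytes:
            # Progress resumed: record the gap if it exceeded the threshold.
            if t - last_progress_t > max_gap_s:
                stalls.append((last_progress_t, t))
            last_progress_t, last_bytes = t, received

    # Trailing stall: the transfer never resumed before sampling ended.
    end_t = samples[-1][0]
    if end_t - last_progress_t > max_gap_s:
        stalls.append((last_progress_t, end_t))
    return stalls


# Made-up samples: progress freezes between t=2 and t=6, as a slow failover might cause.
demo = [(0.0, 0), (1.0, 100), (2.0, 200), (3.0, 200), (4.0, 200),
        (5.0, 200), (6.0, 300), (7.0, 400)]
print(find_stalls(demo))  # [(2.0, 6.0)]
```

An empty result for a run that includes several failovers would match the observation above that the download kept flowing; any reported interval gives a rough upper bound on how long a failover interrupted traffic.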