This task is to track work to create a cloudlb project proof-of-concept in codfw.
We agreed on re-using the old cloudgw2001-dev server for early testing before we get proper hardware for it.
+1 on the VIP ranges, we can reserve them at least. I think the public IPs are what we're most interested in for now, so we can focus on those.
Whatever public range we use for the VIPs, it makes sense to add a static route on the cloudvirts, and other cloud servers, for that same public range pointing to the cloud-private subnet. This means traffic goes directly from cloud hosts to cloudlb over the cloud-private VLAN, rather than following the default route out into the production realm through the CRs, and then back to the switches via the cloud VRF. i.e.
ip route 185.15.57.24/29 via 172.20.5.1
I'm slightly on the fence here and I'm wondering if we shouldn't/couldn't do that routing on the cloudsw instead (between the vrfs).
Multiple routes on servers eventually end up with asymmetric routing one day or the other (eg. if the traffic originates from a different IP than the interface one on the servers).
I feel route-leaking on the cloudsw would be a total violation of the whole concept of having two realms, and a dangerous and complicated config to start adding.
> Multiple routes on servers eventually end up with asymmetric routing one day or the other (eg. if the traffic originates from a different IP than the interface one on the servers).
I don't think this is much of a concern here. Ultimately with hosts connected to multiple networks, as we have, it'll always be something to consider (statics or not), but the additional routes seem simple and straightforward to me. It's been part of the design for this since day one (see https://w.wiki/6WPR).
There really is no other way to support the separate cloud subnets per rack (a complexity for wmcs we are insisting on) while also keeping the inter-rack cloud-private traffic within the cloud realm. Of course there are things like network namespaces or VRFs on the hosts, but I think that is a serious additional layer of complexity, and I suspect more likely to result in strange routing situations.
I can't see how in normal circumstances we'd ever get asymmetric routing here, unless someone got very creative on the hosts.
If it does happen, things shouldn't work. The 172.20.0.0/16 networks will not be reachable from the prod realm. Just as protection, we should add to the 'labs-in' filter on the CRs to block traffic from the cloud 10.x prod-realm IPs to the public VIP ranges.
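As a sketch, such a filter term could look roughly like this in Junos-style syntax (the term name and its placement in the actual 'labs-in' filter are assumptions for illustration):

```
term block-prod-to-cloud-vips {
    from {
        source-address {
            /* cloud prod-realm IPs */
            10.0.0.0/8;
        }
        destination-address {
            /* public VIP range */
            185.15.57.24/29;
        }
    }
    then {
        discard;
    }
}
```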
@ayounsi just looking at the bird anycast template in puppet, I think the "vips_filter" is potentially not going to allow /32s from 185.15.57.24/29 or 172.20.254.0/24 to be announced?
Sorry not sure how I missed this comment before. We'll need to use pre-pending here as there is EBGP between cloudsw-c8/d5 and cloudsw-e4/f4. MED is non-transitive so it won't work. We'll need 2 pre-pends so c8/d5 will see a route from a primary connected to e4/f4 as better (1 as-hop away vs 2).
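For reference, a two-hop prepend in a Bird 2.x export filter would look roughly like the following (the ASN 64605 is a placeholder, not the actual cloud ASN):

```
filter export_prepend
{
    # Prepend our own ASN twice so a route learned via e4/f4
    # looks two extra AS-hops longer when seen from c8/d5
    bgp_path.prepend(64605);
    bgp_path.prepend(64605);
    accept;
}
```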
But first, do we need to distinguish between a primary and secondary router?
@aborrero correct me if I'm wrong, but primary/secondary is a requirement here, is it not? I can imagine a scenario:
Unless there is a way to synchronize the HAproxy states between the two cloudlbs? That would allow active/active, as no matter which cloudlb traffic arrives at, it will always go to the same backend.
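For the record, HAproxy's state synchronization works via a `peers` section that replicates stick-tables between nodes. A minimal sketch, assuming hypothetical peer and backend addresses on cloud-private:

```
peers cloudlb_peers
    peer cloudlb2001-dev 172.20.5.2:10000
    peer cloudlb2002-dev 172.20.5.3:10000

backend openstack-api
    # Pin each client IP to a backend server, and replicate that
    # mapping to the other cloudlb so either node picks the same one
    stick-table type ip size 100k expire 30m peers cloudlb_peers
    stick on src
    server cloudcontrol2001-dev 172.20.5.10:443 check
```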
Given these differences I'm wondering if we aren't better off defining a different $config_template for the cloudlbs, separate from the one we use for the anycast hosts? I can work on that if needed.
Yes, that scenario breakdown seems correct to me. Apparently HAproxy supports state synchronization, even though I don't think we have ever used it. Per the linked docs, it feels simple to configure, but I know from other stuff (netfilter and conntrackd) that it can introduce complexity later when debugging failures, etc.
So do you agree with my read of the multi-master situation?
Change 904518 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] profile::bird::anycast: add template parameter
Change 904745 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/puppet@production] Bird: POC use a different ASN for Cloud hosts
I sent https://gerrit.wikimedia.org/r/904745 to implement a different ASN. But thinking more about it I'm not sure this is needed.
We won't do anything with this information, the ASN won't propagate outside of the WMCS realm, and it adds special cases. Happy to discuss.
> @ayounsi just looking at the bird anycast template in puppet, I think the "vips_filter" is potentially not going to allow /32s from 185.15.57.24/29 or 172.20.254.0/24 to be announced?
Indeed! We can get rid of that safeguard before it grows out of hand. We already filter on the network side, and the prefixes we want to advertise are explicitly defined in Puppet, so AFAIK no risk of rogue prefixes.
On the MED/prepending/etc routing, etc I was reading this task as being focused on the codfw POC, where all the servers are on the same switch, so out of scope here but indeed to take into consideration when we export it to eqiad.
Change 904754 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/puppet@production] Bird: remove anycast subnet filter
Yeah that's ok, as you say it won't appear on the CRs. If we have reason in the future to use a unique one, we can.
> @ayounsi just looking at the bird anycast template in puppet, I think the "vips_filter" is potentially not going to allow /32s from 185.15.57.24/29 or 172.20.254.0/24 to be announced?
> Indeed! We can get rid of that safeguard before it grows out of hand. We already filter on the network side and the prefixes we want to advertise are explicitly defined in Puppet so AFAIK no risk of rogue prefix.
Sounds good, and yeah the filter already explicitly matches the /32s, so it's just an additional safeguard in case someone adds a /32 from the wrong range. Fairly safe to remove I think.
> On the MED/prepending/etc routing, etc I was reading this task as being focused on the codfw POC, where all the servers are on the same switch, so out of scope here but indeed to take into consideration when we export it to eqiad.
Yeah fair enough. We could potentially add a var for 'prepend', in addition to the 'deterministic' one in your patch. To be discussed again. Could also make cloudsw1-c8-eqiad and cloudsw1-d5-eqiad route reflectors, and change the EBGP to cloudsw1-e4-eqiad and cloudsw1-f4-eqiad to IBGP clients of those. In which case MED would work.
Typically I've always preferred pre-pends, as they are very obvious when looking at routes and work regardless of EBGP/IBGP. But having two separate ways to express preference is maybe not ideal, and I know we use MED on PyBal etc. already.
For now let's proceed with the existing anycast config, to validate the concept and function of the load-balancer side. We can tweak the setup to prepare for the additional challenges that having multiple cloudsw in eqiad brings once we are happy with the basics.
Change 904745 abandoned by Ayounsi:
[operations/puppet@production] Bird: POC use a different ASN for Cloud hosts
Reason:
Change 868731 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudlb: introduce BGP setup by means of bird
Change 904754 merged by Ayounsi:
[operations/puppet@production] Bird: remove anycast subnet filter
Change 903622 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud_private_subnet: add route to public IPv4 range
Change 903623 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud_private_subnet: codfw: relocate some hiera
@aborrero great work on the Bird anycast. I can see the conf is there, and I added a basic BGP peering on the cloudsw1-b1-codfw side to peer with cloudlb2001-dev.
Unfortunately the session has not established. The reason for this is that the bird template has tried to create 2 BGP sessions, one to each of the core routers in codfw, rather than a single session to the cloudsw itself on 172.20.5.1:
root@cloudlb2001-dev:/etc/bird# grep neighbor bird.conf
neighbor 208.80.153.192 external;
neighbor 208.80.153.193 external;
This comes directly from /hieradata/codfw/profile/bird.yaml. Ultimately we need a way to specify the list of neighbors differently, to accommodate this scenario. I need to dig a little deeper; it seems in drmrs the doh600x VMs know to peer directly with the single top-of-rack switch, but there is no bird.yaml in hieradata for drmrs.
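One way to express it would be a host-level hiera override pointing at the directly connected switch; a sketch only, since both the file location and the exact key name are assumptions based on the existing anycast profile:

```
# hieradata/hosts/cloudlb2001-dev.yaml (hypothetical location and key name)
profile::bird::neighbors_list:
  - 172.20.5.1   # cloudsw1-b1-codfw, the directly connected ToR
```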
Another thing that is not being set is the IP to announce. It's defaulting to the 203.0.113.1/32 dummy IP rather than, say, 185.15.57.24/32.
Change 916464 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] clod_private_subnet: fix BGP neighbors
Change 916464 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] clod_private_subnet: fix BGP neighbors
I couldn't find anywhere in puppet or the config files to set this up, beyond the VIP address on loopback with scope global lo:anycast, which puppet already does.
Manually running the check command from /etc/anycast-healthchecker.d/hc-vip-openstack.codfw1dev.wikimediacloud.org.conf returns:
connect to address 185.15.57.24 and port 443: Connection refused
HTTP CRITICAL - Unable to open TCP socket
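For context, an anycast-healthchecker service section ties the check command to the prefix it gates; when the check fails, the /32 is withdrawn from Bird. A minimal sketch (the values here are illustrative, not the actual contents of that conf):

```
[vip-openstack.codfw1dev]
check_cmd      = /usr/lib/nagios/plugins/check_http -H 185.15.57.24 -p 443
check_interval = 10
check_rise     = 2
check_fade     = 2
ip_prefix      = 185.15.57.24/32
```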
Change 916519 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudlb: update BGP anycast-healthcheck
Change 916519 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudlb: update BGP anycast-healthcheck
Took me a while to discover this was the problem. Documented it here for posterity: https://wikitech.wikimedia.org/wiki/Anycast#VIP_not_being_announced_by_BGP
TODO:
Change 917302 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudlb: introduce haproxy check for the BGP VIP
Change 917302 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudlb: introduce haproxy check for the BGP VIP
Change 917329 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] haproxy: check_haproxy: introduce new check mode --check=someup
Change 917369 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):
[operations/homer/public@master] Add policy for cloudsw BGP peering to cloudlb and other cloud servers
Change 917369 merged by jenkins-bot:
[operations/homer/public@master] Add policy for cloudsw BGP peering to cloudlb and other cloud servers
Change 917329 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] haproxy: check_haproxy: introduce new check mode --check=someup
Change 918419 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudlb: disable HAproxy config for IPv6
Change 918419 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudlb: use dnsquery::lookup()
Change 918517 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudlb: haproxy: drop support for IPv6
Change 918517 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudlb: haproxy: drop support for IPv6
Change 918523 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudlb: haproxy: http-service.cfg.erb: fix template
Change 918523 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudlb: haproxy: http-service.cfg.erb: fix template
Change 919292 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] network: introduce cloud-private-b1-codfw subnet
Change 919292 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] network: introduce cloud-private-b1-codfw subnet
Change 919298 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] network: data: add cloud codfw1dev 185.15.57.24/29
Change 919298 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] network: data: add cloud codfw1dev 185.15.57.24/29
Change 919342 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Andrew Bogott):
[operations/puppet@production] Openstack galera/mariadb grants: allow access via haproxy nodes
Change 919342 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Openstack galera/mariadb grants: allow access via haproxy nodes
Change 919342 merged by Andrew Bogott:
[operations/puppet@production] Openstack galera/mariadb grants: allow access via haproxy nodes
Change 919352 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudservices: codfw1dev: enable cloud-private subnet
Change 920291 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):
[operations/puppet@production] Add a new aggregate network for the cloud-private 'supernet'
Change 920291 abandoned by Cathal Mooney:
[operations/puppet@production] Add a new aggregate network for the cloud-private 'supernet'
Reason:
we're gonna deal with this another way and review when cloudlb poc is done
Change 919352 merged by Andrew Bogott:
[operations/puppet@production] cloudservices: codfw1dev: enable cloud-private subnet
Change 923551 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] cloud_private_subnet: split BGP code into separate profile
Change 923552 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] cloud_private_subnet::bgp: set up route lookup rule only for /32 VIPs
Change 923551 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud_private_subnet: split BGP code into separate profile
Change 923552 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud_private_subnet::bgp: set up route lookup rule only for /32 VIPs
Change 924526 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/dns@master] wikimediacloud.org: adjust openstack.codfw1dev FQDN
Change 924526 merged by Arturo Borrero Gonzalez:
[operations/dns@master] wikimediacloud.org: adjust openstack.codfw1dev FQDN
Change 904518 abandoned by Arturo Borrero Gonzalez:
[operations/puppet@production] profile::bird::anycast: add template parameter
Reason:
not required at the moment
Mentioned in SAL (#wikimedia-cloud) [2023-06-12T11:57:29Z] <arturo> [codfw1dev] refresh various occurrences of old FQDNs in instance puppet via horizon (T324992)
Change 929666 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):
[operations/puppet@production] Disable multihop BGP for cloud hosts connected directly to cloudsw
Change 929666 merged by Cathal Mooney:
[operations/puppet@production] Disable multihop BGP for cloud hosts connected directly to cloudsw
Change 936235 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudlb: codfw: use someup check for haproxy BGP check
Change 936235 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudlb: codfw: use someup check for haproxy BGP check
Change 940321 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] acme_chief: openstack-codf1dev: drop cloudcontrol access
Change 940321 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] acme_chief: openstack-codf1dev: drop cloudcontrol access