Page MenuHomePhabricator

Remove static routes for ns[01] and replace their announcements with bird
Closed, ResolvedPublic

Description

The parent task for this is T347054.

As discussed above, the idea is to replace the current static routes for ns[01] in the core routers and announce the respective IPs via bird instead, using the existing and already established BGP session on the DNS hosts. In doing so, we will help alleviate problems with static routes such as their manual updates, ensuring they are updated between host IP changes (commission and decommission), and absence of peer review for any changes; this will pave the way for automatic management of BGP annoucements and easier depooling during maintenance of the DNS hosts. ns2 is anycasted and is not a part of this discussion and no change is required there.

The current static routes (for ns0 on cr1-eqiad as an example) looks like:

/* ns0 */                               
route 208.80.154.238/32 {
    next-hop [ 208.80.154.6 208.80.154.153 208.80.154.77 ];
    readvertise;
    no-resolve;
}

The hosts above are dns100[4-6].

Automation

To automate this, we need to announce these IPs via birdinstead. Currently, bird.conf includes:

include "/etc/bird/anycast-prefixes.conf";

This is because we have so far only used bird for announcing the anycast addresses but not for the unicast ones, such as 208.80.154.238/32 (ns0), so all our Puppetization and tooling is built around that expectation.

root@dns1004:~# cat /etc/bird/anycast-prefixes.conf 
# Generated 2023-10-02 15:31:42.704581 by anycast-healthchecker (pid=1815037)
# 203.0.113.1/32 is a dummy IP Prefix. It should NOT be used and REMOVED from the constant.
define ACAST_PS_ADVERTISE =
    [
        203.0.113.1/32,
        198.35.27.27/32,
        10.3.0.1/32,
        10.3.0.2/32
    ];

Looking at bird.conf again:

function match_route()
{
    return net ~ ACAST_PS_ADVERTISE;
}
filter vips_filter {
    if ( net.len = 32 && net !~ 203.0.113.1/32 && match_route() ) then {
        accept;
    }
    else {
        reject;
    }
}

In theory, if we can add the ns[01] IPs to ACAST_PS_ADVERTISE above (automatically, via profile::bird::advertise_vips and the associated changes for the healthchecks), that should be enough? Note that we have to customize the Puppetization so that we only add the ns0 IP in the eqiad DNS hosts and ns1 IP in the codfw DNS hosts, as we can't and should not announce them from everywhere. Otherwise, it should be fairly straightforward: we add the VIP the same way we do for the anycast IPs, making sure to customize it to specific sites. That should be fine or am I missing something there?

There is one more change required, in hieradata/common.yaml:

authdns_addrs:
  ns0-v4:
    address: '208.80.154.238'
  ns1-v4:
    address: '208.80.153.231'
  ns2-v4:
    address: '198.35.27.27'
    skip_loopback: true # bird::anycast takes care of this one

We will need to set skip_loopback to ns0-v4 and ns1-v4 as bird will create the loopback IPs. (Or since already created, we can skip those? I will confirm when we work on it.)

I am not sure if we anticipate any additional bird unicast announcements. If not, then the above will work and we can simply customize ACAST_PS_ADVERTISE based on eqiad/codfw. The cleaner approach probably would be to change all tooling to accommodate and make a distinction between the advertisement of unicast and anycast addresses but I am going to leave that for discussion.

Adding @ayounsi and @cmooney for comments and discussion; thanks!

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

We can and probably should have a backup static routes for each of ns[01] but it can be to a single host instead of all three.

Indeed, looks about right :)

For Puppet, if we can change the Hiera merge strategy to hash for profile::bird::advertise_vips between hieradata/role/common/dnsbox.yaml and hieradata/role/SITE/dnsbox.yaml (cc @jbond ) it shoudn't require any other change.
If that's the good way to go I think it only requires:

hieradata/common.yaml:lookup_options
profile::bird::advertise_vips:
    merge: hash

On the network side, we filter on "anycast_prefixes" so we will need to at least add the specific /32s in there.

Proposal looks good to me, minor nit would be to rename ACAST_PS_ADVERTISE to remove references to anycast to avoid confusion in the future.

Indeed, looks about right :)

For Puppet, if we can change the Hiera merge strategy to hash for profile::bird::advertise_vips between hieradata/role/common/dnsbox.yaml and hieradata/role/SITE/dnsbox.yaml (cc @jbond ) it shoudn't require any other change.

+1 changing the merge strategy will help with the site specific config

Otherwise, it should be fairly straightforward: we add the VIP the same way we do for the anycast IPs, making sure to customize it to specific sites. That should be fine or am I missing something there?

Yes that's correct. For references cloud services have used the same anycast role / bird erb template to announce unicast routes for certain services recently and it's been working well.

I am not sure if we anticipate any additional bird unicast announcements. If not, then the above will work and we can simply customize ACAST_PS_ADVERTISE based on eqiad/codfw. The cleaner approach probably would be to change all tooling to accommodate and make a distinction between the advertisement of unicast and anycast addresses but I am going to leave that for discussion.

Proposal looks good to me, minor nit would be to rename ACAST_PS_ADVERTISE to remove references to anycast to avoid confusion in the future.

That would probably make most sense. Not 100% what the best thing to do is. In theory, from a single-hosts perspective, it's announcing all IPs in the same way. It's only if two of them happen to announce the same that is suddenly becomes "anycast". So perhaps from a cosmetic point of view we should just refactor Puppet to not use the 'anycast' term at all? May be valid reasons to do something else of course.

ACAST_PS_ADVERTISE is hardcoded in anycast_healthchecker (the tool we use to monitor services).
For the wider picture it would make sens to rename the Bird related Puppet elements to not mention Anycast when Anycast doesn't make sens, but that ship might have sailed (the cost of changing it might be too high now). Could be worth having a look though. Same on the network side.

the cost of changing it might be too high now

Yeah it'd be a "nice to have" if the reference was removed. But I don't think a disaster if it stays that way. FWIW the same cloud hosts announcing unicast also announce some anycast routes, so the name still sort-of fits. And I believe that we can categorise all the IPs the dns hosts will announce as "anycast" (given they are announced from multiple hosts, if not at multiple sites).

Thanks everyone for the discussion and feedback above! So it seems like two main points have come up above:

  1. We can certainly use the merge: hash but I had a different thought in mind, to separate the unicast and anycast addresses with something like:
modules/wmflib/types/advertise_vip.pp
   'check_fail'     => Optional[Integer],
   'address_ipv6'   => Optional[Stdlib::IP::Address::V6::Nosubnet],
   'check_cmd_ipv6' => Optional[String],
+  'unicast_site'   => Optional[Wmflib::Sites],

And in modules/profile/manifests/bird/anycast.pp:

+  Optional[Boolean]                              $is_unicast            = lookup('profile::bird::is_unicast', {'default_value' => false}),

And then something on the lines of: if unicast and site not equal to unicast_site, skip setting up the configuration, otherwise proceed as normally.

The idea here was to make a clear distinction between the anycast and the unicast addresses and allowing for per-site or per-host overrides with profile::bird::is_unicast instead. This also allows for clear visibility into the type of address in the yaml file itself but I realize this might be over-engineering so I am just happy to do the override in the role instead.

  1. My vote is for renaming ACAST_PS_ADVERTISE to something like BIRD_IP_ADVERTISE. Updating this is not a lot of work and I am happy doing the rollout. Essentially, we will need to update this in the following places:
modules/bird/templates/bird_anycast.conf.erb
function match_route()
{
    return net ~ ACAST_PS_ADVERTISE;
}
<% if @do_ipv6 -%>
function match_route6()
{
    return net ~ ACAST6_PS_ADVERTISE;
}
<% end -%>

And then:

modules/bird/templates/anycast-healthchecker.conf.erb
bird_variable          = ACAST_PS_ADVERTISE
bird6_variable         = ACAST6_PS_ADVERTISE

There should be no other changes required, however, given the various systemd bindings and the restart of the anycast-healthchecker service, this will restart a bunch of stuff but that's fine if we want to do it.

Please let me know your thoughts and we can make a decision -- you can leave the rollout and updates to me :) I think in the interest of consistency given that we will be using unicast addresses as well, we should update it everywhere before we proceed.

ACAST_PS_ADVERTISE is hardcoded in anycast_healthchecker (the tool we use to monitor services).

in that case agree its to much effort

ACAST_PS_ADVERTISE is hardcoded in anycast_healthchecker (the tool we use to monitor services).

in that case agree its to much effort

Just for clarity: I think it's not hardcoded and configurable via the configuration file; please see my comment above and also from https://github.com/unixsurfer/anycast_healthchecker:

anycast-prefixes.conf file defines a list of IP prefixes which is stored in a variable named ACAST_PS_ADVERTISE. The name of the variable can be anything meaningful but bird_variable setting must be changed accordingly.

Ah right! My bad.

Unrelated and maybe a scope creep, but we could also start by advertising a unicast v6 IP to validate the whole mechanism, and then move to the v4 IP.

We can and probably should have a backup static routes for each of ns[01] but it can be to a single host instead of all three.

This is obviously a matter of risk-assessment, and confidence in the various systems at work. But personally I would argue in favour of ditching them completely.

We have 3 listed name-server IPs. Once is anycast from multiple hosts across our pops, the others will be anycasted from multiple hosts at our primary sites. The number of elements that need to fail for user DNS querys to go unanswered is therefore quite high, and I'd come down on the side of them not being needed.

At the end of the day we're talking about a few static routes, however. It's not that onerous to keep supporting it if that is the decision, so I'm happy with whatever the consensus is.

We can and probably should have a backup static routes for each of ns[01] but it can be to a single host instead of all three.

Actually this isn't really a possibility. Due to default routing-protocol priority ("administrative distance"), the Juniper's prefer static routes over those learnt from BGP.

If we have the same IP learnt from 3 hosts over BGP, and a static for the same /32 pointing to server1, then the Juniper will ignore the 3 BGP routes and send all traffic to server 1. It is possible to tweak the relative protocol priorities, but I'm fairly sure we don't want to go messing with these well-known and almost universal values.

Oops, I missed some of the comments.

  • I'm in favor of ditching the statics
  • Changing the Hiera merge strategy seems cleaner to me than introducing new variables, but I'll leave the last call to people with more Puppet expertise than me
  • No strong preference about the ACAST_PS_ADVERTISE rename, but the various renaming need to be done in a dedicated task before or after this one to prevent scope creep

We can and probably should have a backup static routes for each of ns[01] but it can be to a single host instead of all three.

This is obviously a matter of risk-assessment, and confidence in the various systems at work. But personally I would argue in favour of ditching them completely.

We have 3 listed name-server IPs. Once is anycast from multiple hosts across our pops, the others will be anycasted from multiple hosts at our primary sites. The number of elements that need to fail for user DNS querys to go unanswered is therefore quite high, and I'd come down on the side of them not being needed.

At the end of the day we're talking about a few static routes, however. It's not that onerous to keep supporting it if that is the decision, so I'm happy with whatever the consensus is.

Yes that's fair, the failure model I have though is how tightly coupled the various services are and as such, if anycast-hc fails for whatever reason, bird fails too and stops advertising the IPs. The probability of that happening is low but note that Puppet controls all these things and a bad change can quickly traverse to all hosts, unless reverted. That I guess is my only concern here.

But it seems like that static routes will take priority over BGP so let's drop this idea.

Oops, I missed some of the comments.

  • I'm in favor of ditching the statics

From @cmooney's comment too, seems like we have to so this is resolved.

  • Changing the Hiera merge strategy seems cleaner to me than introducing new variables, but I'll leave the last call to people with more Puppet expertise than me

No strong opinions here, so we will let @jbond break the tie!

  • No strong preference about the ACAST_PS_ADVERTISE rename, but the various renaming need to be done in a dedicated task before or after this one to prevent scope creep

I can take care of that if there is no objection to it and that's the consensus. My vote is for that for what it's worth!

  • Changing the Hiera merge strategy seems cleaner to me than introducing new variables, but I'll leave the last call to people with more Puppet expertise than me

No strong opinions here, so we will let @jbond break the tie!

I would go with changing the merge strategy first and only instigate other options if that proves undesirable

For posterity:

  • no static routes
  • merge strategy Arzhel mentioned above
  • I am going to rename ACAST_PS_ADVERTISE but wait in case there are objections.

Change 963375 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Add ns0 and ns1 /32 routes to anycast_prefixes list

https://gerrit.wikimedia.org/r/963375

@cmooney's comment above on the default routing policy and priority of routes got me thinking: if we go by the link above, this means that static routes have a higher priority over BGP. If that is indeed the case -- and the documentation suggests that it is unless there is more to it -- would that then mean that all the static routes we have had so far had higher priority than the BGP ones announced from the hosts? But in that case, we had and have stale static routes as well and traffic is still reaching the hosts with the BGP session established, so I am wondering what I am missing here?

That's only for equal prefix length. For example a static /32 will have priority over a BGP /32, but a BGP /32 will have priority over a static /27 (like we have for VIPs).

That's only for equal prefix length. For example a static /32 will have priority over a BGP /32, but a BGP /32 will have priority over a static /27 (like we have for VIPs).

Ah thank you! That makes sense.

Change 964918 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: announce ns1 IP from bird (codfw)

https://gerrit.wikimedia.org/r/964918

Change 963375 merged by jenkins-bot:

[operations/homer/public@master] Add ns0 and ns1 /32 routes to anycast_prefixes list

https://gerrit.wikimedia.org/r/963375

Mentioned in SAL (#wikimedia-operations) [2023-10-11T13:28:42Z] <sukhe> disable puppet on P:bird::anycast: T348041

Change 964918 merged by Ssingh:

[operations/puppet@production] hiera: announce ns1 IP from bird (codfw)

https://gerrit.wikimedia.org/r/964918

Change 965169 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] set anycast4 orlonger instead of longer

https://gerrit.wikimedia.org/r/965169

Change 965169 merged by jenkins-bot:

[operations/homer/public@master] set anycast4 orlonger instead of longer

https://gerrit.wikimedia.org/r/965169

Change 965187 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: announce ns0 IP from bird (eqiad)

https://gerrit.wikimedia.org/r/965187

Mentioned in SAL (#wikimedia-operations) [2023-10-12T13:35:43Z] <sukhe> remove redundant 208.80.153.231/32 from /e/n/i on A:dns-rec and A:codfw (superseded by label lo:anycast): T348041

Mentioned in SAL (#wikimedia-operations) [2023-10-12T16:17:27Z] <sukhe> disable puppet on A:dns-rec to roll out CR: 965187 T348041

Change 965187 merged by Ssingh:

[operations/puppet@production] hiera: announce ns0 IP from bird (eqiad)

https://gerrit.wikimedia.org/r/965187

Mentioned in SAL (#wikimedia-operations) [2023-10-12T16:26:13Z] <sukhe> enable puppet on A:dns-rec and force agent run: T348041

The static routes have been removed and ns[01] are now announced via bird. Thanks to @ayounsi for his help with this!

ssingh claimed this task.

Mentioned in SAL (#wikimedia-operations) [2023-10-13T14:03:53Z] <sukhe> remove redundant 208.80.154.238/32 dev from /e/n/i on A:dns-rec and A:eqiad (superseded by label lo:anycast): T348041