Page MenuHomePhabricator

Remove SLAAC IPs from Ganeti hosts
Open, MediumPublic

Description

Related to T265607 and T244153#6403429:

For example:
https://netbox.wikimedia.org/dcim/devices/2552/
Have a SLAAC IP as primary v6 IP (private) 2620:0:860:101:4ed9:8fff:fea0:222b/64

Running sudo cumin * "ip addr | grep ff:fe | grep -v fe80" only show LVS (expected) and Ganeti (not expected) hosts with those SLAAC IPs.

Not sure if if the fix applied to T265607 covers the private vlan as well. If so feel free to merge. Netbox will have to be updated accordingly once the IP is gone from the host.
Note that disabling RA might cause v6 to lose its default GW.

Event Timeline

ayounsi triaged this task as Medium priority.Oct 19 2020, 1:36 PM
ayounsi created this task.

To partially fix Netbox we can just run the PuppetDB import script for the Ganeti hosts again, then it will just be a matter of deleting all those SLAAC addresses

I fear this ain't gonna be easy. When we tried the approach that exists for all other hosts, we ended up in broken connectivity for ganeti hosts. See T233906 which ended up with the decision described in T233906#5529507. T234207 was then created as an investigation task to handle improvements to our puppetization of network configuration (which is crude and barely existing to be honest). There has been no move in this since then.

The fix for T265607 doesn't cover the private interface and vlan so it won't help here.

I 'd say put for now the Ganeti hosts under "expected", same as the LVS hosts.

Do you think we could trick facter into reporting the non-SLAAC address as primary?

$ sudo facter -p interface_primary
private
$ sudo facter -p ipaddress6_private
2620:0:860:101:4ed9:8fff:fea0:222b

Change 635283 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] facter: ipaddress6 prefer none SLACC addresses

https://gerrit.wikimedia.org/r/635283

Do you think we could trick facter into reporting the non-SLAAC address as primary?

$ sudo facter -p interface_primary
private
$ sudo facter -p ipaddress6_private
2620:0:860:101:4ed9:8fff:fea0:222b

I have created a CR to update the custom ipaddress6 fact however this does not currently update the structured fact $facts['networking']['ip6'] which looks a bit strange

I had a look at upstreaming this to facter v3 but i didn't see an obvious fix and as facter v4 is moving back to ruby i'm not sure its worth the effort to fix this in facter v3

note: I created a bug against facter4 which is related

@jbond thanks for looking into this, unfortunately the data used in the Netbox import comes from the networking fact because it needs all of them and parses that one, so not sure if the above patch would be useful in practice.

@jbond thanks for looking into this, unfortunately the data used in the Netbox import comes from the networking fact because it needs all of them and parses that one, so not sure if the above patch would be useful in practice.

Last time I checked for a tangentially related matter (E_WAYTOOMANYINTERFACE_FACTS), I did not find a way to amend that fact to e.g. not report all tap or veth interfaces. Let me know if that was not correct, but facter blocklist feature did not allow for that finegrained blocking. Could we perhaps solve this in another level? e.g. some filtering in puppetdb?

@jbond thanks for looking into this, unfortunately the data used in the Netbox import comes from the networking fact because it needs all of them and parses that one, so not sure if the above patch would be useful in practice.

Ill take a quick look today to see how to override the structured fact. however we could also add some workarounds in the import script as [i think] we know the following

  • $facts['networking']['primary'] is always correct
  • $facts['networking'['ip'] is always correct
  • $facts['networking']['network6'] should be right as both the SLAAC and the real ip6 address are on the same network
  • Our primary ip6 address should be of the form $network6 + $ip

sorry this is written in the puppet DSL was just easier to quickly test

$binding6 = $facts['networking']['interfaces'][$facts['networking']['primary']]['bindings6'].filter |$binding| {
      $binding['address'] =~ "${facts['networking']['ip']}$"
}
$ip6 = $binding6.empty ? {
  false   => $binding6[0]['address'],
  default => "${facts['networking']['network6']}${facts['networking']['ip']}",

}
notice($ip6)
ganeti1011 ~ % puppet apply ip6.pp
Notice: Scope(Class[main]): 2620:0:861:103:10:64:32:99
Notice: Compiled catalog for ganeti1011.eqiad.wmnet in environment production in 0.02 seconds
Notice: Applied catalog in 0.01 seconds

@jbond yeah I agree we can implement this assuming that all v6 addresses are mapped, with a sane fallback into the networking[ip6] one.
@crusnov could you implement that logic into the current import script? Basically pick the binding6 that endswith primary ipv4.replace('.', ':'), fallbacking into the current behaviour if no match.

Ill take a quick look today to see how to override the structured fact

I had a quick look and i can't work out how to override the networking fact. I seem to be hitting the same issues described here in that i can override the networking fact to something completely new with

Facter.add(:networking) do
  setcode do
    {'foo' => 'bar'}
  end
end

However if i first resolve the networking fact then i cant override.

networking = Facter.fact(:networking).value
Facter.add(:networking) do. # Also tried Facter.add('networking') 
  has_weight 10001. #tried weights of 100, 1001 and 10001
  setcode do
    networking['foo'] = 'bar'
  end
end

Also just to add a comment from @ayounsi on irc. We can test if the $facts['networking']['ip6'] address is a slaac address by checking bytes 23-26 for FF:FE e.g.

$parts = '::' in $networking['ip6'] ? {
  true => $networking['ip6'].split('::')[1].split(':'),  # Address is compressed strip the network part
  default => $networking['ip6'].split(':')[4,-1],      # Address is not compressed strip the network part
}
$primary_is_slaac = ($parts.length == 4 and "${parts[1][2,-1]}${parts[2][0,2]}"  == 'fffe')

this should be much easier and less error prone with pythons ipaddress module

Change 635283 abandoned by Jbond:
[operations/puppet@production] facter: ipaddress6 prefer none SLAAC addresses

Reason:
This was instead fixed in netbox. There is also some upstream work which may make this easier in future https://github.com/puppetlabs/facter/pull/2192

https://gerrit.wikimedia.org/r/635283

note: I created a bug against facter4 which is related

FYI this has been resolved however i need to create an additional BUG report to ask for the attributes to also get exposed by facter

Change 654435 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/software/netbox-extras@master] interface_automation: update is_primary logic.

https://gerrit.wikimedia.org/r/654435

Change 654439 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/software/netbox-extras@master] customscripts/interface_automation: skipp slaac addresses

https://gerrit.wikimedia.org/r/654439

I also created FACT-2907 to request adding binding flags

Change 654439 merged by Jbond:
[operations/software/netbox-extras@master] customscripts/interface_automation: skip slaac addresses

https://gerrit.wikimedia.org/r/654439

Change 654435 merged by Jbond:
[operations/software/netbox-extras@master] interface_automation: update is_primary logic.

https://gerrit.wikimedia.org/r/654435

I have applied the changes in the import script and tested against against ganeti224 and i now see Primary IPv6: 2620:0:860:101:10:192:0:188

Nice! I think that the script deletes ifaces not present in PuppetDB but doesn't do that for IPs... maybe we should also do that. Thoughts?

@Volans, @crusnov, what do you think about changing the status of those IPs from active to SLAAC?
See for example https://netbox-next.wikimedia.org/ipam/ip-addresses/4540/
That would makes some of the Capirca automation easier (it would automatically ignore SLAAC IPs), but I don't know if it risks breaking something else.
And would help keeping track of them as well.

I have applied the changes in the import script and tested against against ganeti224 and i now see Primary IPv6: 2620:0:860:101:10:192:0:188

Nice, is it safe to run it on all (Ganeti) hosts? Some still have the SLAAC IP as v6 primary: https://netbox.wikimedia.org/dcim/devices/2134/

Nice! I think that the script deletes ifaces not present in PuppetDB but doesn't do that for IPs... maybe we should also do that. Thoughts?

+1

@Volans, @crusnov, what do you think about changing the status of those IPs from active to SLAAC?

I think those should not be in Netbox in the first place and we should fix that instead.

I have applied the changes in the import script and tested against against ganeti224 and i now see Primary IPv6: 2620:0:860:101:10:192:0:188

Nice, is it safe to run it on all (Ganeti) hosts? Some still have the SLAAC IP as v6 primary: https://netbox.wikimedia.org/dcim/devices/2134/

Yes, I've took the liberty to do that ( https://netbox.wikimedia.org/extras/changelog/?request_id=16b2db0e-e6d8-464b-be6e-8d1528c1efde )
But apparently we have some hosts that don't have the mapped v6 and have only the SLAAC, scroll through https://netbox.wikimedia.org/dcim/devices/?q=ganeti to see them

Nice! I think that the script deletes ifaces not present in PuppetDB but doesn't do that for IPs... maybe we should also do that. Thoughts?

+1

I'm wondering given that we're skipping them now we could just delete all the existing ones and be done with that because they will not be added anymore.
Unfortunately from a quick test on netbox-next the script is still re-adding some SLAAC IPs and also setting it as primary in some cases:

5	Warning	2620:0:860:103:4ed9:8fff:fe6d:ad20/64 is a SLAAC IP
6	Info	Creating 2620:0:860:103:4ed9:8fff:fe6d:ad20/64
7	Default	[DNS] PTR record not found for 2620:0:860:103:4ed9:8fff:fe6d:ad20
8	Default	[DNS] Record AAAA not found for ganeti2009.codfw.wmnet
9	Warning	ganeti2009.codfw.wmnet assign_name: No IPv6 DNS records.
10	Info	Setting 2620:0:860:103:4ed9:8fff:fe6d:ad20/64 as primary for ganeti2009

We should fix the script IMHO and once we're sure it would not re-add any SLAAC IP just delete all of them from Netbox. Thoughts?

@Volans, @crusnov, what do you think about changing the status of those IPs from active to SLAAC?

I think those should not be in Netbox in the first place and we should fix that instead.

As there is another way to surface them easily (for example through cumin) I'm fine with deleting them from Netbox.

I'm wondering given that we're skipping them now we could just delete all the existing ones and be done with that because they will not be added anymore.
Unfortunately from a quick test on netbox-next the script is still re-adding some SLAAC IPs and also setting it as primary in some cases:

5	Warning	2620:0:860:103:4ed9:8fff:fe6d:ad20/64 is a SLAAC IP
6	Info	Creating 2620:0:860:103:4ed9:8fff:fe6d:ad20/64
7	Default	[DNS] PTR record not found for 2620:0:860:103:4ed9:8fff:fe6d:ad20
8	Default	[DNS] Record AAAA not found for ganeti2009.codfw.wmnet
9	Warning	ganeti2009.codfw.wmnet assign_name: No IPv6 DNS records.
10	Info	Setting 2620:0:860:103:4ed9:8fff:fe6d:ad20/64 as primary for ganeti2009

We should fix the script IMHO and once we're sure it would not re-add any SLAAC IP just delete all of them from Netbox. Thoughts?

+1

We should fix the script IMHO and once we're sure it would not re-add any SLAAC IP just delete all of them from Netbox. Thoughts?

This sounds like the correct solution.

note: I created a bug against facter4 which is related

FYI this has been resolved however i need to create an additional BUG report to ask for the attributes to also get exposed by facter

just an updated that i had a patch accepted which will expose some of the ifa flags. unfortunatly 1) its facter 4 so a long way of 2) it doesn't expose the 32 bit flags and specifically managetempaddr which indicates slaac

https://github.com/puppetlabs/facter/pull/2340/

Change 715949 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] facter networking: filter k8s interfaces out of the networking fact

https://gerrit.wikimedia.org/r/715949

Change 715943 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] facter networking: override the networking.ip6 fact

https://gerrit.wikimedia.org/r/715943

Change 715943 merged by Jbond:

[operations/puppet@production] facter networking: override the networking.ip6 fact

https://gerrit.wikimedia.org/r/715943

Change 715949 merged by Jbond:

[operations/puppet@production] facter networking: filter out cali/tap interfaces

https://gerrit.wikimedia.org/r/715949