Puppet facts around the primary network interface and IPv4/IPv6 address
Closed, Resolved, Public

Description

Due to some Puppet/Facter limitations, we currently have at least the following issues and workarounds:

  1. The ssh module uses the ipaddress6 fact to export each host's IPv6 address, but in some cases (cp*, authns) this is a service IP bound to the loopback interface. This results in SSH known-hosts authentication failures over IPv6.
  2. On some hosts that run Docker, e.g. copper, $ipaddress defaults to the 172.16.0.0/12 IP of the docker0 interface, resulting in all kinds of weird behavior across the tree.
  3. Probably because of the above, realm.pp exports a Puppet-global variable $main_ipaddress (but not $main_ipaddress6), which is set to the first non-undef fact out of (ipaddress_bond0, ipaddress_eth0, ipaddress). It is used very rarely, in a couple of places in the tree.
  4. interface::add_ip6_mapped has code to figure out the "primary" interface by taking the first entry of the interfaces fact.
  5. Because "the first interface" is not always the right one (and probably because of copy/paste), we have 79 definitions of interface::add_ip6_mapped that explicitly pass eth0 as the argument.
  6. Finally, over at T158429 we have been discussing switching to systemd/stretch's predictable interface names, which means that eth0 may soon not be the primary and in turn means that both these interface::add_ip6_mapped definitions and $main_ipaddress will be wrong.
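
For readers unfamiliar with what interface::add_ip6_mapped produces: judging only from the address pairs visible in this task (e.g. 208.80.154.76 next to 2620:0:861:3:208:80:154:76 on mx1001), the mapped address appears to reuse the decimal IPv4 octets as the last four groups of the /64 prefix. The sketch below is an illustration of that inferred scheme in Python, not the Puppet code itself; the function name mapped_ipv6 is hypothetical.

```python
import ipaddress


def mapped_ipv6(prefix: str, ipv4: str) -> str:
    """Build an IPv6 address whose last four groups spell out the IPv4 octets."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")
    # The decimal digits of each octet are reused as a hex group,
    # so 208.80.154.76 becomes ...:208:80:154:76.
    interface_id = int("".join(octet.zfill(4) for octet in octets), 16)
    return str(network[interface_id])


# Address pair taken from the mx1001 output quoted in this task.
print(mapped_ipv6("2620:0:861:3::/64", "208.80.154.76"))
```

The point of the task is that the interface this address gets added to should come from interface_primary rather than from a hardcoded eth0.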

For these reasons, I recently pushed abf0e49c32acbf99a993e47dd482e4d194d23318 which created these three facts:

  • interface_primary: the interface that is being used to reach the default gateway (ip -4 route list 0/0). Inspired by, and named after, Facter 3's networking['primary'] fact. I'd name it the same for forward-compatibility, but unfortunately we don't have Puppet 3.8's structured facts enabled :(
  • ipaddress_primary: basically an alias for $ipaddress_${interface_primary}.
  • ipaddress6_primary: Ditto.
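
To make the relationship between the three facts concrete, here is a minimal Python sketch of the same idea. It is not the Ruby fact from the commit above: the function names and the sample route line (including the gateway 10.64.16.1) are hypothetical, and only the copper addresses come from this task.

```python
from typing import Dict, Optional


def primary_interface(route_output: str) -> Optional[str]:
    """Return the device named on the first default-route line, if any."""
    for line in route_output.splitlines():
        fields = line.split()
        if fields[:1] == ["default"] and "dev" in fields:
            position = fields.index("dev") + 1
            if position < len(fields):
                return fields[position]
    return None


def primary_address(facts: Dict[str, Optional[str]], family: str = "ipaddress") -> Optional[str]:
    """Look up <family>_<interface_primary>, mirroring the alias described above."""
    interface = facts.get("interface_primary")
    if interface is None:
        return None
    return facts.get(f"{family}_{interface}")


# Hypothetical output of `ip -4 route list 0/0`; only "dev eth0" matters here.
sample_route = "default via 10.64.16.1 dev eth0\n"
facts = {
    "interface_primary": primary_interface(sample_route),
    "ipaddress_docker0": "172.17.0.1",
    "ipaddress_eth0": "10.64.16.176",
}
print(primary_address(facts))
```

With the docker0 address present, the lookup still resolves through eth0, which is the behavior the copper case needs.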

These are not being used yet mainly due to lack of testing.

So, to fix the ipaddress/6 situation:

  • Compare $ipaddress(6) with $ipaddress(6)_primary and $main_ipaddress across the fleet and audit manually all the differences. Fix any $ipaddress_primary bugs found during the audit.
  • Rename ipaddress(6)_primary to ipaddress(6) with weight 100 (using Facter's precedence rules) [change merged]
  • Replace $main_ipaddress with $::ipaddress (or $facts['ipaddress']) across the tree. [change staged]
  • Remove the hardcoded ipaddress_eth0 and ipaddress6_eth0 calls across the tree.
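
The rename relies on Facter's precedence rules: as far as I understand them, when several resolutions exist for the same fact name, they are tried from the highest weight down and the first one that yields a value wins. The Python sketch below only simulates that selection; it is not Facter code, and the resolve helper and the lambdas are hypothetical stand-ins.

```python
from typing import Callable, List, Optional, Tuple

# (weight, resolver) pairs; a resolver returns None when it cannot produce a value.
Resolution = Tuple[int, Callable[[], Optional[str]]]


def resolve(resolutions: List[Resolution]) -> Optional[str]:
    """Try resolutions from highest to lowest weight, return the first value found."""
    for _weight, resolver in sorted(resolutions, key=lambda item: item[0], reverse=True):
        value = resolver()
        if value is not None:
            return value
    return None


# Stand-ins for the built-in ipaddress fact and the weight-100 custom one.
builtin = (0, lambda: "172.17.0.1")
custom = (100, lambda: "10.64.16.176")
print(resolve([builtin, custom]))
```

If the custom resolution cannot find a primary interface and returns nothing, the built-in value is still used, so the rename should degrade gracefully.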

And to fix the interface_primary situation:

  • Audit whether $interface_primary is the same as the first interface in $interfaces for all the hosts where add_ip6_mapped is applied without an explicit interface parameter.
  • Change add_ip6_mapped to use interface_primary and ipaddress [change staged]
  • Remove the explicit interface parameter from all the add_ip6_mapped calls where it is redundant, i.e. where interface_primary equals the parameter, which is (almost?) always set to eth0.
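
The first audit step could be scripted roughly as below. This is a sketch under assumptions: it expects the facts to be already exported into a dict per host, assumes the interfaces fact is a comma-separated string, and uses made-up interface lists (the host names are only labels here).

```python
from typing import Dict, List, Optional, Tuple


def audit_primary(hosts: Dict[str, Dict[str, str]]) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """List hosts whose first interface differs from interface_primary."""
    mismatches = []
    for certname, facts in sorted(hosts.items()):
        interfaces = facts.get("interfaces", "")
        first = interfaces.split(",")[0] if interfaces else None
        primary = facts.get("interface_primary")
        if first != primary:
            mismatches.append((certname, first, primary))
    return mismatches


# Hypothetical fact values, for illustration only.
hosts = {
    "copper.eqiad.wmnet": {"interfaces": "docker0,eth0,lo", "interface_primary": "eth0"},
    "plain.example": {"interfaces": "eth0,lo", "interface_primary": "eth0"},
}
print(audit_primary(hosts))
```

Any host that shows up in the result is one where switching add_ip6_mapped to interface_primary changes behavior and needs a manual look.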

Event Timeline

Comparison between ipaddress and ipaddress_primary. For all the hosts where they differ, the correct one seems to me to be ipaddress_primary; it also matches the DNS record for the host:

| host | ipaddress | ipaddress_primary |
| --- | --- | --- |
| copper.eqiad.wmnet | 172.17.0.1 | 10.64.16.176 |
| kubernetes1001.eqiad.wmnet | 192.168.31.1 | 10.64.0.121 |
| kubernetes1002.eqiad.wmnet | 192.168.32.1 | 10.64.16.75 |
| kubernetes1003.eqiad.wmnet | 192.168.33.1 | 10.64.32.23 |
| kubernetes1004.eqiad.wmnet | 192.168.34.1 | 10.64.48.52 |
| labnet1001.eqiad.wmnet | 10.68.16.1 | 10.64.20.13 |
| labtestnet2001.codfw.wmnet | 10.196.16.1 | 10.192.20.5 |

Query used to extract the raw data on nitrogen (PuppetDB database), restricted to facts updated in the last 45 days so that the *_primary facts are present and stale data from old hosts does not add noise:

SELECT s.certname, v.value_string, COUNT(*) AS count
FROM factsets s
LEFT JOIN facts f ON s.id = f.factset_id
LEFT JOIN fact_paths p ON p.id = f.fact_path_id
LEFT JOIN fact_values v ON v.id = f.fact_value_id
WHERE p.name IN ('ipaddress', 'ipaddress_primary')
  AND s.timestamp > NOW() - INTERVAL '45 days'
GROUP BY s.certname, v.value_string
HAVING COUNT(*) < 2
ORDER BY s.certname;
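
The trick in the query above is the GROUP BY on (certname, value) with HAVING COUNT(*) < 2: when both facts hold the same value they collapse into one group of size 2 and are dropped, so only disagreeing hosts remain. The Python sketch below reproduces that on an in-memory SQLite table; the single flat facts table is a simplification of the PuppetDB schema, the 45-day filter is left out, and agreeing.example is a made-up host.

```python
import sqlite3


def differing_facts(rows, names=("ipaddress", "ipaddress_primary")):
    """Return (certname, value, count) for hosts whose two facts disagree."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (certname TEXT, name TEXT, value TEXT)")
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
    query = """
        SELECT certname, value, COUNT(*) AS count
        FROM facts
        WHERE name IN (?, ?)
        GROUP BY certname, value
        HAVING COUNT(*) < 2
        ORDER BY certname, value
    """
    result = conn.execute(query, names).fetchall()
    conn.close()
    return result


rows = [
    ("copper.eqiad.wmnet", "ipaddress", "172.17.0.1"),
    ("copper.eqiad.wmnet", "ipaddress_primary", "10.64.16.176"),
    ("agreeing.example", "ipaddress", "192.0.2.10"),          # hypothetical host
    ("agreeing.example", "ipaddress_primary", "192.0.2.10"),  # same value: filtered out
]
for row in differing_facts(rows):
    print(row)
```

One caveat of the approach: a host that is missing one of the two facts also produces a single row with count 1, so such rows need to be read as "missing" rather than "different".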

Query used to extract all the details of a single host:

SELECT p.name, v.value_string
FROM factsets s
LEFT JOIN facts f ON s.id = f.factset_id
LEFT JOIN fact_paths p ON p.id = f.fact_path_id
LEFT JOIN fact_values v ON v.id = f.fact_value_id
WHERE p.name IN ('ipaddress', 'ipaddress_primary', 'ipaddress6', 'ipaddress6_primary')
  AND s.certname = 'copper.eqiad.wmnet'
ORDER BY v.value_string;

Comparison between ipaddress6 and ipaddress6_primary. The hosts with some issue are flagged with a number in square brackets that refers to the list of details at the bottom. For all the others the correct one seems to me to be ipaddress6_primary; it also matches the DNS record when present:

| host | ipaddress6 | ipaddress6_primary |
| --- | --- | --- |
| acamar.wikimedia.org | 2620:0:860:ed1a::3:fe | 2620:0:860:1:208:80:153:12 |
| achernar.wikimedia.org | 2620:0:860:ed1a::3:fe | 2620:0:860:2:208:80:153:42 |
| baham.wikimedia.org | 2620:0:862:ed1a::e | 2620:0:860:1:208:80:153:13 |
| chromium.wikimedia.org | 2620:0:861:ed1a::3:fe | 2620:0:861:2:208:80:154:157 |
| cobalt.wikimedia.org [1] | 2620:0:861:3:208:80:154:85 | 2620:0:861:3:1618:77ff:fe33:4a30 |
| cp1008.wikimedia.org | 2620:0:861:ed1a::1 | 2620:0:861:1:208:80:154:42 |
| cp1045.eqiad.wmnet | 2620:0:861:ed1a::3:d | 2620:0:861:103:10:64:32:97 |
| cp1046.eqiad.wmnet | 2620:0:861:ed1a::2:d | 2620:0:861:103:10:64:32:98 |
| cp1047.eqiad.wmnet | 2620:0:861:ed1a::2:d | 2620:0:861:103:10:64:32:99 |
| cp1048.eqiad.wmnet | 2620:0:861:ed1a::2:b | 2620:0:861:103:10:64:32:100 |
| cp1049.eqiad.wmnet | 2620:0:861:ed1a::2:b | 2620:0:861:103:10:64:32:101 |
| cp1050.eqiad.wmnet | 2620:0:861:ed1a::2:b | 2620:0:861:103:10:64:32:102 |
| cp1051.eqiad.wmnet | 2620:0:861:ed1a::3:d | 2620:0:861:103:10:64:32:103 |
| cp1052.eqiad.wmnet | 2620:0:861:ed1a::1 | 2620:0:861:103:10:64:32:104 |
| cp1053.eqiad.wmnet | 2620:0:861:ed1a::1 | 2620:0:861:103:10:64:32:105 |
| cp1054.eqiad.wmnet | 2620:0:861:ed1a::1 | 2620:0:861:103:10:64:32:106 |
| cp1055.eqiad.wmnet | 2620:0:861:ed1a::1 | 2620:0:861:103:10:64:32:107 |
| cp1058.eqiad.wmnet | 2620:0:861:ed1a::3:d | 2620:0:861:101:10:64:0:95 |
| cp1059.eqiad.wmnet | 2620:0:861:ed1a::2:d | 2620:0:861:101:10:64:0:96 |
| cp1060.eqiad.wmnet | 2620:0:861:ed1a::2:d | 2620:0:861:101:10:64:0:97 |
| cp1061.eqiad.wmnet | 2620:0:861:ed1a::3:d | 2620:0:861:101:10:64:0:98 |
| cp1062.eqiad.wmnet | 2620:0:861:ed1a::2:b | 2620:0:861:101:10:64:0:99 |
| cp1063.eqiad.wmnet | 2620:0:861:ed1a::2:b | 2620:0:861:101:10:64:0:100 |
| cp1064.eqiad.wmnet | 2620:0:861:ed1a::2:b | 2620:0:861:101:10:64:0:101 |
| cp1065.eqiad.wmnet | 2620:0:861:ed1a::1 | 2620:0:861:101:10:64:0:102 |
| cp1066.eqiad.wmnet | 2620:0:861:ed1a::1 | 2620:0:861:101:10:64:0:103 |
| cp1067.eqiad.wmnet | 2620:0:861:ed1a::1 | 2620:0:861:101:10:64:0:104 |
| cp1068.eqiad.wmnet | 2620:0:861:ed1a::1 | 2620:0:861:101:10:64:0:105 |
| cp1071.eqiad.wmnet | 2620:0:861:ed1a::2:b | 2620:0:861:107:10:64:48:105 |
| cp1072.eqiad.wmnet | 2620:0:861:ed1a::2:b | 2620:0:861:107:10:64:48:106 |
| cp1073.eqiad.wmnet | 2620:0:861:ed1a::2:b | 2620:0:861:107:10:64:48:107 |
| cp1074.eqiad.wmnet | 2620:0:861:ed1a::2:b | 2620:0:861:107:10:64:48:108 |
| cp1099.eqiad.wmnet | 2620:0:861:ed1a::2:b | 2620:0:861:103:10:64:32:81 |
| cp2001.codfw.wmnet | 2620:0:860:ed1a::1 | 2620:0:860:101:10:192:0:122 |
| cp2002.codfw.wmnet | 2620:0:860:ed1a::2:b | 2620:0:860:101:10:192:0:123 |
| cp2003.codfw.wmnet | 2620:0:860:ed1a::2:d | 2620:0:860:101:10:192:0:124 |
| cp2004.codfw.wmnet | 2620:0:860:ed1a::1 | 2620:0:860:101:10:192:0:125 |
| cp2005.codfw.wmnet | 2620:0:860:ed1a::2:b | 2620:0:860:101:10:192:0:126 |
| cp2006.codfw.wmnet | 2620:0:860:ed1a::3:d | 2620:0:860:101:10:192:0:127 |
| cp2007.codfw.wmnet | 2620:0:860:ed1a::1 | 2620:0:860:102:10:192:16:133 |
| cp2008.codfw.wmnet | 2620:0:860:ed1a::2:b | 2620:0:860:102:10:192:16:134 |
| cp2009.codfw.wmnet | 2620:0:860:ed1a::2:d | 2620:0:860:102:10:192:16:135 |
| cp2010.codfw.wmnet | 2620:0:860:ed1a::1 | 2620:0:860:102:10:192:16:136 |
| cp2011.codfw.wmnet | 2620:0:860:ed1a::2:b | 2620:0:860:102:10:192:16:137 |
| cp2012.codfw.wmnet | 2620:0:860:ed1a::3:d | 2620:0:860:102:10:192:16:138 |
| cp2013.codfw.wmnet | 2620:0:860:ed1a::1 | 2620:0:860:103:10:192:32:112 |
| cp2014.codfw.wmnet | 2620:0:860:ed1a::2:b | 2620:0:860:103:10:192:32:113 |
| cp2015.codfw.wmnet | 2620:0:860:ed1a::2:d | 2620:0:860:103:10:192:32:114 |
| cp2016.codfw.wmnet | 2620:0:860:ed1a::1 | 2620:0:860:103:10:192:32:115 |
| cp2017.codfw.wmnet | 2620:0:860:ed1a::2:b | 2620:0:860:103:10:192:32:116 |
| cp2018.codfw.wmnet | 2620:0:860:ed1a::3:d | 2620:0:860:103:10:192:32:117 |
| cp2019.codfw.wmnet | 2620:0:860:ed1a::1 | 2620:0:860:104:10:192:48:23 |
| cp2020.codfw.wmnet | 2620:0:860:ed1a::2:b | 2620:0:860:104:10:192:48:24 |
| cp2021.codfw.wmnet | 2620:0:860:ed1a::2:d | 2620:0:860:104:10:192:48:25 |
| cp2022.codfw.wmnet | 2620:0:860:ed1a::2:b | 2620:0:860:104:10:192:48:26 |
| cp2023.codfw.wmnet | 2620:0:860:ed1a::1 | 2620:0:860:104:10:192:48:27 |
| cp2024.codfw.wmnet | 2620:0:860:ed1a::2:b | 2620:0:860:104:10:192:48:28 |
| cp2025.codfw.wmnet | 2620:0:860:ed1a::3:d | 2620:0:860:104:10:192:48:29 |
| cp2026.codfw.wmnet | 2620:0:860:ed1a::2:b | 2620:0:860:104:10:192:48:30 |
| cp3003.esams.wmnet | 2620:0:862:ed1a::2:d | 2620:0:862:102:10:20:0:103 |
| cp3004.esams.wmnet | 2620:0:862:ed1a::2:d | 2620:0:862:102:10:20:0:104 |
| cp3005.esams.wmnet | 2620:0:862:ed1a::2:d | 2620:0:862:102:10:20:0:105 |
| cp3006.esams.wmnet | 2620:0:862:ed1a::2:d | 2620:0:862:102:10:20:0:106 |
| cp3007.esams.wmnet | 2620:0:862:ed1a::3:d | 2620:0:862:102:10:20:0:107 |
| cp3008.esams.wmnet | 2620:0:862:ed1a::3:d | 2620:0:862:102:10:20:0:108 |
| cp3010.esams.wmnet | 2620:0:862:ed1a::3:d | 2620:0:862:102:10:20:0:110 |
| cp3030.esams.wmnet | 2620:0:862:ed1a::1 | 2620:0:862:102:10:20:0:165 |
| cp3031.esams.wmnet | 2620:0:862:ed1a::1 | 2620:0:862:102:10:20:0:166 |
| cp3032.esams.wmnet | 2620:0:862:ed1a::1 | 2620:0:862:102:10:20:0:167 |
| cp3033.esams.wmnet | 2620:0:862:ed1a::1 | 2620:0:862:102:10:20:0:168 |
| cp3034.esams.wmnet | 2620:0:862:ed1a::2:b | 2620:0:862:102:10:20:0:169 |
| cp3035.esams.wmnet | 2620:0:862:ed1a::2:b | 2620:0:862:102:10:20:0:170 |
| cp3036.esams.wmnet | 2620:0:862:ed1a::2:b | 2620:0:862:102:10:20:0:171 |
| cp3037.esams.wmnet | 2620:0:862:ed1a::2:b | 2620:0:862:102:10:20:0:172 |
| cp3038.esams.wmnet | 2620:0:862:ed1a::2:b | 2620:0:862:102:10:20:0:173 |
| cp3039.esams.wmnet | 2620:0:862:ed1a::2:b | 2620:0:862:102:10:20:0:174 |
| cp3040.esams.wmnet | 2620:0:862:ed1a::1 | 2620:0:862:102:10:20:0:175 |
| cp3041.esams.wmnet | 2620:0:862:ed1a::1 | 2620:0:862:102:10:20:0:176 |
| cp3042.esams.wmnet | 2620:0:862:ed1a::1 | 2620:0:862:102:10:20:0:177 |
| cp3043.esams.wmnet | 2620:0:862:ed1a::1 | 2620:0:862:102:10:20:0:178 |
| cp3044.esams.wmnet | 2620:0:862:ed1a::2:b | 2620:0:862:102:10:20:0:179 |
| cp3045.esams.wmnet | 2620:0:862:ed1a::2:b | 2620:0:862:102:10:20:0:180 |
| cp3046.esams.wmnet | 2620:0:862:ed1a::2:b | 2620:0:862:102:10:20:0:181 |
| cp3047.esams.wmnet | 2620:0:862:ed1a::2:b | 2620:0:862:102:10:20:0:182 |
| cp3048.esams.wmnet | 2620:0:862:ed1a::2:b | 2620:0:862:102:10:20:0:183 |
| cp3049.esams.wmnet | 2620:0:862:ed1a::2:b | 2620:0:862:102:10:20:0:184 |
| cp4001.ulsfo.wmnet | 2620:0:863:ed1a::3:d | 2620:0:863:101:10:128:0:101 |
| cp4002.ulsfo.wmnet | 2620:0:863:ed1a::3:d | 2620:0:863:101:10:128:0:102 |
| cp4003.ulsfo.wmnet | 2620:0:863:ed1a::3:d | 2620:0:863:101:10:128:0:103 |
| cp4004.ulsfo.wmnet | 2620:0:863:ed1a::3:d | 2620:0:863:101:10:128:0:104 |
| cp4005.ulsfo.wmnet | 2620:0:863:ed1a::2:b | 2620:0:863:101:10:128:0:105 |
| cp4006.ulsfo.wmnet | 2620:0:863:ed1a::2:b | 2620:0:863:101:10:128:0:106 |
| cp4007.ulsfo.wmnet | 2620:0:863:ed1a::2:b | 2620:0:863:101:10:128:0:107 |
| cp4008.ulsfo.wmnet | 2620:0:863:ed1a::1 | 2620:0:863:101:10:128:0:108 |
| cp4009.ulsfo.wmnet | 2620:0:863:ed1a::1 | 2620:0:863:101:10:128:0:109 |
| cp4010.ulsfo.wmnet | 2620:0:863:ed1a::1 | 2620:0:863:101:10:128:0:110 |
| cp4011.ulsfo.wmnet | 2620:0:863:ed1a::2:d | 2620:0:863:101:10:128:0:111 |
| cp4012.ulsfo.wmnet | 2620:0:863:ed1a::2:d | 2620:0:863:101:10:128:0:112 |
| cp4013.ulsfo.wmnet | 2620:0:863:ed1a::2:b | 2620:0:863:101:10:128:0:113 |
| cp4014.ulsfo.wmnet | 2620:0:863:ed1a::2:b | 2620:0:863:101:10:128:0:114 |
| cp4015.ulsfo.wmnet | 2620:0:863:ed1a::2:b | 2620:0:863:101:10:128:0:115 |
| cp4016.ulsfo.wmnet | 2620:0:863:ed1a::1 | 2620:0:863:101:10:128:0:116 |
| cp4017.ulsfo.wmnet | 2620:0:863:ed1a::1 | 2620:0:863:101:10:128:0:117 |
| cp4018.ulsfo.wmnet | 2620:0:863:ed1a::1 | 2620:0:863:101:10:128:0:118 |
| cp4019.ulsfo.wmnet | 2620:0:863:ed1a::2:d | 2620:0:863:101:10:128:0:119 |
| cp4020.ulsfo.wmnet | 2620:0:863:ed1a::2:d | 2620:0:863:101:10:128:0:120 |
| eeden.wikimedia.org | 2620:0:862:ed1a::e | 2620:0:862:1:91:198:174:121 |
| fermium.wikimedia.org | 2620:0:861:3::2 | 2620:0:861:3:208:80:154:74 |
| ganeti1001.eqiad.wmnet [2] | 2620:0:861:3:f21f:afff:fee8:c5a3 | 2620:0:861:103:f21f:afff:fee8:c5a3 |
| ganeti1002.eqiad.wmnet [2] | 2620:0:861:3:f21f:afff:fee8:c187 | 2620:0:861:103:f21f:afff:fee8:c187 |
| ganeti1003.eqiad.wmnet [2] | 2620:0:861:3:f21f:afff:fee8:ab4a | 2620:0:861:103:f21f:afff:fee8:ab4a |
| ganeti1004.eqiad.wmnet [2] | 2620:0:861:3:f21f:afff:fee6:68c7 | 2620:0:861:103:f21f:afff:fee6:68c7 |
| ganeti2001.codfw.wmnet [2] | 2620:0:860:2:46a8:42ff:fe12:c8fe | 2620:0:860:102:46a8:42ff:fe12:c8fe |
| ganeti2002.codfw.wmnet [2] | 2620:0:860:2:46a8:42ff:fe12:ce75 | 2620:0:860:102:46a8:42ff:fe12:ce75 |
| ganeti2003.codfw.wmnet [2] | 2620:0:860:2:46a8:42ff:fe12:dc5a | 2620:0:860:102:46a8:42ff:fe12:dc5a |
| ganeti2004.codfw.wmnet [2] | 2620:0:860:2:46a8:42ff:fe12:cd2b | 2620:0:860:102:46a8:42ff:fe12:cd2b |
| ganeti2005.codfw.wmnet [2] | 2620:0:860:2:46a8:42ff:fe12:cfc5 | 2620:0:860:102:46a8:42ff:fe12:cfc5 |
| ganeti2006.codfw.wmnet [2] | 2620:0:860:2:46a8:42ff:fe12:cce3 | 2620:0:860:102:46a8:42ff:fe12:cce3 |
| hydrogen.wikimedia.org | 2620:0:861:ed1a::3:fe | 2620:0:861:1:208:80:154:50 |
| iridium.eqiad.wmnet | 2620:0:861:ed1a::3:16 | 2620:0:861:103:10:64:32:186 |
| lvs1001.wikimedia.org | 2620:0:861:ed1a::1 | 2620:0:861:1:208:80:154:55 |
| lvs1002.wikimedia.org | 2620:0:861:ed1a::2:b | 2620:0:861:1:208:80:154:56 |
| lvs1003.wikimedia.org | 2620:0:861:107:1a03:73ff:fef0:8b0d | 2620:0:861:1:208:80:154:57 |
| lvs1004.wikimedia.org | 2620:0:861:ed1a::1 | 2620:0:861:2:208:80:154:137 |
| lvs1005.wikimedia.org | 2620:0:861:ed1a::2:b | 2620:0:861:2:208:80:154:138 |
| lvs1006.wikimedia.org | 2620:0:861:107:1a03:73ff:fef0:8a7d | 2620:0:861:2:208:80:154:139 |
| lvs1007.eqiad.wmnet | 2620:0:861:ed1a::1 | 2620:0:861:101:10:64:1:7 |
| lvs1008.eqiad.wmnet | 2620:0:861:ed1a::2:b | 2620:0:861:101:10:64:1:8 |
| lvs1009.eqiad.wmnet | 2620:0:861:4:8edc:d4ff:fe0c:99cc | 2620:0:861:101:10:64:1:9 |
| lvs1010.eqiad.wmnet | 2620:0:861:ed1a::1 | 2620:0:861:103:10:64:33:10 |
| lvs1011.eqiad.wmnet | 2620:0:861:ed1a::2:b | 2620:0:861:103:10:64:33:11 |
| lvs1012.eqiad.wmnet | 2620:0:861:4:8edc:d4ff:fe0c:99ec | 2620:0:861:103:10:64:33:12 |
| lvs2001.codfw.wmnet | 2620:0:860:ed1a::1 | 2620:0:860:101:10:192:1:1 |
| lvs2002.codfw.wmnet | 2620:0:860:ed1a::3:d | 2620:0:860:101:10:192:1:2 |
| lvs2003.codfw.wmnet | 2620:0:860:104:42a8:f0ff:fe2c:316c | 2620:0:860:101:10:192:1:3 |
| lvs2004.codfw.wmnet | 2620:0:860:ed1a::1 | 2620:0:860:102:10:192:17:4 |
| lvs2005.codfw.wmnet | 2620:0:860:ed1a::3:d | 2620:0:860:102:10:192:17:5 |
| lvs2006.codfw.wmnet | 2620:0:860:104:42a8:f0ff:fe2c:67cc | 2620:0:860:102:10:192:17:6 |
| lvs3001.esams.wmnet | 2620:0:862:ed1a::1 | 2620:0:862:102:10:20:0:11 |
| lvs3002.esams.wmnet | 2620:0:862:ed1a::3:d | 2620:0:862:102:10:20:0:12 |
| lvs3003.esams.wmnet | 2620:0:862:ed1a::1 | 2620:0:862:102:10:20:0:13 |
| lvs3004.esams.wmnet | 2620:0:862:ed1a::3:d | 2620:0:862:102:10:20:0:14 |
| lvs4001.ulsfo.wmnet | 2620:0:863:ed1a::1 | 2620:0:863:101:10:128:0:11 |
| lvs4002.ulsfo.wmnet | 2620:0:863:ed1a::2:b | 2620:0:863:101:10:128:0:12 |
| lvs4003.ulsfo.wmnet | 2620:0:863:ed1a::1 | 2620:0:863:101:10:128:0:13 |
| lvs4004.ulsfo.wmnet | 2620:0:863:ed1a::2:b | 2620:0:863:101:10:128:0:14 |
| maerlant.wikimedia.org | 2620:0:862:ed1a::3:fe | 2620:0:862:1:91:198:174:122 |
| ms-be1031.eqiad.wmnet [3] | - | 2620:0:861:102:9618:82ff:fe80:1ef4 |
| ms-be1032.eqiad.wmnet [3] | - | 2620:0:861:102:9618:82ff:fe80:9694 |
| mx1001.wikimedia.org [4] | 2620:0:861:3:208:80:154:76 | 2620:0:861:3:208:80:154:91 |
| mx2001.wikimedia.org [4] | 2620:0:860:2:208:80:153:45 | 2620:0:860:2:208:80:153:46 |
| nescio.wikimedia.org | 2620:0:862:ed1a::3:fe | 2620:0:862:1:91:198:174:106 |
| phab2001.codfw.wmnet [5] | 2620:0:861:103:10:64:32:186 | 2620:0:860:103:10:192:32:149 |
| radon.wikimedia.org | 2620:0:862:ed1a::e | 2620:0:861:3:208:80:154:93 |

  • [1] cobalt.wikimedia.org:

The ipaddress6_primary is correct, but the puppetization of cobalt is missing the interface::add_ip6_mapped class to map its host IPv6. The mapped one, 2620:0:861:3:208:80:154:85, is the one for the gerrit service.

  • [2] ganeti* hosts:

The one matched by ipaddress6_primary is the correct one, being that of the br0 interface, while the one matched by ipaddress6 is that of the vlan1003 interface, which according to @akosiaris should not have an IPv6 address; it does not have an IPv4 address either.

  • [3] ms-be103[12].eqiad.wmnet hosts:

The ipaddress6_primary is correct. The strange thing is that only for these 2 hosts facter is not printing any ipaddress6 fact, while it has an ipaddress6_eth0 => 2620:0:861:102:9618:82ff:fe80:1ef4 fact with the right IPv6. Other hosts of the same cluster have all IPs correctly detected. Needs further investigation to understand why facter is failing, but ipaddress6_primary is correct anyway.

  • [4] mx[12]001* hosts:

This is the only case where ipaddress6_primary matches the WRONG IPv6, the right one being the one matched by ipaddress6. See below for mx1001; for mx2001 the situation is analogous.

# From facter via PuppetDB (with dns comments):
ipaddress6         | 2620:0:861:3:208:80:154:76 (dns: mx1001)
ipaddress6_eth0    | 2620:0:861:3:208:80:154:91 (dns: wiki-mail-eqiad)
ipaddress6_primary | 2620:0:861:3:208:80:154:91 (dns: wiki-mail-eqiad)

# From ip addr
inet 208.80.154.76/26 brd 208.80.154.127 scope global eth0
   valid_lft forever preferred_lft forever
inet 208.80.154.91/32 scope global eth0
   valid_lft forever preferred_lft forever
inet6 2620:0:861:3:208:80:154:91/128 scope global deprecated
   valid_lft forever preferred_lft 0sec
inet6 2620:0:861:3:208:80:154:76/64 scope global
   valid_lft 2591976sec preferred_lft 604776sec
inet6 fe80::a800:ff:feb2:f5b7/64 scope link
   valid_lft forever preferred_lft forever

$ sudo ip -6 route show
2620:0:861:3:208:80:154:91 dev eth0  proto kernel  metric 256
2620:0:861:3::/64 dev eth0  proto kernel  metric 256
fe80::/64 dev eth0  proto kernel  metric 256
default via fe80::1 dev eth0  proto ra  metric 1024  expires 573sec hoplimit 64

$ sudo route -n -6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
::1/128                        ::                         U    256 0     0 lo
2620:0:861:3:208:80:154:91/128 ::                         U    256 1     2 eth0
2620:0:861:3::/64              ::                         U    256 81690586 eth0
fe80::/64                      ::                         U    256 8   831 eth0
::/0                           fe80::1                    UGDAe 1024 830578393 eth0
::/0                           ::                         !n   -1  132271748 lo
::1/128                        ::                         Un   0   93808681 lo
2620:0:861:3:208:80:154:76/128 ::                         Un   0   914651819 lo
2620:0:861:3:208:80:154:91/128 ::                         Un   0   220113763 lo
fe80::a800:ff:feb2:f5b7/128    ::                         Un   0   2115892 lo
ff00::/8                       ::                         U    256 11059391 eth0
::/0                           ::                         !n   -1  132271748 lo
  • [5] phab2001.codfw.wmnet:

This is the only case in which ipaddress6 AND ipaddress6_primary are BOTH WRONG.

  1. phab2001 has the IPv6 of iridium (2620:0:861:103:10:64:32:186/128) that is present on both hosts! See also the related change.
  2. The IPv6 matched by ipaddress6_primary is the one of the service phab2001-vcs (2620:0:860:103:10:192:32:149) and not the one of the host (2620:0:860:103:10:192:32:147).
# From sudo ip addr
inet 10.192.32.147/22 brd 10.192.35.255 scope global eth0
   valid_lft forever preferred_lft forever
inet 10.192.32.149/21 scope global eth0
   valid_lft forever preferred_lft forever
inet6 2620:0:860:103:10:192:32:149/128 scope global deprecated
   valid_lft forever preferred_lft 0sec
inet6 2620:0:860:103:10:192:32:147/64 scope global
   valid_lft 2591991sec preferred_lft 604791sec
inet6 2620:0:861:103:10:64:32:186/128 scope global deprecated
   valid_lft forever preferred_lft 0sec
inet6 fe80::1618:77ff:fe5b:cfa/64 scope link
   valid_lft forever preferred_lft forever

$ sudo ip -6 route show
2620:0:860:103:10:192:32:149 dev eth0  proto kernel  metric 256
2620:0:860:103::/64 dev eth0  proto kernel  metric 256
2620:0:861:103:10:64:32:186 dev eth0  proto kernel  metric 256
fe80::/64 dev eth0  proto kernel  metric 256
default via fe80::1 dev eth0  proto ra  metric 1024  expires 587sec hoplimit 64

$ sudo route -n -6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
::1/128                        ::                         U    256 0     0 lo
2620:0:860:103:10:192:32:149/128 ::                         U    256 0     0 eth0
2620:0:860:103::/64            ::                         U    256 0     0 eth0
2620:0:861:103:10:64:32:186/128 ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           fe80::1                    UGDAe 1024 321272684 eth0
::/0                           ::                         !n   -1  11272804 lo
::1/128                        ::                         Un   0   333189699 lo
2620:0:860:103:10:192:32:147/128 ::                         Un   0   21278173 lo
2620:0:860:103:10:192:32:149/128 ::                         Un   0   1     0 lo
2620:0:861:103:10:64:32:186/128 ::                         Un   0   1     0 lo
fe80::1618:77ff:fe5b:cfa/128   ::                         Un   0   2 82532 lo
ff00::/8                       ::                         U    256 12495799 eth0
::/0                           ::                         !n   -1  11272804 lo

For reference, this is the query used to extract the raw data on nitrogen (PuppetDB database), restricted to facts updated in the last 45 days so that the *_primary facts are present and stale data from old hosts does not add noise:

SELECT s.certname, v.value_string, COUNT(*) AS count
FROM factsets s
LEFT JOIN facts f ON s.id = f.factset_id
LEFT JOIN fact_paths p ON p.id = f.fact_path_id
LEFT JOIN fact_values v ON v.id = f.fact_value_id
WHERE p.name IN ('ipaddress6', 'ipaddress6_primary')
  AND s.timestamp > NOW() - INTERVAL '45 days'
GROUP BY s.certname, v.value_string
HAVING COUNT(*) < 2
ORDER BY s.certname;

Thanks for doing all this work @Volans :)

  • [1] cobalt.wikimedia.org:

The ipaddress6_primary is correct, but the puppetization of cobalt is missing the interface::add_ip6_mapped class to map its host IPv6. The mapped one, 2620:0:861:3:208:80:154:85, is the one for the gerrit service.

Fixed with e963e1c739d020609031d933f35c212e3cb656c4. Now ipaddress6_primary is correct, but ipaddress6 still isn't :)

  • [2] ganeti* hosts:

The one matched by ipaddress6_primary is the correct one, being that of the br0 interface, while the one matched by ipaddress6 is that of the vlan1003 interface, which according to @akosiaris should not have an IPv6 address; it does not have an IPv4 address either.

vlan1003 is a bridge containing a VLAN. It doesn't have an IPv4 address and shouldn't have an IPv6 either, so IPv6 autoconf needs to be turned off for that interface. In any case: the new fact is correct, the old one isn't :)

  • [3] ms-be103[12].eqiad.wmnet hosts:

The ipaddress6_primary is correct. The strange thing is that only for these 2 hosts facter is not printing any ipaddress6 fact, while it has an ipaddress6_eth0 => 2620:0:861:102:9618:82ff:fe80:1ef4 fact with the right IPv6. Other hosts of the same cluster have all IPs correctly detected. Needs further investigation to understand why facter is failing, but ipaddress6_primary is correct anyway.

Puppet bug… ipaddress6's source says unless match =~ /fe80.*/ or match == "::1". It doesn't bind the regexp to ^fe80 (to filter out link-locals), and it just so happened that these IPv6 addresses contained fe80… It doesn't matter, we're getting rid of that fact anyway.
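
The effect of the missing anchor can be reproduced outside Facter. The following Python sketch mirrors the quoted condition (Ruby's =~ searches anywhere in the string, like re.search); the is_skipped helper is a hypothetical stand-in, not Facter's code, and the addresses are the ones quoted in this task.

```python
import ipaddress
import re

UNANCHORED = re.compile(r"fe80.*")  # the pattern as quoted from the fact's source
ANCHORED = re.compile(r"^fe80")     # the binding the comment says is missing


def is_skipped(address: str, pattern) -> bool:
    """Mimic `unless match =~ pattern or match == "::1"`: True means the address is ignored."""
    return pattern.search(address) is not None or address == "::1"


global_address = "2620:0:861:102:9618:82ff:fe80:1ef4"  # ms-be1031, contains "fe80"
link_local = "fe80::a800:ff:feb2:f5b7"

print(is_skipped(global_address, UNANCHORED), is_skipped(global_address, ANCHORED))
print(is_skipped(link_local, UNANCHORED), is_skipped(link_local, ANCHORED))

# A stricter alternative that does not depend on string matching at all.
print(ipaddress.IPv6Address(global_address).is_link_local)
```

With the unanchored pattern the global address is discarded, which would explain why ipaddress6 came out empty on exactly these two hosts.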

  • [4] mx[12]001* hosts:

This is the only case where ipaddress6_primary matches the WRONG IPv6, the right one being the one matched by ipaddress6. See below for mx1001; for mx2001 the situation is analogous.

This is probably because of this:

# mark as deprecated = never pick this address unless explicitly asked
options   => 'preferred_lft 0',

ipaddress6_primary relies on ipaddress6_eth0, so the bug is there. Not entirely sure how to properly fix this, perhaps by rewiring ipaddress6_primary to have its own logic, or by fixing mx's IPv6 config in some other way.
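
One possible shape of "its own logic" is to read the addresses directly and skip anything flagged deprecated. The Python sketch below shows that idea against the mx1001 lines quoted earlier in this task; it is an assumption about the approach, not the code of any actual patch, and first_preferred_inet6 is a hypothetical name.

```python
from typing import Optional


def first_preferred_inet6(ip_addr_output: str) -> Optional[str]:
    """Return the first global-scope IPv6 address that is not marked deprecated."""
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "inet6":
            continue
        try:
            scope = fields[fields.index("scope") + 1]
        except (ValueError, IndexError):
            continue
        # Skip link-local addresses and addresses with preferred_lft 0.
        if scope != "global" or "deprecated" in fields:
            continue
        return fields[1].split("/")[0]
    return None


# inet6 lines as shown for mx1001 above.
sample = """\
inet6 2620:0:861:3:208:80:154:91/128 scope global deprecated
   valid_lft forever preferred_lft 0sec
inet6 2620:0:861:3:208:80:154:76/64 scope global
   valid_lft 2591976sec preferred_lft 604776sec
inet6 fe80::a800:ff:feb2:f5b7/64 scope link
   valid_lft forever preferred_lft forever
"""
print(first_preferred_inet6(sample))
```

Taking simply the first inet6 line would pick the wiki-mail-eqiad service address, which is the wrong result reported for [4].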

  • [5] phab2001.codfw.wmnet:

This is the only case in which ipaddress6 AND ipaddress6_primary are BOTH WRONG.

  1. phab2001 has the IPv6 of iridium (2620:0:861:103:10:64:32:186/128) that is present on both hosts! See also the related change.

This is an artifact of our puppet code not properly managing addresses (i.e. not removing an address when the otherwise declarative stanza is removed). I removed this manually now. This was an actual misconfiguration, not a fact bug.

Now the output of ipaddress6_primary is the address of phab2001-vcs, for the same (preferred_lft) reason as on mx1001/mx2001.

Change 350238 had a related patch set uploaded (by Faidon Liambotis):
[operations/puppet@production] Fix ipaddress6_primary to ignore deprecated addresses

https://gerrit.wikimedia.org/r/350238

Change 350238 merged by Faidon Liambotis:
[operations/puppet@production] Fix ipaddress6_primary to ignore deprecated addresses

https://gerrit.wikimedia.org/r/350238

Change 345569 had a related patch set uploaded (by Faidon Liambotis):
[operations/puppet@production] Replace $::main_ipaddress by the new ipaddress fact

https://gerrit.wikimedia.org/r/345569

Change 345568 had a related patch set uploaded (by Faidon Liambotis):
[operations/puppet@production] Switch add_ip6_mapped to use interface_primary

https://gerrit.wikimedia.org/r/345568

Change 350254 had a related patch set uploaded (by Faidon Liambotis):
[operations/puppet@production] Rename ipaddress_primary to ipaddress (same for 6)

https://gerrit.wikimedia.org/r/350254

All of the aforementioned issues should be fixed with the latest patches above. I've also tested Facter's precedence rules (they work!) and staged commits to remove the _primary suffix from ipaddress/ipaddress6, as well as to deprecate $::main_ipaddress.

Next steps:

  • Redo the ipaddress6/ipaddress6_primary audit, since the new ipaddress6_primary is significantly changed :(
  • Puppet compiler diff r345569 and merge that.

From the audit I got the same results as the tables in T163196#3206314, except for the following ones; everything now looks good for the ipaddress6_primary version:

| host | ipaddress6 | ipaddress6_primary |
| --- | --- | --- |
| cobalt.wikimedia.org | 2620:0:861:3:208:80:154:85 | 2620:0:861:3:208:80:154:81 |
| mx1001.wikimedia.org | 2620:0:861:3:208:80:154:76 | 2620:0:861:3:208:80:154:76 |
| mx2001.wikimedia.org | 2620:0:860:2:208:80:153:45 | 2620:0:860:2:208:80:153:45 |
| phab2001.codfw.wmnet | 2620:0:860:103:10:192:32:147 | 2620:0:860:103:10:192:32:147 |

Volans updated the task description.

Change 350254 merged by Faidon Liambotis:
[operations/puppet@production] Rename ipaddress_primary to ipaddress (same for 6)

https://gerrit.wikimedia.org/r/350254

Change 350765 had a related patch set uploaded (by Faidon Liambotis; owner: Faidon Liambotis):
[operations/puppet@production] lvs: replace $::ipaddress_eth0 by $::ipaddress

https://gerrit.wikimedia.org/r/350765

Change 350766 had a related patch set uploaded (by Faidon Liambotis; owner: Faidon Liambotis):
[operations/puppet@production] dnsrecursor: use ipaddress6, not ipaddress6_eth0

https://gerrit.wikimedia.org/r/350766

Change 350767 had a related patch set uploaded (by Faidon Liambotis; owner: Faidon Liambotis):
[operations/puppet@production] labs: remove the _eth0 suffix from ipaddress facts

https://gerrit.wikimedia.org/r/350767

Change 350768 had a related patch set uploaded (by Faidon Liambotis; owner: Faidon Liambotis):
[operations/puppet@production] Remove c/p interface argument to add_ip6_mapped

https://gerrit.wikimedia.org/r/350768

Change 350770 had a related patch set uploaded (by Faidon Liambotis; owner: Faidon Liambotis):
[operations/puppet@production] interface/lvs: add an $interface parameter, remove hardcoded eth0

https://gerrit.wikimedia.org/r/350770

Change 350771 had a related patch set uploaded (by Faidon Liambotis; owner: Faidon Liambotis):
[operations/puppet@production] cache: use interface_primary instead of eth0

https://gerrit.wikimedia.org/r/350771

Change 345569 merged by Volans:
[operations/puppet@production] Replace $::main_ipaddress by the new ipaddress fact

https://gerrit.wikimedia.org/r/345569

Change 350765 merged by Volans:
[operations/puppet@production] lvs: replace $::ipaddress_eth0 by $::ipaddress

https://gerrit.wikimedia.org/r/350765

Change 350766 merged by Volans:
[operations/puppet@production] dnsrecursor: use ipaddress6, not ipaddress6_eth0

https://gerrit.wikimedia.org/r/350766

Change 353250 had a related patch set uploaded (by Volans; owner: Volans):
[operations/puppet@production] LVS: move pybal config to a separate class

https://gerrit.wikimedia.org/r/353250

Change 353250 merged by Volans:
[operations/puppet@production] LVS: move pybal config to a separate class

https://gerrit.wikimedia.org/r/353250

Change 350767 merged by Volans:
[operations/puppet@production] labs: remove the _eth0 suffix from ipaddress facts

https://gerrit.wikimedia.org/r/350767

Change 345568 merged by Volans:
[operations/puppet@production] Switch add_ip6_mapped to use interface_primary

https://gerrit.wikimedia.org/r/345568

Change 350768 merged by Volans:
[operations/puppet@production] Remove c/p interface argument to add_ip6_mapped

https://gerrit.wikimedia.org/r/350768

Change 353332 had a related patch set uploaded (by Volans; owner: Volans):
[operations/puppet@production] Interface: remove unused module

https://gerrit.wikimedia.org/r/353332

Change 353332 merged by Volans:
[operations/puppet@production] interface: remove unused definition ::offload

https://gerrit.wikimedia.org/r/353332

Change 350770 merged by Volans:
[operations/puppet@production] interface/lvs: add an $interface parameter, remove hardcoded eth0

https://gerrit.wikimedia.org/r/350770

Change 350771 merged by Volans:
[operations/puppet@production] cache: use interface_primary instead of eth0

https://gerrit.wikimedia.org/r/350771

faidon updated the task description.

I think all of the changes described here have been merged, right @Volans? If you agree, want to do the honors of resolving this?

Volans assigned this task to faidon.