Enable IPv6 for Wikidough
Closed, Resolved · Public

Description

We've been allocated an IPv6 RIR assignment (2001:67c:930::/48) for Wikidough. Several steps are required to complete the process. Eventual deployment will be on a POP-by-POP basis, starting with esams.

  • 1. Assign the global Anycast IPv6 address for Wikidough

I would suggest assigning a /64 prefix for Wikidough and choosing the /128 from that.

We could use the very first /64, and then the first IP from that.

2001:67c:930::/64    -  Prefix
2001:67c:930::1/128  -  Anycast IP

Personally I'd probably skip the first /64, to keep the /48 and /64 visually distinct in the routing table. But it doesn't really make a difference (that's just old neteng thinking, and we're being replaced by robots anyway!)

Perhaps:

2001:67c:930:100::/64   - Prefix
2001:67c:930:100::1/128 - Anycast IP
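
The two candidate layouts above can be sanity-checked with Python's standard ipaddress module. This is only an illustrative sketch: the helper name first_host is hypothetical and is not part of any Wikimedia tooling.

```python
import ipaddress

# The /48 assignment from the task description.
ASSIGNMENT = ipaddress.ip_network("2001:67c:930::/48")


def first_host(prefix):
    """Return the ::1 address of a prefix (hypothetical helper for illustration)."""
    return prefix.network_address + 1


# subnets() yields the /64s of the /48 lazily; the first one is 2001:67c:930::/64.
first_64 = next(ASSIGNMENT.subnets(new_prefix=64))

# The alternative suggested above: skip ahead to 2001:67c:930:100::/64.
alt_64 = ipaddress.ip_network("2001:67c:930:100::/64")

for candidate in (first_64, alt_64):
    # Both candidates must sit inside the /48 assignment.
    assert candidate.subnet_of(ASSIGNMENT)
    print(candidate, "->", first_host(candidate))
```

Either choice is valid; the difference is only how visually distinct the /64 looks next to the /48.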

  • 2. Enable IPv6 support in Wikidough's anycast configuration

  • 3. Configure VM to CR BGP Sessions

We need to configure the VMs running the Wikidough service to announce the newly assigned /128 address, and configure the CR routers to form a BGP adjacency with them.

When that is complete we can test that things are working locally at that POP before proceeding.

  • 4. Configure CR aggregate route

We need to add the /48 to the list of aggregate prefixes on each CR. This will generate the /48 in BGP if any contributing routes are present, which in turn will be announced upstream.
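
The "generate the /48 only if contributing routes are present" behaviour can be modelled roughly as below. This is a simplified sketch of the general aggregate-route idea, not the actual router configuration; the function name aggregate_is_active and the example route lists are assumptions made for illustration.

```python
import ipaddress

AGGREGATE = ipaddress.ip_network("2001:67c:930::/48")


def aggregate_is_active(aggregate, learned_routes):
    """True if at least one learned route is a more-specific of the aggregate."""
    for route in learned_routes:
        net = ipaddress.ip_network(route)
        # Compare versions first: subnet_of() raises TypeError on a v4/v6 mix.
        if (
            net.version == aggregate.version
            and net != aggregate
            and net.subnet_of(aggregate)
        ):
            return True
    return False


# With the Wikidough /128 learned from a VM, the /48 would be generated.
print(aggregate_is_active(AGGREGATE, ["2001:67c:930::1/128"]))
# With no contributing routes, the /48 would not be generated.
print(aggregate_is_active(AGGREGATE, []))
```

In this model, the upstream announcement disappears from a POP once no VM there announces the /128 any more.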

Transits will likely need manual engagement before they'll update their filters and allow the route.

  • 5. Update AAAA records for Wikidough

Event Timeline

cmooney changed the task status from Open to In Progress. Feb 7 2022, 6:17 PM
cmooney triaged this task as Medium priority.
cmooney created this task.
Restricted Application added a subscriber: Aklapper.
cmooney renamed this task from Allocate range/IP and enable IPv6 on Wikidough hosts to Enable IPv6 for Wikidough. Feb 7 2022, 6:19 PM
cmooney updated the task description.

We have discussed this in the Traffic team and decided to go with 2001:67c:930::1/128, mostly because we feel it's easy to memorize/copy (for cases where people want to do that and directly use the IP in their stub configuration).

There is no strong preference on which particular /128 to go with so we just picked this.

My OCD will bite me for years but yes that is fine. Allocated in Netbox now:

https://netbox.wikimedia.org/ipam/prefixes/515/ip-addresses/

Not sure if you want to use the same /64 for durum or another one. Maybe we don't need to make a decision on that now.

No rush on our side. Whenever you have item #2 complete, let me know and we can take a look at #3. I'll find some time to look at prepping #4 in the meantime.

Thanks! At least for IPv4, we were using the same /24 for durum. If there is a reason not to do that with the IPv6 /64, please let me know; otherwise we will just use that. For example, we will use 2001:67c:930::2/128 for durum.

ssingh: thanks! Yeah I'm not aware of any reason not to just match what was done with the IPv4, even if there are other options in this case.

I've gone and added 3 IPs for Durum to mirror what was in place for the v4 allocation:

https://netbox.wikimedia.org/ipam/prefixes/515/ip-addresses/

Does that make sense? Let me know if there are any issues.

Thank you, this is perfect!

Change 761362 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] dnsdist: update AAAA records for check.wikimedia-dns.org

https://gerrit.wikimedia.org/r/761362

Change 761363 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] wikimedia-dns.org: add AAAA records for Wikidough

https://gerrit.wikimedia.org/r/761363

Change 761364 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/homer/public@master] Add Wikidough's IPv6 anycast network in esams

https://gerrit.wikimedia.org/r/761364

Change 761372 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] bird: update vips_filter for Wikidough's IPv6 address

https://gerrit.wikimedia.org/r/761372

Change 761373 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: add IPv6 support to Wikidough

https://gerrit.wikimedia.org/r/761373

Change 761362 merged by Ssingh:

[operations/puppet@production] dnsdist: update AAAA records for check.wikimedia-dns.org

https://gerrit.wikimedia.org/r/761362

Change 761363 merged by Ssingh:

[operations/dns@master] wikimedia-dns.org: add AAAA records for Wikidough

https://gerrit.wikimedia.org/r/761363

Change 761372 merged by Ssingh:

[operations/puppet@production] bird: update vips_filter for Wikidough's IPv6 address

https://gerrit.wikimedia.org/r/761372

Change 762521 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] durum: add support for IPv6

https://gerrit.wikimedia.org/r/762521

Change 762788 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/homer/public@master] Add Wikidough's IPv6 anycast network in esams

https://gerrit.wikimedia.org/r/762788

Change 762788 merged by jenkins-bot:

[operations/homer/public@master] Add Wikidough's IPv6 anycast network in esams

https://gerrit.wikimedia.org/r/762788

Mentioned in SAL (#wikimedia-operations) [2022-02-15T11:50:07Z] <sukhe> running homer for Gerrit 762788 and T301165

@ssingh I haven't had time to go through all of this and work it out, but some things seem clear enough.

As per @Majavah's comments on the CR, it does seem our Puppet config for Bird will create separate systemd unit files / daemons for v4 and v6, and create two configuration files based on bird_anycast.conf.epp.

So the question becomes: how do we leverage this to create an IPv6 BGP peering to the CR? Currently the input to the $neighbors data comes from:

https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/esams/profile/bird.yaml

So right now, for esams, setting "do_ipv6" to true will cause a template to get built, but no valid IPv6 IPs will be passed into the conf file (unless I am once again mistaken).

@ayounsi specified that in this scenario Bird will try to create the BGP adjacency to the IP of its default gateway on a connected interface? Will that happen?

We could add the esams CR IPv6 loopbacks to the list of IPs for Anycast. What makes me reluctant to do that is our desire to move to a routed access layer (specifically per-rack subnets in drmrs and new eqiad cage). This complicates the idea of using a single loopback peer IP across the whole datacenter. It would be better for wikidough VMs to peer directly with the gateway IP on the directly connected subnet I think. Lots of options so we're not gonna hit a blocker here, just need to decide on the best way forward.

Puppet will automatically filter v4 from v6 neighbors and should do the right thing when v4 and v6 IPs are mixed in neighbors_list
https://github.com/wikimedia/puppet/blob/production/modules/bird/manifests/init.pp#L41

When no neighbors_list is set, it should also "do the right thing" by picking the v4 and v6 default gateways:
https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/bird/anycast.pp#L26

So in theory the Bird side "should just work", but it has not been thoroughly tested.
What's for sure is that the v6 side on the routers is needed.

However I'm wondering if we should have a closer look at Bird 2 right now, before increasing our Bird 1 specific code.
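
The neighbor handling described above (split a mixed list by IP version, and fall back to the default gateways when no list is set) could look roughly like this. It is a Python sketch of the described behaviour, not the Puppet code itself; select_neighbors and the documentation-range addresses are placeholders chosen for illustration.

```python
import ipaddress


def select_neighbors(neighbors_list, default_gw_v4, default_gw_v6):
    """Split configured neighbors by IP version, falling back to the default gateways."""
    if not neighbors_list:
        # No explicit neighbors: peer with the default gateway of each family.
        return {4: [default_gw_v4], 6: [default_gw_v6]}
    selected = {4: [], 6: []}
    for addr in neighbors_list:
        # ip_address() reports version 4 or 6 for each entry of the mixed list.
        selected[ipaddress.ip_address(addr).version].append(addr)
    return selected


# Placeholder addresses from the documentation ranges, not real router IPs.
mixed = ["192.0.2.1", "2001:db8::1", "192.0.2.2", "2001:db8::2"]
print(select_neighbors(mixed, "192.0.2.254", "2001:db8::fe"))
print(select_neighbors([], "192.0.2.254", "2001:db8::fe"))
```

Each per-family list would then feed the corresponding v4 or v6 Bird configuration.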

Change 762807 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Adjust CR Internal Anycast BGP Templates

https://gerrit.wikimedia.org/r/762807

Thanks @ayounsi, makes sense.

I guess we have two scenarios to support in the medium term:

  1. VM peered to its own top-of-rack switch, can and should use its default GW IP for both v4 and v6.
  2. VM peered with core routers directly via intermediate L2 switch. We don't want to peer with the VRRP VIP, so the loopbacks make sense here.

So for this deploy in esams, the simplest thing to do is add the CR IPv6 loopbacks to the $neighbors list for the site, after which the existing Bird config / templates should produce what we need.

For drmrs, not having any $neighbors defined for the site should mean it continues to build the config to peer with the ToR switch.

For the eqiad expansion, when we get to the stage of having Anycast hosts in the new cage, we need to make sure they peer with the gateway IPs, even though $neighbors is defined for eqiad. It won't be too tricky; we can deal with it when we get to it.

What's for sure is that the v6 side on the routers is needed.

CR above should cover that I think.

Do we anticipate other uses of the anycast service? Or did you mean just to upgrade to BIRD 2 in general? I am happy to work on this (probably later) but just wanted to know what you had in mind and if more things are planned.

So in theory the Bird side "should just work", but it has not been thoroughly tested.

That's true, since this is the first use case, we will see how it plays out.

Do we anticipate other uses of the anycast service?

I don't think we would rule out introducing new services based on it, but right now I don't believe there is anything planned.

I'm wondering if we should have a closer look at Bird 2 right now, before increasing our Bird 1 specific code.

From what I can tell, for the current requirement, no changes to the Bird 1 code are needed. All templates/puppet code should already support v6.

It's probably a good idea to upgrade to Bird 2, but we're not in danger of wasting time if we stick with Bird 1.

Change 762807 merged by Cathal Mooney:

[operations/homer/public@master] Adjust CR Internal Anycast BGP Templates

https://gerrit.wikimedia.org/r/762807

Change 763222 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/puppet@production] Add CR router IPv6 loopbacks to Bird config for esams

https://gerrit.wikimedia.org/r/763222

Change 761364 abandoned by Ssingh:

[operations/homer/public@master] Add Wikidough's IPv6 anycast network in esams

Reason:

abandoning; will submit a new CR

https://gerrit.wikimedia.org/r/761364

Change 763222 merged by Cathal Mooney:

[operations/puppet@production] Add CR router IPv6 loopbacks to Bird config for esams

https://gerrit.wikimedia.org/r/763222

Change 763233 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/homer/public@master] Set anycast_neighbors for Wikidough IPv6 in esams

https://gerrit.wikimedia.org/r/763233

Change 763233 merged by Cathal Mooney:

[operations/homer/public@master] Set anycast_neighbors for Wikidough IPv6 in esams

https://gerrit.wikimedia.org/r/763233

Change 761373 merged by Ssingh:

[operations/puppet@production] hiera: add IPv6 support to Wikidough

https://gerrit.wikimedia.org/r/761373

Change 763244 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] bird: use IPv6 address as router_id in bird6.conf

https://gerrit.wikimedia.org/r/763244

Change 763270 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Modify vars for esams to announce IPv6 Anycast range

https://gerrit.wikimedia.org/r/763270

Change 763270 merged by Cathal Mooney:

[operations/homer/public@master] Modify vars for esams to announce IPv6 Anycast range

https://gerrit.wikimedia.org/r/763270

Change 763272 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Add IPv6 Anycast range to knams public announcements.

https://gerrit.wikimedia.org/r/763272

Change 763272 merged by Cathal Mooney:

[operations/homer/public@master] Add IPv6 Anycast range to knams public announcements.

https://gerrit.wikimedia.org/r/763272

Change 763244 merged by Ssingh:

[operations/puppet@production] bird: for bird6, set local IP to IPv6 instead of IPv4

https://gerrit.wikimedia.org/r/763244

Change 763319 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/puppet@production] Rename variable used for router id setting in Bird config

https://gerrit.wikimedia.org/r/763319

Change 763319 merged by Cathal Mooney:

[operations/puppet@production] Rename variable used for router id setting in Bird config

https://gerrit.wikimedia.org/r/763319

Up and running from esams :)

cathal@nbgw:~$ dig -b 2001:470:1f09:32c::103 +nsid +https www.toutless.com @wikimedia-dns.org.

; <<>> DiG 9.17.22-1+0~20220124.69+debian11~1.gbpa036ac-Debian <<>> -b 2001:470:1f09:32c::103 +nsid +https www.toutless.com @wikimedia-dns.org.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21450
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; NSID: 64 6f 68 33 30 30 31 ("doh3001")
;; QUESTION SECTION:
;www.toutless.com.		IN	A

;; ANSWER SECTION:
www.toutless.com.	591	IN	A	52.17.66.124

;; Query time: 39 msec
;; SERVER: 2001:67c:930::1#443(wikimedia-dns.org.) (HTTPS)
;; WHEN: Wed Feb 16 19:43:20 GMT 2022
;; MSG SIZE  rcvd: 72
cathal@nbgw:~$ dig -b 2001:470:1f08:32c::2 +nsid +https www.toutless.com @wikimedia-dns.org.

; <<>> DiG 9.17.22-1+0~20220124.69+debian11~1.gbpa036ac-Debian <<>> -b 2001:470:1f08:32c::2 +nsid +https www.toutless.com @wikimedia-dns.org.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49714
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; NSID: 64 6f 68 33 30 30 32 ("doh3002")
;; QUESTION SECTION:
;www.toutless.com.		IN	A

;; ANSWER SECTION:
www.toutless.com.	428	IN	A	52.17.66.124

;; Query time: 27 msec
;; SERVER: 2001:67c:930::1#443(wikimedia-dns.org.) (HTTPS)
;; WHEN: Wed Feb 16 19:43:30 GMT 2022
;; MSG SIZE  rcvd: 72
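
The NSID lines in the dig output above show the raw option bytes in hex alongside their ASCII form. A small sketch of that decoding, using only the Python standard library (decode_nsid is a hypothetical helper name):

```python
def decode_nsid(hex_bytes):
    """Decode the space-separated hex bytes dig prints for the NSID option."""
    return bytes.fromhex(hex_bytes.replace(" ", "")).decode("ascii")


# Values copied from the two dig runs above.
print(decode_nsid("64 6f 68 33 30 30 31"))  # doh3001
print(decode_nsid("64 6f 68 33 30 30 32"))  # doh3002
```

The two different identifiers show that queries from the two source addresses were answered by different hosts behind the same anycast IP.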

Change 763331 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/homer/public@master] Add all doh* and durum* hosts to anycast_neighbors to enable IPv6

https://gerrit.wikimedia.org/r/763331

Change 763339 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] Add CR router IPv6 loopbacks to Bird for {eqsin,eqiad,ulsfo,codwf}

https://gerrit.wikimedia.org/r/763339

Change 763339 merged by Ssingh:

[operations/puppet@production] Add CR router IPv6 loopbacks to Bird for {eqsin,eqiad,ulsfo,codwf}

https://gerrit.wikimedia.org/r/763339

Change 763331 merged by jenkins-bot:

[operations/homer/public@master] Add all doh* and durum* hosts to anycast_neighbors to enable IPv6

https://gerrit.wikimedia.org/r/763331

Change 762521 merged by Ssingh:

[operations/puppet@production] durum: add support for IPv6

https://gerrit.wikimedia.org/r/762521

Mentioned in SAL (#wikimedia-operations) [2022-02-16T22:15:28Z] <sukhe@cumin1001> START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on durum[6001-6002].drmrs.wmnet with reason: T301165; errors expected, not serving any traffic

Mentioned in SAL (#wikimedia-operations) [2022-02-16T22:15:33Z] <sukhe@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on durum[6001-6002].drmrs.wmnet with reason: T301165; errors expected, not serving any traffic

Mentioned in SAL (#wikimedia-operations) [2022-02-16T22:15:43Z] <sukhe@cumin1001> START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on doh[6001-6002].wikimedia.org with reason: T301165; errors expected, not serving any traffic

Mentioned in SAL (#wikimedia-operations) [2022-02-16T22:15:49Z] <sukhe@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on doh[6001-6002].wikimedia.org with reason: T301165; errors expected, not serving any traffic

Ok gonna close this one, range announced and doh working on IPv6 from all our POPs now.

I've a separate task - T301900 - to validate the route propagation and that it is being accepted by transits.

root@nyc~# dig -b 2605:3a40:3::1fa +nsid +tls A www.toutless.com @wikimedia-dns.org.

; <<>> DiG 9.17.22-1+0~20220124.69+debian10~1.gbpa036ac-Debian <<>> -b 2605:3a40:3::1fa +nsid +tls A www.toutless.com @wikimedia-dns.org.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64656
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; NSID: 64 6f 68 31 30 30 32 ("doh1002")
;; QUESTION SECTION:
;www.toutless.com.		IN	A

;; ANSWER SECTION:
www.toutless.com.	576	IN	A	52.17.66.124

;; Query time: 16 msec
;; SERVER: 2001:67c:930::1#853(wikimedia-dns.org.) (TLS)
;; WHEN: Wed Feb 16 22:33:12 UTC 2022
;; MSG SIZE  rcvd: 72
cmooney updated the task description.

Mentioned in SAL (#wikimedia-operations) [2022-02-22T12:50:44Z] <sukhe@cumin1001> START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on doh[6001-6002].wikimedia.org with reason: T301165; errors expected, not serving any traffic

Mentioned in SAL (#wikimedia-operations) [2022-02-22T12:50:50Z] <sukhe@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on doh[6001-6002].wikimedia.org with reason: T301165; errors expected, not serving any traffic

Mentioned in SAL (#wikimedia-operations) [2022-02-22T14:53:54Z] <sukhe@cumin1001> START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on durum[6001-6002].drmrs.wmnet with reason: T301165; errors expected, not serving any traffic

Mentioned in SAL (#wikimedia-operations) [2022-02-22T14:54:00Z] <sukhe@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on durum[6001-6002].drmrs.wmnet with reason: T301165; errors expected, not serving any traffic