Upgrade to Bird 2
Closed, Resolved, Public

Description

We're currently running Bird 1.6 for all BGP-on-generic-server use cases; see https://wikitech.wikimedia.org/wiki/Anycast

The latest version on the 1.6 branch is 1.6.8, released on 2019-09-10.

Since then, upstream has released Bird 2 (changelog) and started working on Bird 3 (currently in alpha).

For our current setup, the main benefit of upgrading to Bird 2 is a unified daemon (and config) for v4 and v6, whereas 1.6 uses a separate daemon for each.
This would allow a simpler configuration, reducing the overall complexity of the setup.

The BGP promiscuous ASN mode is another significant benefit: "Allow to specify just 'internal' or 'external' for remote neighbor instead of specific ASN. In the second case that means BGP peers with any non-local ASNs are accepted."
This would further simplify the configuration on both the server and switch side, and let us get rid of a hack: the switch could advertise its normal AS instead of forging the current "14907".
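
As a rough illustration of the two points above, a unified Bird 2 configuration could look like the fragment below. This is a minimal sketch, not our actual config: the protocol names, the local ASN 64496 and the neighbor addresses are placeholders taken from the documentation ranges, and the other sections a complete bird.conf needs (device/kernel protocols, filters) are left out.

```
# Hypothetical fragment of a single bird.conf covering both address families.
# ASN, addresses and protocol names are placeholders, not real values.

protocol bgp switch_v4 {
    local as 64496;
    # "external" accepts a peer with any non-local ASN,
    # so the remote AS does not have to be spelled out here.
    neighbor 192.0.2.1 external;
    ipv4 {
        import none;   # assumption: this host only announces routes
        export all;
    };
}

protocol bgp switch_v6 {
    local as 64496;
    neighbor 2001:db8::1 external;
    ipv6 {
        import none;   # same assumption as above
        export all;
    };
}
```

Both sessions live in one file and are served by one daemon, which is the simplification compared to the separate v4 and v6 daemons of 1.6.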

Additional advantages:

  • Staying with a well maintained version, easing future upgrades
  • The usual load of bugfixes, perf improvements (not directly needed), and CLI improvements for easier troubleshooting
  • BGP IPv4 NLRI with an IPv6 Next Hop (RFC 5549) - this could be used to reduce the number of BGP sessions to the routers
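
For the RFC 5549 item, the relevant Bird 2 setting is the `extended next hop` channel option. A hedged sketch of what a single IPv6 session carrying both families might look like, assuming the router side also negotiates the capability; the protocol name, ASNs and address are documentation placeholders:

```
# Hypothetical: one IPv6 BGP session carrying both IPv4 and IPv6 routes.
protocol bgp router_single_session {
    local as 64496;
    neighbor 2001:db8::1 as 64511;
    ipv4 {
        extended next hop on;   # IPv4 NLRI with an IPv6 next hop (RFC 5549)
        import none;            # assumption: announce-only host
        export all;
    };
    ipv6 {
        import none;
        export all;
    };
}
```

Whether this is usable in practice depends on router support, which this sketch does not cover.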

Bullseye has bird2 at version 2.0.7, which includes "Promiscuous ASN".

Event Timeline

ayounsi created this task.
Restricted Application added a subscriber: Aklapper.

Change 805448 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] aptrepo: add repository component for bird2

https://gerrit.wikimedia.org/r/805448

Change 805874 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] bird: upgrade configuration to bird2 (merge IPv4 and IPv6 configurations)

https://gerrit.wikimedia.org/r/805874

Mentioned in SAL (#wikimedia-operations) [2022-06-16T13:45:50Z] <sukhe> upload bird2_2.0.7-4.1wm1 to apt.wm.o (buster) - T310574

Change 805448 merged by Ssingh:

[operations/puppet@production] aptrepo: add repository component for bird2

https://gerrit.wikimedia.org/r/805448

Mentioned in SAL (#wikimedia-operations) [2022-06-23T13:27:54Z] <sukhe> disable puppet on A:durum or A:wikidough or A:centrallog or A:dns-rec: deploying T310574

Change 805874 merged by Ssingh:

[operations/puppet@production] bird: upgrade configuration to bird2 (merge IPv4 and IPv6 configurations)

https://gerrit.wikimedia.org/r/805874

During today's upgrade to bird2, bird itself caused no issues: the bird2 service started successfully and the configuration file was correct. However, the durum1001 host was not advertising any prefixes, which led us to notice that the anycast-healthchecker package had been removed. This seems to have happened because anycast-healthchecker (henceforth anycast-hc) depends on bird, and we removed bird and installed bird2 as part of the upgrade.

To fix this, the anycast-hc package needs to accept bird2 as a dependency (Depends: bird | bird2), so I did a fresh deb package build. Since this is a significant update, I ran debdiff to make sure nothing else changed:

[The following lists of changes regard files as different if they have
different names, permissions or owners.]

Files in first .deb but not in second
-------------------------------------
-rwxr-xr-x  root/root   DEBIAN/postrm
-rwxr-xr-x  root/root   DEBIAN/prerm

Control files: lines which differ (wdiff format)
------------------------------------------------
Depends: python3:any (>= 3.2~), [-bird | bird2,-] {+bird,+} python3-anycast-healthchecker (= [-0.8.2-1wm1)-] {+0.8.2-1)+}
Installed-Size: [-239-] {+237+}
Version: [-0.8.2-1wm1-] {+0.8.2-1+}

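The first package is the new anycast-hc package (0.8.2-1wm1) and the second is the current one (0.8.2-1). Apart from the dependency, version and installed size, the only other difference debdiff reports is the DEBIAN/postrm and DEBIAN/prerm maintainer scripts present in the new build. No other changes were expected; this is just an extra confirmation.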

The next step is to import this package so that it gets updated across the bird hosts, but that will wait until next week given the potential impact of this change.

Change 808043 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] bird: upgrade configuration to bird2 (merge IPv4 and IPv6 configurations)

https://gerrit.wikimedia.org/r/808043

Mentioned in SAL (#wikimedia-operations) [2022-06-28T13:44:25Z] <sukhe> upload anycast-healthchecker 0.8.2-1wm1 to apt.wm.o (buster) - T310574

Change 808043 merged by Ssingh:

[operations/puppet@production] bird: upgrade configuration to bird2 (merge IPv4 and IPv6 configurations)

https://gerrit.wikimedia.org/r/808043

Mentioned in SAL (#wikimedia-operations) [2022-06-28T15:13:49Z] <sukhe> upload prometheus-bird-exporter (1.2.2-1wm1) buster-wikimedia - T310574

Change 809205 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] bird: update bird.conf for bird2 changes

https://gerrit.wikimedia.org/r/809205

Change 809205 abandoned by Ssingh:

[operations/puppet@production] bird: update bird.conf for bird2 changes

Reason:

reviewed. will squash in one commit for complete review

https://gerrit.wikimedia.org/r/809205

Change 809227 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] bird: upgrade configuration to bird2 (merge IPv4 and IPv6 configurations)

https://gerrit.wikimedia.org/r/809227

Notes from today's deployment:

  • We were missing one piece of bird config: bird2 rejects routes by default, so we need to explicitly set export all for both IPv4 and IPv6. https://gerrit.wikimedia.org/r/c/operations/puppet/+/809205
  • prometheus-bird-exporter also depended on bird, so we rebuilt the package to accept bird2 as an alternative dependency.

So in the (hopefully) final deployment, we will need to upgrade the anycast-healthchecker and prometheus-bird-exporter packages and then run the Puppet agent.

Mentioned in SAL (#wikimedia-operations) [2022-06-29T13:02:42Z] <sukhe> sudo cumin -d 'P{R:Class = bird}' 'disable-puppet "PLEASE DO NOT enable Puppet: deploying T310574"'

Change 809227 merged by Ssingh:

[operations/puppet@production] bird: upgrade configuration to bird2 (merge IPv4 and IPv6 configurations)

https://gerrit.wikimedia.org/r/809227

Mentioned in SAL (#wikimedia-operations) [2022-06-29T15:25:47Z] <sukhe> upload anycast-healthchecker 0.8.2-1wm1 to apt.wm.o (bullseye) - T310574

Mentioned in SAL (#wikimedia-operations) [2022-06-29T17:31:14Z] <sukhe> running puppet agent on centrallog2002 to finalize T310574

===== NODE GROUP =====                                                                                                                
(40) authdns[1001,2001].wikimedia.org,centrallog2002.codfw.wmnet,centrallog1001.eqiad.wmnet,dns[1001-1002,2001-2002,3001-3002,4001-4002,5001-5002,6001-6002].wikimedia.org,doh[1001-1002,2001-2002,3001-3002,4001-4002,5001-5002,6001-6002].wikimedia.org,durum[2001-2002].codfw.wmnet,durum[6001-6002].drmrs.wmnet,durum[1001-1002].eqiad.wmnet,durum[5001-5002].eqsin.wmnet,durum[3001-3002].esams.wmnet,durum[4001-4002].ulsfo.wmnet
----- OUTPUT of '/usr/sbin/bird --version' -----                                                                                      
BIRD version 2.0.7

We are now running bird2 on all the existing bird hosts and benefit from all the changes Arzhel listed above.

Many thanks to @ayounsi for all his help and patience with this, @jbond for helping us with the reviews and @MoritzMuehlenhoff for helping with the Debian packaging. I will leave it to Arzhel to mark this as resolved.

ayounsi claimed this task.

Awesome, thanks a lot @ssingh

I slightly cleaned up the doc (added a mention of the bird2 upgrade) and updated the dashboard at https://grafana.wikimedia.org/d/dxbfeGDZk/anycast to split the monitoring by cluster, add variables, etc.