Investigate Capirca
Closed, Resolved · Public

Authored By

	ayounsi
	Feb 4 2021, 10:59 AM

Description

Although we operate only 1 network vendor, we currently have 2 types of network ACLs:

SRX security policies, zone to zone stateful rules on the payment firewall as well as the management routers
Firewall filters, stateless rules applied to L3 interfaces on all routers (core, payment, management)

Both types have a distinct syntax
Security policies are managed either by a homemade tool written by the fundraising-tech team, or manually for the management routers
Firewall filters are centrally managed (in Homer) but manually written
Any SRE (and fr-tech) can (and does) write new ACLs; pushes to the devices are done either by the SREs (after Netops review) or by Netops directly

This raises the following problems:

Multiple tools and processes, higher learning curve and turnaround time on change requests:
- T260655
- T269958#6701823
Syntax typos
- Not always caught until pushed to prod (and rejected by the device linter)
Entities typos
- Wrong IP or wrong prefix length
Stale rules
- T264993
Lack of consistency
- IPs defined in prefix-lists or directly in the rules
- Discrepancies between v4 and v6 rules

Now that Homer automates most network configuration, the natural next step is to provide a standardized and automated way of managing ACLs, leveraging Netbox as the source of truth.
Looking at the landscape of existing open-source tools, one stands out.

Capirca is an actively maintained open source tool made by Google.

It consists of different moving parts:

Definition files
- Services: port/protocol, port-ranges/protocol, nestable sets of services
- Network: IP, prefixes, nestable sets of IP/prefixes
Policy files
- ACL rules using the previously defined services and networks in a custom format
Capirca library
- Takes the above files as input and generates ACLs in a format compatible with a given platform (Junos, SRX, iptables, etc)

An example (high level) usage could be:

Define the services (ports/protocols) once for the whole infrastructure
Populate the network definition file partially from Netbox
- At least device IPs, with groups created per host prefix (all bast, all cp, etc)
- Potentially network prefixes as well
Manually define networks for the remaining use cases
Convert existing policies to their Capirca format
- Cleaning them up in the process
- Most likely manually
In Homer, for each device (or device role), define which policies to apply
When running Homer, ACLs will be added to the pushed config and diffs
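As a rough illustration of the "populate the network definition file from Netbox" step, here is a minimal Python sketch. It assumes host records arrive as (hostname, interface address) pairs and that the group name is the alphabetic prefix of the hostname (cp1001 gives cp). The function names, the sample data and the exact output layout are hypothetical; the "NAME = prefix" layout only mimics the general style of Capirca network definitions and should be checked against the Capirca documentation before use.

```python
import ipaddress
import re
from collections import defaultdict


def group_by_host_prefix(hosts):
    """Group host IPs by the alphabetic prefix of the hostname (cp1001 -> cp)."""
    groups = defaultdict(set)
    for hostname, address in hosts:
        match = re.match(r"[a-z]+", hostname)
        if match is None:
            continue  # skip names that do not follow the assumed convention
        # Drop the interface mask: ACLs want the host address (/32 or /128).
        ip = ipaddress.ip_interface(address).ip
        groups[match.group(0)].add(ipaddress.ip_network(str(ip)))
    return groups


def render_definitions(groups):
    """Render groups as 'NAME = prefix' blocks, one prefix per line."""
    lines = []
    for name in sorted(groups):
        token = name.upper()
        # Sort v4 before v6, then numerically within each family.
        nets = sorted(
            groups[name], key=lambda n: (n.version, int(n.network_address))
        )
        indent = " " * (len(token) + 3)
        for i, net in enumerate(nets):
            lines.append(f"{token} = {net}" if i == 0 else f"{indent}{net}")
    return "\n".join(lines)


# Hypothetical data standing in for what a Netbox export would provide.
HOSTS = [
    ("cp1001", "192.0.2.1/24"),
    ("cp1002", "192.0.2.2/24"),
    ("cp1001", "2001:db8::1/64"),
    ("bast1001", "198.51.100.10/32"),
]

if __name__ == "__main__":
    print(render_definitions(group_by_host_prefix(HOSTS)))
```

Because each group is rebuilt from the inventory on every run, a host removed from the inventory simply disappears from its group on the next export.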

This would solve all the problems listed above, either directly or thanks to the audit required to convert them.

Multiple tools and processes
- Consolidated to a single tool, potentially 2 processes if used by frack and required for PCI compliance
Syntax typos
- Caught by Capirca locally during the execution
Entities typos
- Less risk as defined only once, and potentially coming from Netbox
Stale rules
- Will be removed automatically when a host/IP is removed
Lack of consistency
- A “network” set can contain both v4 and v6 IPs, which helps keep the rules for both families aligned; populating it from Netbox would further ensure consistency

In addition it would bring the following advantages (not all currently needed):

Multi-platform support, which might make things easier if we move away from Junos
Shadow check (a policy rule making the following one useless)
Optimization (eg. 2 contiguous /32s are merged into a /31)
Specific flow testing (“can IPx reach IPy through that ACL?”)
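To make the last three items concrete, here is a small Python sketch of what prefix optimization, a shadow check and a flow test mean. It is not Capirca's implementation: the Term type and the function names are hypothetical, and the model only looks at source and destination prefixes (no ports or protocols).

```python
import ipaddress
from typing import NamedTuple


class Term(NamedTuple):
    """Simplified ACL term: prefixes only, no ports or protocols."""
    name: str
    source: str
    destination: str
    action: str


def optimize(prefixes):
    """Merge adjacent or overlapping prefixes, one address family at a time."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    merged = []
    for version in (4, 6):
        # collapse_addresses() refuses mixed families, hence the split.
        family = [n for n in nets if n.version == version]
        merged.extend(str(n) for n in ipaddress.collapse_addresses(family))
    return merged


def _covers(outer, inner):
    """True if prefix `inner` is fully contained in prefix `outer`."""
    outer_net = ipaddress.ip_network(outer)
    inner_net = ipaddress.ip_network(inner)
    return outer_net.version == inner_net.version and inner_net.subnet_of(outer_net)


def shadowed_terms(terms):
    """Names of terms that an earlier, broader term makes unreachable."""
    return [
        later.name
        for index, later in enumerate(terms)
        if any(
            _covers(earlier.source, later.source)
            and _covers(earlier.destination, later.destination)
            for earlier in terms[:index]
        )
    ]


def evaluate(terms, source_ip, destination_ip, default="deny"):
    """Return the action of the first term matching the flow."""
    src = ipaddress.ip_address(source_ip)
    dst = ipaddress.ip_address(destination_ip)
    for term in terms:
        if (src in ipaddress.ip_network(term.source)
                and dst in ipaddress.ip_network(term.destination)):
            return term.action
    return default
```

For example, optimize(["192.0.2.0/32", "192.0.2.1/32"]) returns ["192.0.2.0/31"], and a narrow term placed after a broader term covering the same source and destination is reported by shadowed_terms().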

On the other hand, there are some possible limitations:

Network prefixes don’t have names in Netbox, so assigning a relevant variable name might be challenging
- Existing fields could be leveraged (roles, sites, status, description, etc)
Fetching all host IPs from Netbox might be too slow to be an option
- Might be better to “pre-compile” the list on Netbox hosts using a plugin
Policies use a custom syntax, something like YAML would have been better
Capirca’s code is complex (at least to me, even though I sent PRs years ago)

Other concerns

It might not be worth the effort as our ACLs don’t change often
There are rumors that Capirca’s successor will be open-sourced by Google in the future

The scope of this task is to discuss network ACL management in general and to evaluate Capirca for that role, along with other tools if any.

Details

Subject	Repo	Branch	Lines +/-
Move core routers border-in filter to Capirca	operations/homer/public	master	+194 -350
Move core routers loopback filter to Capirca	operations/homer/public	master	+141 -248
Move sandbox filter to Capirca	operations/homer/public	master	+278 -99
Bump Capirca to 2.0.4	operations/software/homer	master	+1 -1
Capirca: disable shade check	operations/software/homer	master	+2 -2
Capirca POC	operations/homer/public	master	+1 K -2 K
Port cloud-in4 to Capirca	operations/homer/public	master	+216 -326
Homer: get Capirca definitions from Netbox	operations/puppet	production	+2 -0
Add Capirca support to Homer	operations/software/homer	master	+140 -9
Add Capirca definitions exporter	operations/software/netbox-extras	master	+90 -0


Related Objects

		Status	Subtype	Assigned	Task
					Restricted Task
		Resolved		ayounsi	T273865 Investigate Capirca

Event Timeline

ayounsi triaged this task as Medium priority. Feb 4 2021, 10:59 AM

ayounsi created this task.

Restricted Application added a subscriber: Aklapper. Feb 4 2021, 10:59 AM

Maintenance_bot added a project: SRE. Feb 4 2021, 11:45 AM

ayounsi added a subscriber: Jgreen. Feb 8 2021, 8:55 AM

Change 663535 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/homer/public@master] Capirca POC

https://gerrit.wikimedia.org/r/663535

gerritbot added a project: Patch-For-Review. Feb 11 2021, 10:25 AM

Change 663536 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/software/homer@master] Capirca POC

https://gerrit.wikimedia.org/r/663536

Limitations identified:

Some ACLs currently have Jinja code in them, which is not possible through Capirca.
The easiest cases have been (or can be) mitigated by either:

removing the feature flag
moving the logic to Homer/Capirca. For example, the ACLs needed only in eqiad/codfw can be specified with a local override:

capirca:
  - cr   #<- normal core router ACLs
  - cr-cloud   # <- core site addon

But one is currently more tricky:

{% if ping_offload_redirect | d(false)  %}
term offload-ping4 {
    from {
        destination-address {
            {{ ping_offload_vip }};
        }
        protocol icmp;
        icmp-type echo-request;
    }
    then {
        next-ip {{ ping_offload_redirect }}/32;
    }
}
{% endif %}

This is because it embeds site-specific IPs.
One possible workaround is to keep the term in a dedicated filter (in the current firewall.conf file) and import it in a generic way:

term offload-ping4  {
    filter offload-ping4 ;
}

A side advantage is that it can also be used in the transport-in4 filter.

The 2nd limitation is that we use per-family prefix-lists (eg. wikimedia4 and wikimedia6), while Capirca is not able to pick the relevant one depending on the IP version in use.
One workaround (used in this POC) is to not use prefix-lists in those cases, but have Capirca generate destination-address entries based on the definition files.
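The idea behind this workaround can be sketched in a few lines of Python: one mixed v4/v6 definition is split per address family, and each family is rendered inline as a destination-address stanza instead of referencing a per-family prefix-list. This is only an illustration of the principle, not how Capirca generates its output; the function names and sample prefixes are hypothetical.

```python
import ipaddress


def split_by_family(prefixes):
    """Partition a mixed list of prefixes into IPv4 and IPv6 lists."""
    families = {4: [], 6: []}
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        families[net.version].append(str(net))
    return families


def render_destination(prefixes):
    """Render prefixes as an inline Junos-style destination-address stanza."""
    body = "\n".join(f"    {prefix};" for prefix in prefixes)
    return "destination-address {\n" + body + "\n}"


if __name__ == "__main__":
    # Hypothetical single definition holding both families.
    wikimedia = ["198.51.100.0/24", "2001:db8::/32", "203.0.113.0/24"]
    families = split_by_family(wikimedia)
    print(render_destination(families[4]))  # would go in the inet filter
    print(render_destination(families[6]))  # would go in the inet6 filter
```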

On the positive side, the current state of the POC doesn't show significant blockers. ACLs can be ported to Capirca progressively (it doesn't need to be all or nothing). It also supports ACLs from homer-private (untested).

Next to test is how to generate the network definitions from Netbox, most likely using a plugin, and fetch them from Netbox at run time. https://gerrit.wikimedia.org/r/666876

If deemed viable:

put the Capirca code in its own Homer class and improve error handling
test homer-private ACLs more extensively (and maybe network definitions)
~~figure out how to package Capirca (pypi version is old)~~
streamline directory structure
transition more ACLs
write doc/train people

Note that most of the above is out of scope for the POC.

Change 666876 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/software/netbox-extras@master] Add Capirca definitions exporter

https://gerrit.wikimedia.org/r/666876

ayounsi mentioned this in T277146: Authoritative ports list. Mar 11 2021, 10:11 AM

Mentioned in SAL (#wikimedia-operations) [2021-03-22T07:51:24Z] <elukey> stop/start mariadb instances on dbstore1004 to reduce buffer pool memory settings - T273865

Change 666876 merged by Ayounsi:
[operations/software/netbox-extras@master] Add Capirca definitions exporter

https://gerrit.wikimedia.org/r/666876

ayounsi mentioned this in rOSNEb5bbc2d1638b: Add Capirca definitions exporter. Mar 25 2021, 4:33 PM

Change 663536 merged by jenkins-bot:

[operations/software/homer@master] Add Capirca support to Homer

https://gerrit.wikimedia.org/r/663536

ayounsi mentioned this in rOSHO6fc6b4a7b641: Add Capirca support to Homer. Apr 5 2021, 7:14 AM

Change 681775 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Homer: get Capirca definitions from Netbox

https://gerrit.wikimedia.org/r/681775

Change 681775 merged by Ayounsi:

[operations/puppet@production] Homer: get Capirca definitions from Netbox

https://gerrit.wikimedia.org/r/681775

Aklapper added a project: Infrastructure-Foundations. Jun 21 2021, 8:59 PM

Change 701085 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Port cloud-in4 to Capirca

https://gerrit.wikimedia.org/r/701085

Change 701085 merged by jenkins-bot:

[operations/homer/public@master] Port cloud-in4 to Capirca

https://gerrit.wikimedia.org/r/701085

ayounsi mentioned this in rOHPUde3d438b0473: Port cloud-in4 to Capirca. Jun 28 2021, 11:54 AM

ayounsi moved this task from Backlog to Work in Progress / Tasks to Do on the netbox board. Aug 10 2021, 1:40 PM

ayounsi moved this task from Backlog to In Progress on the Infrastructure-Foundations board. Aug 12 2021, 9:32 AM

joanna_borun changed the task status from Open to In Progress. Sep 21 2021, 4:00 PM

Waiting for Capirca upstream to merge PRs.

Finally merged!

Change 748080 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Move sandbox filter to Capirca

https://gerrit.wikimedia.org/r/748080

Change 748098 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Move core routers loopback filter to Capirca

https://gerrit.wikimedia.org/r/748098

Change 748111 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Move core routers border-in filter to Capirca

https://gerrit.wikimedia.org/r/748111

Change 663535 abandoned by Ayounsi:

[operations/homer/public@master] Capirca POC

Reason:

Moved to Ic1c558b388f31f1c6454299b6f051c67cf030638 and next in chain.

https://gerrit.wikimedia.org/r/663535

Change 749696 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/software/homer@master] Capirca: disable shade check

https://gerrit.wikimedia.org/r/749696

ayounsi added a parent task: Restricted Task. Jan 3 2022, 3:06 PM

Change 749696 merged by jenkins-bot:

[operations/software/homer@master] Capirca: disable shade check

https://gerrit.wikimedia.org/r/749696

Change 751391 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/software/homer@master] Bump Capirca to 2.0.4

https://gerrit.wikimedia.org/r/751391

ayounsi mentioned this in rOSHO59f6a7a8a773: Capirca: disable shade check. Jan 4 2022, 9:54 AM

Change 751391 merged by jenkins-bot:

[operations/software/homer@master] Bump Capirca to 2.0.4

https://gerrit.wikimedia.org/r/751391

ayounsi mentioned this in rOSHO63bc1871a0e8: Bump Capirca to 2.0.4. Jan 17 2022, 7:17 PM

Change 748080 merged by jenkins-bot:

[operations/homer/public@master] Move sandbox filter to Capirca

https://gerrit.wikimedia.org/r/748080

Change 748098 merged by jenkins-bot:

[operations/homer/public@master] Move core routers loopback filter to Capirca

https://gerrit.wikimedia.org/r/748098

Change 748111 merged by jenkins-bot: