As we only operate a single network vendor, we currently have two types of network ACLs:
- SRX security policies: stateful zone-to-zone rules on the payment firewall and on the management routers
- Firewall filters: stateless rules applied to L3 interfaces on all routers (core, payment, management)
- Each type has its own distinct syntax
- Security policies are managed either with a homemade tool written by the fundraising-tech team, or manually on the management routers
- Firewall filters are centrally managed (in Homer) but written by hand
- Any SRE (and fr-tech) can (and does) write new ACLs; they are pushed to the devices either by the SREs (after a Netops review) or by Netops directly
This raises the following problems:
- Multiple tools and processes, leading to a steeper learning curve and a longer turnaround time on change requests:
  - T260655
  - T269958#6701823
- Syntax typos
  - Not always caught until pushed to prod (and rejected by the device linter)
- Entity typos
  - Wrong IP or wrong prefix length
- Stale rules
- Lack of consistency
  - IPs defined either in prefix-lists or directly in the rules
  - Discrepancies between v4 and v6 rules
Now that Homer automates most of the network configuration, the natural next step is to provide a standardized and automated way of managing ACLs, leveraging Netbox as the source of truth.
Looking at the landscape of existing open-source tools, one stands out.
Capirca is an actively maintained open-source tool made by Google.
It consists of several moving parts:
- Definition files
  - Named networks (IP prefixes) and services (ports/protocols)
- Policy files
  - ACL rules, in a custom format, using the previously defined services and networks
- Capirca library
  - Takes the above files as input and generates ACLs in a format compatible with a given platform (Junos, SRX, iptables, etc.)
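To make the definition files more concrete, here is a minimal Python sketch that renders named networks in the layout used by Capirca's sample network definitions (a `NAME = prefix` line, continuation lines indented underneath, optional `# comment`). The `render_networks` helper, the group name and the documentation prefixes are hypothetical; only the output layout is meant to resemble Capirca's `NETWORK.net`.

```python
import ipaddress


def render_networks(groups):
    """Render {NAME: [(prefix, comment), ...]} as Capirca-style network definitions.

    Hypothetical helper: each prefix is validated with the stdlib ipaddress module
    (strict=True rejects host bits set, e.g. 192.0.2.1/24) before being written out.
    """
    blocks = []
    for name, entries in groups.items():
        lines = []
        for i, (prefix, comment) in enumerate(entries):
            net = ipaddress.ip_network(prefix, strict=True)
            # First entry goes on the "NAME =" line, the others are aligned below it.
            head = f"{name} = " if i == 0 else " " * (len(name) + 3)
            suffix = f"  # {comment}" if comment else ""
            lines.append(f"{head}{net}{suffix}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render_networks({
        "BASTIONS": [("192.0.2.10/32", "bast1001"), ("2001:db8::10/128", "bast1001")],
    }))
```

A single named group can mix v4 and v6 entries, which is what lets one policy term cover both address families.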
A high-level example workflow could be:
- Define the services (ports/protocols) once for the whole infrastructure
- Populate the network definition file partially from Netbox
  - At least device IPs, grouped per hostname prefix (all bast, all cp, etc.)
  - Potentially network prefixes as well
- Manually define networks for the remaining use cases
- Convert the existing policies to the Capirca format
  - Clean them up in the process
  - Most likely done manually
- In Homer, define for each device (or device role) which policies to apply
- When running Homer, the generated ACLs will be included in the pushed configuration and in the diffs
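The Netbox part of this workflow could look roughly like the sketch below: device primary IPs are grouped by hostname prefix (the letters before the digits, e.g. `bast` in `bast1003`) into one named network per prefix, mixing v4 and v6. The device records are hard-coded stand-ins for what a Netbox query might return; the field names, the `ALL_<PREFIX>` naming scheme and `group_by_hostname_prefix` are assumptions, not an existing Homer or Netbox API.

```python
import ipaddress
import re
from collections import defaultdict

# Hypothetical stand-ins for Netbox device records. Addresses may carry the
# subnet prefix length, so only the host part is kept below.
DEVICES = [
    {"name": "bast1003", "primary_ip4": "192.0.2.10/24", "primary_ip6": "2001:db8::10/64"},
    {"name": "bast2002", "primary_ip4": "192.0.2.20/24", "primary_ip6": "2001:db8::20/64"},
    {"name": "cp1075", "primary_ip4": "198.51.100.5/24", "primary_ip6": None},
    {"name": "cr1-eqiad", "primary_ip4": "203.0.113.1/32", "primary_ip6": None},
]

# Hostname prefix = leading letters followed only by digits (e.g. "bast" in "bast1003").
HOSTNAME_RE = re.compile(r"^([a-z]+)\d+$")


def group_by_hostname_prefix(devices):
    """Return {"ALL_BAST": [host networks...], ...}, mixing v4 and v6 in each group."""
    groups = defaultdict(list)
    for device in devices:
        match = HOSTNAME_RE.match(device["name"])
        if match is None:
            continue  # names outside the convention are left to manual definitions
        name = f"ALL_{match.group(1).upper()}"
        for key in ("primary_ip4", "primary_ip6"):
            if device.get(key):
                # Keep only the host address: 192.0.2.10/24 -> 192.0.2.10/32.
                host = ipaddress.ip_interface(device[key]).ip
                groups[name].append(ipaddress.ip_network(str(host)))
    # Sort groups and members (v4 first) so regenerated definitions give stable diffs.
    return {
        name: sorted(members, key=lambda net: (net.version, net))
        for name, members in sorted(groups.items())
    }
```

Each resulting group could then be written to the network definition file; the stable ordering matters because Homer shows the generated ACLs as part of its diffs.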
This would solve all the problems listed above, either directly or thanks to the audit required to convert the existing rules:
- Multiple tools and processes
  - Consolidated into a single tool, potentially two processes if used by fr-tech and required for PCI compliance
- Syntax typos
  - Caught locally by Capirca when generating the ACLs
- Entity typos
  - Less risk, as entities are defined only once and potentially come from Netbox
- Stale rules
  - Removed automatically when a host/IP is removed from Netbox
- Lack of consistency
  - A “network” set can contain both v4 and v6 IPs, which helps with consistency; defining it from Netbox would ensure it
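Some of these checks could also run as a pre-flight step before Capirca is invoked. The sketch below is a hypothetical lint over named network groups, not a Capirca feature: it rejects prefixes with host bits set (a typical wrong-prefix-length typo) and reports groups that contain only one address family, as one way to surface v4/v6 discrepancies. A group can legitimately be v4-only, so that second result is meant as something to review rather than a hard error.

```python
import ipaddress


def lint_groups(groups):
    """Return problems found in {NAME: [prefix, ...]} network definitions."""
    problems = []
    for name, prefixes in groups.items():
        errors_before = len(problems)
        versions = set()
        for prefix in prefixes:
            try:
                # strict=True rejects host bits set, e.g. 192.0.2.1/24.
                versions.add(ipaddress.ip_network(prefix, strict=True).version)
            except ValueError as exc:
                problems.append(f"{name}: {exc}")
        if len(problems) > errors_before:
            continue  # fix the typos first, the address family check would be misleading
        if versions == {4}:
            problems.append(f"{name}: IPv4 only, no IPv6 counterpart")
        elif versions == {6}:
            problems.append(f"{name}: IPv6 only, no IPv4 counterpart")
    return problems
```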
In addition, it would bring the following advantages (not all of them currently needed):
- Multi-platform support, which might make things easier if we move away from Junos
- Shadowing checks (a policy rule making a following one useless)
- Optimization (e.g. 10.0.0.0/32 and 10.0.0.1/32 are merged into 10.0.0.0/31)
- Specific flow testing (“can IPx reach IPy through that ACL?”)
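To illustrate what the optimization, shadowing check and flow testing items mean, here is a toy model in Python; it is not Capirca's implementation. Terms are evaluated first-match-wins, `ipaddress.collapse_addresses` does the prefix merging, a term is reported as shadowed when a single earlier term already matches everything it matches, and `first_match` answers "can IPx reach IPy on port P?". The `Term` structure and sample rules are hypothetical, and the sketch ignores protocols and overlaps spread across several earlier terms.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical, simplified ACL term (single address family, TCP ports only)."""
    name: str
    sources: tuple
    destinations: tuple
    ports: frozenset
    action: str  # "accept" or "deny"


def nets(*prefixes):
    # Merge adjacent or overlapping prefixes, e.g. 192.0.2.0/32 + 192.0.2.1/32 -> 192.0.2.0/31.
    return tuple(ipaddress.collapse_addresses(ipaddress.ip_network(p) for p in prefixes))


def covers(outer, inner):
    """True if every network in `inner` is inside some network of `outer`."""
    return all(any(n.subnet_of(o) for o in outer) for n in inner)


def shadowed_terms(terms):
    """Names of terms fully matched by one earlier term, so they can never be hit."""
    result = []
    for i, later in enumerate(terms):
        for earlier in terms[:i]:
            if (covers(earlier.sources, later.sources)
                    and covers(earlier.destinations, later.destinations)
                    and later.ports <= earlier.ports):
                result.append(later.name)
                break
    return result


def first_match(terms, src, dst, port):
    """Flow test: (term name, action) of the first matching term, else implicit deny."""
    src, dst = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for term in terms:
        if (any(src in n for n in term.sources)
                and any(dst in n for n in term.destinations)
                and port in term.ports):
            return term.name, term.action
    return "implicit-deny", "deny"


TERMS = [
    Term("allow-ssh", nets("192.0.2.0/32", "192.0.2.1/32"), nets("198.51.100.0/24"),
         frozenset({22}), "accept"),
    Term("allow-ssh-one-host", nets("192.0.2.1/32"), nets("198.51.100.7/32"),
         frozenset({22}), "accept"),
    Term("deny-all", nets("0.0.0.0/0"), nets("0.0.0.0/0"), frozenset(range(65536)), "deny"),
]

if __name__ == "__main__":
    print(shadowed_terms(TERMS))  # ['allow-ssh-one-host']
    print(first_match(TERMS, "192.0.2.1", "198.51.100.7", 22))  # ('allow-ssh', 'accept')
```

Here the second term is reported because the first one already accepts that exact flow, which is the kind of dead rule an audit would otherwise have to spot by eye.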
On the other hand, there are some possible limitations:
- Network prefixes don’t have names in Netbox, so assigning them a relevant variable name might be challenging
  - Existing fields could be leveraged (roles, sites, status, description, etc.)
- Fetching all host IPs from Netbox might be too slow to be an option
  - It might be better to “pre-compile” the list on the Netbox hosts using a plugin
- Policies use a custom syntax; something like YAML would have been better
- Capirca’s code is complex (at least to me, even though I sent PRs years ago)
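One way around the lack of prefix names would be to derive them from fields that already exist on Netbox prefixes, for example site and role. The sketch below assumes simplified records with `site`, `role` and `prefix` keys (not the actual Netbox API objects) and produces uppercase, underscore-separated names similar to those in Capirca's sample definitions; the naming scheme and the `prefix_variable_names` helper are hypothetical.

```python
import re
from collections import defaultdict


def prefix_variable_names(prefixes):
    """Group prefixes under names derived from their site and role, e.g. EQIAD_PRIVATE."""
    grouped = defaultdict(list)
    for record in prefixes:
        # Missing fields fall back to placeholders so every prefix still gets a name.
        parts = (record.get("site") or "global", record.get("role") or "norole")
        # Uppercase and replace anything non-alphanumeric with "_", e.g. public-lb -> PUBLIC_LB.
        name = "_".join(re.sub(r"[^A-Za-z0-9]+", "_", part).strip("_").upper() for part in parts)
        grouped[name].append(record["prefix"])
    return dict(grouped)
```

Grouping by site and role keeps the v4 and v6 prefixes of the same network under one name, in line with the consistency point above; unrelated prefixes sharing a site and role would still need a tie-breaker, such as the description field.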
Other concerns:
- It might not be worth the effort, as our ACLs don’t change often
- There are rumors that Google will open-source a successor to Capirca in the future
The scope of this task is to discuss network ACL management in general, and to evaluate Capirca (and possibly other tools, if any) for that role.