consider storing information on cloud NAT mappings
Open, Medium, Public

Description

We are moving towards a future in which no Cloud virtual machine private IP addresses are visible outside of the cloud realm.

It would be useful to store NAT mappings somewhere in an easy-to-query format, to make it easier to identify the originators of potentially offending network connections.

Per T270704 (cloud: introduce new edge network architecture for eqiad1 and codfw1dev), this new setup should account for the NAT router changing soon, i.e., it should be independent of the network architecture.

Event Timeline

aborrero triaged this task as Medium priority. Feb 3 2021, 10:45 AM
aborrero created this task.
aborrero moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.

If it's hard to make instances be dual stack, it should be possible to implement IPv4 to IPv6 prefix translation in the NAT server. If the cloud instance sends a packet destined for the IPv4 text-lb, convert it to an IPv6 packet destined for the IPv6 text-lb, with its source address statelessly mapped. Then statelessly translate the return traffic back to IPv4. I gather this is called SIIT and is implemented by the open source project Jool.
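To make the stateless mapping concrete, here is a minimal sketch of the address arithmetic only, not of Jool itself. It assumes the RFC 6052 well-known prefix 64:ff9b::/96 purely for illustration; a real deployment would presumably pick its own prefix, and the function names are hypothetical.

```python
import ipaddress

# Assumption: the RFC 6052 well-known /96 prefix, used here only as an example.
# The prefix an actual deployment would use is not specified in this task.
PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def v4_to_v6(addr: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of the /96 prefix."""
    v4 = ipaddress.IPv4Address(addr)
    return ipaddress.IPv6Address(int(PREFIX.network_address) | int(v4))


def v6_to_v4(addr: str) -> ipaddress.IPv4Address:
    """Recover the embedded IPv4 address; no per-connection state is needed."""
    v6 = ipaddress.IPv6Address(addr)
    if v6 not in PREFIX:
        raise ValueError(f"{v6} is not inside {PREFIX}")
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)
```

Because the instance's IPv4 address can be read straight back out of the translated IPv6 source address, no separate record of the mapping has to be kept for traffic that takes this path.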

This idea from @tstarling seems like a clever way to encode the NAT mapping in the NAT'ed addresses themselves, without needing external bookkeeping. I don't know, however, whether this would still run directly into the various roadblocks that have so far kept us from implementing dual stack for the Neutron layer itself.

To clarify the task's scope and the need from a network operations angle: as a service provider offering effectively unrestricted IPv4 connectivity from our public cloud to the rest of the internet, we need, for various reasons, the ability to identify and/or block the source of traffic named in, e.g., an incoming third-party report or request, and to do so retroactively for past timestamps as well. (This is not a new requirement, nor the result of recent changes in cloud networking; it is just something we're overdue for.)
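As a rough illustration of what "easy to query, retroactively" could look like, here is a minimal sketch using SQLite. The table layout, column names, and function names are all hypothetical, and it does not cover how the mappings would be collected from the NAT router in the first place.

```python
import sqlite3

# Hypothetical schema: one row per NAT translation, with the interval it was active.
SCHEMA = """
CREATE TABLE IF NOT EXISTS nat_mappings (
    start_ts     INTEGER NOT NULL,  -- Unix time the mapping was created
    end_ts       INTEGER,           -- Unix time it was torn down (NULL = still active)
    proto        TEXT    NOT NULL,
    private_ip   TEXT    NOT NULL,
    private_port INTEGER NOT NULL,
    public_ip    TEXT    NOT NULL,
    public_port  INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS nat_lookup
    ON nat_mappings (public_ip, public_port, start_ts);
"""


def open_store(path=":memory:"):
    """Open (or create) the mapping store."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_mapping(conn, start_ts, end_ts, proto,
                   private_ip, private_port, public_ip, public_port):
    """Store one observed NAT translation."""
    conn.execute(
        "INSERT INTO nat_mappings VALUES (?, ?, ?, ?, ?, ?, ?)",
        (start_ts, end_ts, proto, private_ip, private_port, public_ip, public_port),
    )
    conn.commit()


def find_originators(conn, public_ip, public_port, ts, proto="tcp"):
    """Return (private_ip, private_port) pairs that held the public tuple at time ts."""
    return conn.execute(
        "SELECT private_ip, private_port FROM nat_mappings "
        "WHERE public_ip = ? AND public_port = ? AND proto = ? "
        "AND start_ts <= ? AND (end_ts IS NULL OR end_ts >= ?)",
        (public_ip, public_port, proto, ts, ts),
    ).fetchall()
```

Given a third-party report that names a public address, port, and timestamp, `find_originators` would return the instance address that held that translation at the time. Keying on the public tuple rather than on anything router-specific keeps the store independent of the network architecture, as the description asks.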

Tim's idea is an interesting one, but it only solves this for the specific case of text-lb (or other intra-WMF traffic), not the general case. Even if we were to approach this iteratively and solve for the intra-wiki traffic first, I'm not personally convinced that deploying something like Jool would be easier than deploying something like natlog. YMMV though, and I have no strong feelings about the technical direction, as long as there is a path to get us to where we need to be :)