potential NAT overflow
Open, Medium, Public


The change in the parent task could introduce NAT overflow, which could result in traffic being dropped.

Example value:

aborrero@cloudnet1004:~ $ sudo ip netns exec qrouter-d93771ba-2711-4f88-804a-8df6fd03978a conntrack -L --dst | wc -l
conntrack v1.4.5 (conntrack-tools): 21527 flow entries have been shown.

At the very least, as a first countermeasure, we should introduce some metrics so we can monitor this situation. Some alerts could also be interesting, but such alerts wouldn't be actionable.
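As a rough illustration of such a metric, here is a minimal Python sketch that parses the summary line printed by conntrack (as in the example above) and renders it as gauges in the Prometheus text format. The metric names, the function names, and the port ceiling are all hypothetical. In particular, the ceiling assumes that "overflow" here means running out of source ports on a single NAT address towards a single destination, which is an assumption and not a measured limit of this setup.

```python
import re

# Assumption: one SNAT address talking to one destination address and port can
# use at most the non-privileged part of the 16-bit port space. This ceiling is
# a guess for illustration, not a value measured on cloudnet.
ASSUMED_PORT_CEILING = 65535 - 1024 + 1

# Matches the summary line that conntrack prints after listing flows.
SUMMARY_RE = re.compile(r"(\d+) flow entries have been shown")


def parse_flow_count(summary: str) -> int:
    """Extract the flow count from the conntrack summary line."""
    match = SUMMARY_RE.search(summary)
    if match is None:
        raise ValueError(f"no flow count found in: {summary!r}")
    return int(match.group(1))


def render_metrics(flows: int, ceiling: int = ASSUMED_PORT_CEILING) -> str:
    """Render two gauges (hypothetical names) in Prometheus text format."""
    ratio = flows / ceiling
    return (
        "# TYPE cloudnet_nat_flows gauge\n"
        f"cloudnet_nat_flows {flows}\n"
        "# TYPE cloudnet_nat_flows_ratio gauge\n"
        f"cloudnet_nat_flows_ratio {ratio:.4f}\n"
    )


if __name__ == "__main__":
    sample = (
        "conntrack v1.4.5 (conntrack-tools): "
        "21527 flow entries have been shown."
    )
    print(render_metrics(parse_flow_count(sample)), end="")
```

With the example value above, this would put usage at roughly a third of the assumed ceiling. A ratio gauge like this is easier to graph than the raw count, although the same caveat about actionability applies.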

Anyway, the root problem here is that something in the network architecture is wrong. There are at least 2 definitive solutions to address this:

  • introduce tenant networks, each tenant with its own NAT router. This is something we can't do with our current neutron setup.
  • introduce IPv6, and have all cloud -> wiki traffic be natively IPv6, without NAT.

We were already aware of this, which is why we were working on T270704: cloud: introduce new edge network architecture for eqiad1 and codfw1dev (