
Envoy should listen on ipv6 and ipv4
Closed, Resolved, Public

Description

According to the Envoy documentation (https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/core/address.proto), a socket address like the following will listen on both IPv6 and IPv4:

socket_address: {address: '::', port_value: 443, ipv4_compat: true}
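As far as we understand it, ipv4_compat: true on a '::' address amounts to a single AF_INET6 socket with IPV6_V6ONLY cleared, so IPv4 clients are accepted but show up as IPv4-mapped addresses (::ffff:a.b.c.d). The Python sketch below reproduces that socket behaviour with the standard library's socket.create_server(..., dualstack_ipv6=True). It illustrates the socket semantics only, not Envoy's own code, and the helper names are hypothetical.

```python
import socket


def open_compat_listener(port: int = 0) -> socket.socket:
    """One AF_INET6 listener on '::' that also accepts IPv4 clients.

    dualstack_ipv6=True clears IPV6_V6ONLY, which is (roughly) what
    ipv4_compat: true asks Envoy to do for an '::' socket_address.
    Raises ValueError if the platform has no dual-stack support.
    """
    return socket.create_server(
        ("::", port), family=socket.AF_INET6, dualstack_ipv6=True
    )


def peer_seen_over_ipv4(srv: socket.socket) -> str:
    """Connect to the listener via 127.0.0.1 and return the peer address it sees."""
    port = srv.getsockname()[1]
    with socket.create_connection(("127.0.0.1", port), timeout=2):
        conn, addr = srv.accept()  # the TCP handshake already completed in the kernel
        conn.close()
    return addr[0]


# Usage on a dual-stack Linux host:
#   if socket.has_dualstack_ipv6():
#       with open_compat_listener() as srv:
#           print(peer_seen_over_ipv4(srv))  # ::ffff:127.0.0.1
```

The IPv4 client never sees anything unusual; only the server side gets the mapped form, which is where the logging and ACL issues discussed later come from.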

Event Timeline

Change 629343 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] service_proxy: enable ipv6 on envoy config

https://gerrit.wikimedia.org/r/629343

This is causing socket errors, although they are not impairing performance in any way.


I tested change 629343 on mwdebug1001: while envoy was listening on ::, the socket errors disappeared, and they reappeared after I re-enabled puppet.


I ran into this issue today while working on T266509. For a while I was wondering why the envoy setup looked fine but things were not working, until I eventually found in the logs that it was trying to connect via IPv6 (of course testreduce1001 has an IPv6 address, as I always try to set one up by default), but envoy was not listening on it.

Change 629343 merged by Effie Mouzeli:
[operations/puppet@production] service_proxy: add ipv6 config option on services_proxy config

https://gerrit.wikimedia.org/r/629343

I've left a comment on the merged change and am duplicating it here for visibility (since the change has already been merged):

IPv4-compatible addresses are deprecated (yet still widely in use); see https://tools.ietf.org/html/rfc4291#section-2.5.5. There is also a set of issues with them; see https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02.

BSD systems never adopted IPv4-compatible addresses, keeping the two stacks entirely separate. OpenBSD alludes to the issues in https://man.openbsd.org/inet6.4:

"For security reasons, OpenBSD does not route IPv4 traffic to an AF_INET6 socket, and does not support IPv4 mapped addresses, where IPv4 traffic is seen as if it comes from an IPv6 address like “::ffff:10.1.1.1”. Where both IPv4 and IPv6 traffic need to be accepted, bind and listen on two sockets."

That being said, as I understand it, properly separating the two in the envoy config will require duplicating most of the generated config. At the puppet level, perhaps we can get away with a for loop over the two protocols.

So, before enabling this throughout the fleet, maybe we can look into solving it with two separate stacks instead.
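The two-stack approach (also what the OpenBSD man page recommends) means binding and listening on two sockets. A minimal Python sketch of that socket-level idea, with a hypothetical helper name: one AF_INET socket on 0.0.0.0 plus one AF_INET6 socket on '::' with IPV6_V6ONLY set, both on the same port, so every client is seen with its real address family. It says nothing about how the envoy or puppet config would be generated.

```python
import socket


def open_split_listeners(port: int = 0) -> list:
    """Bind separate IPv4 and IPv6 listeners on one port (two stacks).

    The AF_INET6 socket sets IPV6_V6ONLY=1, so it only ever sees real IPv6
    peers; IPv4 clients land on the AF_INET socket with their real address.
    Falls back to the IPv4 listener alone if IPv6 is unavailable.
    """
    v4 = socket.create_server(("0.0.0.0", port), family=socket.AF_INET)
    port = v4.getsockname()[1]  # reuse the kernel-chosen port when port == 0
    v6 = None
    try:
        v6 = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        v6.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
        v6.bind(("::", port))  # no conflict with 0.0.0.0 because of V6ONLY
        v6.listen()
    except OSError:
        if v6 is not None:
            v6.close()
        return [v4]
    return [v4, v6]
```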

Change 659051 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] parsoid::testreduce: let envoy listen on IPv6 as well

https://gerrit.wikimedia.org/r/659051

Change 659051 merged by Dzahn:
[operations/puppet@production] parsoid::testreduce: let envoy listen on IPv6 as well

https://gerrit.wikimedia.org/r/659051

> I 've left a comment in the merged change, duplicating here for visibility (since the change is merged already)

> IPv4 compatible addresses are deprecated (yet still widely in use). See https://tools.ietf.org/html/rfc4291#section-2.5.5. There are also a set of issues with those, see https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02.

> <snip>

> So, before enabling this throughout the fleet, maybe we can look into solving it using 2 separate stacks instead.

Reading https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02, my understanding is that there are threats when IPv4-compatible addresses are used on the wire. The problem I am trying to solve concerns local traffic towards the services_proxy, when applications resolve "localhost". My suggestion is:

  • Keep the admin listener on 0.0.0.0 (IPv4 only)
  • Keep tls_terminator on 0.0.0.0 (IPv4 only)
  • Set all services_proxy listeners to listen to ::1, with ipv4_compat: true
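Whether a client that connects to "localhost" lands on ::1 or 127.0.0.1 depends on what the resolver returns and in which order. A small sketch (hypothetical helper; the result depends on the host's /etc/hosts and /etc/gai.conf) that lists the addresses in the order a plain client such as socket.create_connection() would try them:

```python
import socket


def localhost_candidates(port: int) -> list:
    """Addresses for "localhost", in the order getaddrinfo() returns them.

    socket.create_connection() tries them in this same order, so the first
    entry is where a plain client connects if something is listening there.
    """
    seen = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        "localhost", port, type=socket.SOCK_STREAM
    ):
        if sockaddr[0] not in seen:
            seen.append(sockaddr[0])
    return seen


# Usage (the output depends on the host's resolver configuration):
#   print(localhost_candidates(80))
```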

Hi serviceops - I've run into some of the effects of this recently and tracked down this ticket, which seems a relevant/recent reference point.

The current puppet repo makes this envoy ipv4_compat mechanism optionally available via a listen_ipv6 parameter in the services_proxy listener and tls_terminator configurations, and it is already used in a few cases:

hieradata/role/common/idp.yaml:profile::tlsproxy::envoy::listen_ipv6: true
hieradata/role/common/idp_test.yaml:profile::tlsproxy::envoy::listen_ipv6: true
hieradata/role/common/parsoid/testreduce.yaml:profile::tlsproxy::envoy::listen_ipv6: true
hieradata/role/common/restbase/dev_cluster.yaml:profile::services_proxy::envoy::listen_ipv6: true

Either way, I think this is probably causing some harm. We're seeing logstash entries with these v4-mapped IPs recorded in XFF headers and X-Client-IP, as in this logstash entry:

https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-restbase-2021.02.24?id=H0tT1XcBsCn0xdb8Djvf

These invented v4-mapped IPs are not legitimate in these and many other contexts, and are bound to cause some subtle (or not-so-subtle) issues. All over our services and infra, we have network ACLs, rate limiter configs and the like that won't recognize these IPs as part of our internal address spaces. Basically, ipv4_compat (or, in more generic terms, any service daemon listening on the IPv6 ANY address without IPV6_V6ONLY) seems like a problematic idea, at least here. The alternative would be to go around ensuring that everything else in our stack, our puppetization, our analytics, etc. understands how to un-translate v4-mapped addresses, but that seems like an unnecessarily Sisyphean task compared to just reconfiguring listeners not to make up non-existent source addresses.
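To make the ACL point concrete: a membership check against IPv4 ranges silently fails for a v4-mapped address, because as far as the check is concerned it is an IPv6 address. A sketch using Python's ipaddress module; INTERNAL_NETS is a made-up example (10.0.0.0/8 plus the 2001:db8::/32 documentation prefix), not our real address plan, and both function names are hypothetical.

```python
import ipaddress

# Hypothetical "internal" ranges, for illustration only.
INTERNAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_internal_naive(addr: str) -> bool:
    """ACL check that takes the address literally."""
    ip = ipaddress.ip_address(addr)
    # An IPv6Address is never "in" an IPv4Network, so ::ffff:10.x.y.z fails here.
    return any(ip in net for net in INTERNAL_NETS)


def is_internal(addr: str) -> bool:
    """Same check, but un-map ::ffff:a.b.c.d to a.b.c.d first."""
    ip = ipaddress.ip_address(addr)
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip in net for net in INTERNAL_NETS)


if __name__ == "__main__":
    for a in ("10.64.0.1", "::ffff:10.64.0.1"):
        print(a, is_internal_naive(a), is_internal(a))
    # 10.64.0.1 True True
    # ::ffff:10.64.0.1 False True
```

The second function is exactly the "un-translate v4-mapped" work that every consumer would have to repeat.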

As best I can tell, it's also not possible to configure a dual-addressed listener in envoy (open issue: https://github.com/envoyproxy/envoy/issues/11184 ), which means you'd have to duplicate the entire listener config just to give it separate ipv4 and ipv6 :/
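What that duplication amounts to can be sketched in a few lines: take one listener definition and emit an IPv4 copy on 0.0.0.0 and an IPv6 copy on '::', identical apart from name and address. The dict shape only loosely follows Envoy's listener (name, address.socket_address.address/port_value); the function name and the "_ipv4"/"_ipv6" suffixes are hypothetical, and the real generation would live in the puppet templates, not in Python.

```python
import copy


def split_listener(listener: dict) -> list:
    """Turn one listener definition into separate IPv4 and IPv6 listeners.

    Only name and address differ; everything else (filter chains etc.) is
    deep-copied verbatim, which is the duplication described above.
    """
    port = listener["address"]["socket_address"]["port_value"]
    variants = []
    for suffix, addr in (("ipv4", "0.0.0.0"), ("ipv6", "::")):
        variant = copy.deepcopy(listener)
        variant["name"] = f"{listener['name']}_{suffix}"
        # ipv4_compat is deliberately left unset on both copies.
        variant["address"] = {"socket_address": {"address": addr, "port_value": port}}
        variants.append(variant)
    return variants
```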

Just for the record, the restbase cluster that has ipv4_compat activated is the dev cluster. Nothing serving production traffic.

@Joe yeah, I'm not sure which layer is causing it to show up in logstash there. It's from restbase1019 as a client towards something, maybe parsoid?

No, that entry is for testreduce, so another test instance too. So I doubt that what you're seeing in the logs has anything to do with this setting.

In fact, the log you reported above is generated by restbase itself, and seems not to have anything to do with envoy proxying, nor with restbase calling another service. That's restbase throttling a specific client that was making too many requests before making a backend request.

I removed that setting from the testreduce1001 envoy, just to make sure.

-        address: '::'
-        ipv4_compat: true
+        address: 0.0.0.0

https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-restbase-2020.12.24?id=L_3hlnYBjr5R1RLC5PlW

This entry shows that the issue predates the envoy change (it's from Dec 24 2020). In this case the UA is mobileapps/WMF, but just looking at the restbase sockets on a box I see:

nodejs    127896    restbase   13u  IPv6  109896146      0t0  TCP *:7233 (LISTEN)
nodejs    127896    restbase   14u  IPv6  109896147      0t0  TCP *:7231 (LISTEN)

So restbase is listening on IPv6 sockets only, relying on net.ipv6.bindv6only defaulting to 0 to also serve IPv4 requests over them (with the obvious issues mentioned above). I am not sure what that request is, btw. Does x-client-ip point to the restbase host itself?
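On Linux, a new AF_INET6 socket starts with IPV6_V6ONLY equal to the net.ipv6.bindv6only sysctl unless the application sets it explicitly, which is why a daemon that binds only the IPv6 wildcard also receives IPv4 connections (as mapped addresses) when the sysctl is 0. A small, Linux-specific sketch to inspect both on a host; the helper names are hypothetical.

```python
import socket
from pathlib import Path

BINDV6ONLY = Path("/proc/sys/net/ipv6/bindv6only")  # Linux-specific


def sysctl_bindv6only():
    """Return net.ipv6.bindv6only (0 or 1), or None if it can't be read."""
    try:
        return int(BINDV6ONLY.read_text().strip())
    except (OSError, ValueError):
        return None


def fresh_socket_v6only():
    """IPV6_V6ONLY on a new AF_INET6 socket nobody has configured, or None."""
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
            return s.getsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY)
    except OSError:
        return None


if __name__ == "__main__":
    print("net.ipv6.bindv6only =", sysctl_bindv6only())
    print("default IPV6_V6ONLY =", fresh_socket_v6only())
```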

I don't think this is related to this task, as we are tracking envoy specifically here, but it's a clear indication of how wrong IPv4-mapped IPv6 addresses can go.

> As best I can tell, it's also not possible to configure a dual-addressed listener in envoy (open issue: https://github.com/envoyproxy/envoy/issues/11184 ), which means you'd have to duplicate the entire listener config just to give it separate ipv4 and ipv6 :/

Yes, that's an alternative.

> I 've left a comment in the merged change, duplicating here for visibility (since the change is merged already)

> IPv4 compatible addresses are deprecated (yet still widely in use). See https://tools.ietf.org/html/rfc4291#section-2.5.5. There are also a set of issues with those, see https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02.

> <snip>

> So, before enabling this throughout the fleet, maybe we can look into solving it using 2 separate stacks instead.

> Reading https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02, to my understanding, there are threats when using IPv4 compatible addresses on the wire. The problem I am trying to solve regards to local traffic towards the services_proxy, when applications are trying to resolve "localhost". My suggestion is

It's not just "on the wire"; it can and will happen even over localhost, e.g. https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-restbase-2021.02.24?id=lPI_1ncBfVMx58vqDTcc

> • Keep the admin listener on 0.0.0.0 (IPv4 only)
> • Keep tls_terminator on 0.0.0.0 (IPv4 only)
> • Set all services_proxy listeners to listen to ::1, with ipv4_compat: true

Why not ipv4_compat: false, i.e. going IPv6-only, for that last bullet point? All apps connect to localhost, and that's guaranteed to have a working IPv6 setup. It might be cleaner to do that instead.
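With ipv4_compat: false on ::1, the listener is IPv6-only: clients reaching it over ::1 connect, while an IPv4 client on 127.0.0.1 is refused rather than accepted under a mapped address. A Python sketch of that behaviour; the helper names are hypothetical and it models the socket semantics, not the envoy config itself.

```python
import socket


def open_loopback_v6_only(port: int = 0) -> socket.socket:
    """Listen on ::1 only, with IPV6_V6ONLY set (the ipv4_compat: false case)."""
    srv = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    try:
        srv.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
        srv.bind(("::1", port))
        srv.listen()
    except OSError:
        srv.close()
        raise
    return srv


def reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to (host, port) succeeds."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False


# Usage on a host with IPv6 loopback:
#   with open_loopback_v6_only() as srv:
#       port = srv.getsockname()[1]
#       print(reachable("::1", port))        # True
#       print(reachable("127.0.0.1", port))  # False: no IPv4 path at all
```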

> It's not just "on the wire", it can and will happen even over localhost. e.g. https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-restbase-2021.02.24?id=lPI_1ncBfVMx58vqDTcc

I was referring to the "threats" part of the document.

> • Keep the admin listener on 0.0.0.0 (IPv4 only)
> • Keep tls_terminator on 0.0.0.0 (IPv4 only)
> • Set all services_proxy listeners to listen to ::1, with ipv4_compat: true

> Why not ipv4_compat: false, that is going for IPv6 only for that last bulletpoint ? All apps connect to localhost and that's guaranteed to have a working IPv6 setup. It might be cleaner to do that instead.

I can have a go on mwdebug and possibly on one of the canaries and see what happens. If there are no related errors, I see no reason not to do it. Thanks!

Just a note: this was added to the main envoy config in July 2020 and to the service proxy on Jan 27th. The main justification for adding these to envoy is that having envoy listen on specific sockets requires a lot of duplication in puppet. When I first introduced these for the idp use case, I pointed out, in order to convince Alex, that in Apache 2.4 --enable-v4-mapped is the default on all platforms except FreeBSD, NetBSD, and OpenBSD.

@akosiaris

> IPv4 compatible addresses are deprecated (yet still widely in use). See https://tools.ietf.org/html/rfc4291#section-2.5.5. There are also a set of issues with those, see https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02.

In relation to the first point, my reading of RFC 4291 section 2.5 is that only site-local unicast addresses (section 2.5.7) are deprecated, not those in 2.5.5:

> There are several types of unicast addresses in IPv6, in particular,
> Global Unicast, site-local unicast (deprecated, see Section 2.5.7),
> and Link-Local unicast.  There are also some special-purpose subtypes
> of Global Unicast, such as IPv6 addresses with embedded IPv4
> addresses.

In relation to draft-itojun-v6ops-v4mapped-harmful-02, it's worth noting that the document is a draft that expired, i.e. it didn't reach consensus, and a quick scan of the archives suggests this issue may have been one of the sticking points:

> We cannot deprecate usage of IPv4 Mapped Addresses for the API (not
> speaking of the wire) they are deployed and part of RFC 3493 and IEEE
> standard, and the API is widely deployed on all product platforms that
> require APIs

https://vaninst.ca/lists/v6ops/v6ops.2003/msg01097.html (sorry for the bad link; I couldn't find the ipv6-ops archives on https://mailarchive.ietf.org/arch/advsearch/)

jbond triaged this task as Medium priority (Feb 25 2021, 10:00 AM).

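I ran a quick (and likely error-prone) script to see which other daemons listen with mapped addresses:

$ sudo cumin -o json 'A:all' "ss -ltp6 | awk '$ 4 ~ /^\*/ {print $ NF}'"  > ss.json

import json
from pathlib import Path

# ss.json maps each host to the ss/awk output collected by cumin above
ss = json.loads(Path('ss.json').read_text())
daemons = set()
for host, daemon in ss.items():
    try:
        # The process column looks like users:(("nginx",pid=123,fd=6));
        # take the first quoted process name (only one per host is kept)
        daemons.add(daemon.split('"')[1])
    except IndexError:
        # no quoted process name in this host's output
        pass
print('\n'.join(sorted(daemons)))

Burrow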
alertmanager-ir
apache2
cadvisor
ceph-mgr
envoy
gobgpd
java
kube-apiserver
mtail
mysqld
nfacctd
nginx
nodejs
poolcounter-pro
prometheus-apac
prometheus-blac
prometheus-elas
prometheus-hapr
prometheus-ipse
prometheus-logs
prometheus-mcro
prometheus-memc
prometheus-mysq
prometheus-node
prometheus-post
prometheus-squi
prometheus-stat
prometheus-varn
statsite
systemd
thanos
uwsgi_python37
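The script above keeps only the first quoted process name in each host's output, so hosts with several such daemons are undercounted. A variant that collects every name, still assuming (like the original) that ss.json maps each host to its ss/awk output as a string. Names remain truncated to 15 characters by the kernel's process name, hence entries like prometheus-apac.

```python
import json
import re
from pathlib import Path

# Matches every quoted process name in ss's process column, e.g.
# users:(("nginx",pid=1,fd=6),("nginx",pid=2,fd=6)) -> nginx, nginx
PROC_RE = re.compile(r'\("([^"]+)"')


def daemons_from_outputs(outputs: dict) -> set:
    """Collect every process name from {host: ss/awk output}, not just the first."""
    names = set()
    for output in outputs.values():
        names.update(PROC_RE.findall(output or ""))
    return names


if __name__ == "__main__" and Path("ss.json").exists():
    ss = json.loads(Path("ss.json").read_text())
    print("\n".join(sorted(daemons_from_outputs(ss))))
```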

> Just a note this was added to the main envoy config on july 2020 and the service proxy on jan 27th. The main justification to add theses to envoy is that having envoy listen on specific sockets requires a lot of duplication in puppet. When i first introduced theses for the idp use case i pointed to the fact that with apache 2.4 --enable-v4-mapped is the default on all platforms except FreeBSD, NetBSD, and OpenBSD in order to convince alex.

Indeed. From what I see, I had raised the same concerns back then (logs, figuring out which ports a daemon is listening on). At least I am consistent; that's a relief.

> @akosiaris

> IPv4 compatible addresses are deprecated (yet still widely in use). See https://tools.ietf.org/html/rfc4291#section-2.5.5. There are also a set of issues with those, see https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02.

> in relation to the first point my reading of RFC4291 - 2.5 indicates only 2.5.7 unicast-link local addresses are deprecated and not 2.5.5.

> There are several types of unicast addresses in IPv6, in particular,
> Global Unicast, site-local unicast (deprecated, see Section 2.5.7),
> and Link-Local unicast.  There are also some special-purpose subtypes
> of Global Unicast, such as IPv6 addresses with embedded IPv4
> addresses.

2.5.5.1 is what I was referring to. It's unfortunately confusingly named too. Seems like I managed to misread it at least.

2.5.5.1.  IPv4-Compatible IPv6 Address

   The "IPv4-Compatible IPv6 address" was defined to assist in the IPv6
   transition.  The format of the "IPv4-Compatible IPv6 address" is as
   follows:

   |                80 bits               | 16 |      32 bits        |
   +--------------------------------------+--------------------------+
   |0000..............................0000|0000|    IPv4 address     |
   +--------------------------------------+----+---------------------+

   Note: The IPv4 address used in the "IPv4-Compatible IPv6 address"
   must be a globally-unique IPv4 unicast address.

   The "IPv4-Compatible IPv6 address" is now deprecated because the
   current IPv6 transition mechanisms no longer use these addresses.
   New or updated implementations are not required to support this
   address type.

Section 2.5.5.2, IPv4-Mapped IPv6 Address, is not deprecated, however.
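The two formats differ only in the 16 bits in front of the embedded IPv4 address: 0000 for the deprecated IPv4-compatible form (2.5.5.1, e.g. ::10.1.1.1) and FFFF for the IPv4-mapped form (2.5.5.2, e.g. ::ffff:10.1.1.1), which is what dual-stack sockets report. A sketch with Python's ipaddress module; the function name is hypothetical, and it ignores the RFC's requirement that the embedded address be globally unique.

```python
import ipaddress


def classify_v4_embedding(addr: str) -> str:
    """Tell RFC 4291 2.5.5.1 and 2.5.5.2 addresses apart by their top 96 bits."""
    value = int(ipaddress.IPv6Address(addr))
    top96 = value >> 32  # the 80 + 16 bits in front of the embedded IPv4 address
    if top96 == 0xFFFF:  # 0000...0000:FFFF:a.b.c.d
        return "ipv4-mapped (2.5.5.2)"
    if top96 == 0 and value > 1:  # 0000...0000:0000:a.b.c.d, excluding :: and ::1
        return "ipv4-compatible (2.5.5.1, deprecated)"
    return "neither"


# classify_v4_embedding("::ffff:10.1.1.1") -> "ipv4-mapped (2.5.5.2)"
# classify_v4_embedding("::10.1.1.1")      -> "ipv4-compatible (2.5.5.1, deprecated)"
```

For the mapped case alone, the standard library already offers IPv6Address.ipv4_mapped, which returns the embedded IPv4Address or None.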

> In relation to draft-itojun-v6ops-v4mapped-harmful-02 its worth noting that document is in draft and expired i.e. didn't reach consensus and a quick scan of the archives suggests this issue may have been one of the sticking points

> We cannot deprecate usage of IPv4 Mapped Addresses for the API (not
> speaking of the wire) they are deployed and part of RFC 3493 and IEEE
> standard, and the API is widely deployed on all product platforms that
> require APIs

Sure, but the mere existence of the draft, as well as the stated reasoning for not deprecating IPv4-mapped addresses, can be summarized as "expect problems if you use them, but we can't tell everyone not to use them". We are already witnessing some (e.g. log parsing, as Brandon mentioned).

> https://vaninst.ca/lists/v6ops/v6ops.2003/msg01097.html (sorry of the bad link i couldn't find the ipv6-ops archives on https://mailarchive.ietf.org/arch/advsearch/)

Many thanks for taking the time to dig that up. TIL.

> 2.5.5.1 is what I was referring to. It's unfortunately confusingly named too. Seems like I managed to misread it at least.

Ah, thanks; I had only skimmed it myself :)

> Sure, but the sheer existence of the draft alone as well as the stated reasoning for not deprecating IPv4 Mapped addresses can be summarized as expect problems if you use it, but we can't tell everyone to not use it. We are witnessing some already (e.g. log parsing as Brandon mentioned)

Completely agree; my reply was not intended to imply a preference.

> e.g. log parsing as Brandon mentioned)

In relation to logging specifically, it would seem that at least some of the other daemons from the list above (definitely Apache) do the right thing and log an IPv4 address. This somewhat dodges the main issue, but logging ::FFFF:<IPv4-address> instead of the IPv4 address is arguably a bug (although I couldn't find any specific advice in RFC 3493 related to logging, and I'm not that familiar with that specific IETF track).

Also, specific to the logging issues, it may be possible to fix this up in the logging pipeline?
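If the fix-up were done in the logging pipeline, its core would be small. A Python sketch, assuming (hypothetically) that a pipeline stage can rewrite the client-IP and X-Forwarded-For fields: IPv4-mapped addresses are turned back into plain IPv4, everything else passes through untouched. It only cleans up what gets logged; it does not help ACLs or anything else that consumes the header upstream. Both function names are hypothetical.

```python
import ipaddress


def unmap(addr: str) -> str:
    """Return a.b.c.d for ::ffff:a.b.c.d; leave anything else unchanged."""
    addr = addr.strip()
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return addr  # not an IP literal (e.g. "unknown"): pass through
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        return str(ip.ipv4_mapped)
    return addr


def unmap_xff(header: str) -> str:
    """Apply unmap() to every hop in an X-Forwarded-For style header."""
    return ", ".join(unmap(part) for part in header.split(","))
```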

> 2.5.5.1 is what I was referring to. It's unfortunately confusingly named too. Seems like I managed to misread it at least.

> Ah thanks, i had only skimmed myself :)

> Sure, but the sheer existence of the draft alone as well as the stated reasoning for not deprecating IPv4 Mapped addresses can be summarized as expect problems if you use it, but we can't tell everyone to not use it. We are witnessing some already (e.g. log parsing as Brandon mentioned)

> completely agree my reply was not intended to imply a preference

> e.g. log parsing as Brandon mentioned)

> In relation to logging specifically it would seem that at least some of the other daemons from the list above (definitely Apache) DTRT and log an ipv4 address. Somewhat dogging the main issue but the logging of ::FFFF:<IPv4-address> instead the IPv4-address is arguably a bug (although i couldn't find any specific advice in rfc 3493 related to logging and i'm not that familiar with that specific ietf track).

True, I think it's a bug as well. But it's one that needs to be solved multiple times across multiple languages/frameworks (e.g. https://stackoverflow.com/questions/29411551/express-js-req-ip-is-returning-ffff127-0-0-1). We can adapt service-runner for sure; not sure about other applications.

> Also specific to logging issues it may be possible to fix this up in the logging pipeline?

Could be, I have no idea tbh.

> True, I think it's a bug as well. But one that needs to be solve multiple times across multiple languages/frameworks

Agreed, and it doesn't help with other issues like the XFF header.

Change 667713 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] profile::templates::services_proxy: switch to ::1 when listen_ipv6 is true

https://gerrit.wikimedia.org/r/667713

Change 667714 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] hieradata: enable ipv6 on envoy services proxy on mwdebug1001

https://gerrit.wikimedia.org/r/667714

Change 667713 merged by Effie Mouzeli:
[operations/puppet@production] profile::templates::services_proxy: switch to ::1 when listen_ipv6 is true

https://gerrit.wikimedia.org/r/667713

Change 667714 merged by Effie Mouzeli:
[operations/puppet@production] hieradata: enable ipv6 on envoy services proxy on 2 servers

https://gerrit.wikimedia.org/r/667714


I think we can call it a success and roll it out to app and api servers next week (as we have more visibility there), and then to jobrunners + parsoid.

Change 669878 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] hieradata: enable ipv6 on envoy services mw canaries

https://gerrit.wikimedia.org/r/669878

Change 669878 merged by Effie Mouzeli:
[operations/puppet@production] hieradata: enable ipv6 on envoy services mw canaries

https://gerrit.wikimedia.org/r/669878

Change 673061 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] hieradata: enable ipv6 on envoy services on all mw servers

https://gerrit.wikimedia.org/r/673061

Change 673061 merged by Effie Mouzeli:
[operations/puppet@production] hieradata: enable ipv6 on envoy services on all mw servers

https://gerrit.wikimedia.org/r/673061

jijiki claimed this task.

All service proxies on mediawiki hosts (app, api, jobrunner, parsoid) are listening on ::1 :)
