Envoy should listen on ipv6 and ipv4
Open, Medium, Public

Assigned To

None

Authored By

	fgiunchedi
	Jun 16 2020, 2:57 PM

Description

According to the Envoy documentation (https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/core/address.proto), using something like the following will listen on both IPv6 and IPv4:

socket_address: {address: '::', port_value: 443, ipv4_compat: true}
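For illustration, here is a minimal Python sketch (plain sockets, not Envoy) of the behaviour the Envoy docs describe for ipv4_compat: a socket bound to :: with IPV6_V6ONLY cleared also accepts IPv4 clients, and those clients then show up as ::ffff:<IPv4-address>. It assumes a host with working IPv6 sockets and uses an ephemeral loopback port; the helper name is made up for this example.

```python
import socket


def peer_seen_by_dualstack_listener() -> str:
    """Accept one IPv4 client on a dual-stack '::' listener and return the
    client address exactly as the server sees it (hypothetical helper)."""
    # dualstack_ipv6=True clears IPV6_V6ONLY on the socket, so the IPv6
    # wildcard listener also accepts IPv4 connections.
    with socket.create_server(("::", 0), family=socket.AF_INET6,
                              dualstack_ipv6=True) as server:
        server.settimeout(5)
        port = server.getsockname()[1]
        # Connect over plain IPv4 to the same port.
        with socket.create_connection(("127.0.0.1", port), timeout=5):
            conn, peer = server.accept()
            with conn:
                return peer[0]


if __name__ == "__main__":
    if socket.has_dualstack_ipv6():
        # Expected to print an IPv4-mapped address such as ::ffff:127.0.0.1
        print(peer_seen_by_dualstack_listener())
```

The IPv4 client never asked for an IPv6 address; the mapping is applied by the kernel on the server side, which is why it leaks into anything that records the peer address.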

Details

Subject	Repo	Branch	Lines +/-
wikifunctions: Add mesh.configuration in package.json	operations/deployment-charts	master	+66 -31
cxserver: Bump mesh.configuration to 1.7	operations/deployment-charts	master	+134 -84
rec-api: Bump mesh.configuration to 1.7	operations/deployment-charts	master	+122 -76
eventstreams: Bump mesh.configuration to 1.7	operations/deployment-charts	master	+123 -85
termbox: Bump module dependencies	operations/deployment-charts	master	+121 -75
service mesh: Listen unconditionally on IPv6/IPv4	operations/deployment-charts	master	+36 -8
hieradata: cloudweb: enable envoy services_proxy on ipv6	operations/puppet	production	+6 -0
services_proxy: Switch listen_ipv6 to true by default	operations/puppet	production	+3 -10
Don't override the IPv6 stanza in services_proxy/envoy_service_listener.yaml.erb	operations/puppet	production	+4 -2
envoy-build-config: call extend() instead of append() if passed a list	operations/puppet	production	+4 -1
Fix for services_proxy listen	operations/puppet	production	+3 -2
services_proxy: Listen on :: and not ::1	operations/puppet	production	+1 -1
hieradata: enable ipv6 on envoy services on all mw servers	operations/puppet	production	+4 -0
hieradata: enable ipv6 on envoy services mw canaries	operations/puppet	production	+2 -1
hieradata: enable ipv6 on envoy services proxy on 2 servers	operations/puppet	production	+2 -0
profile::templates::services_proxy: switch to ::1 when listen_ipv6 is true	operations/puppet	production	+1 -1
parsoid::testreduce: let envoy listen on IPv6 as well	operations/puppet	production	+1 -0
service_proxy: add ipv6 config option on services_proxy config	operations/puppet	production	+13 -3

Related Objects

Mentioned In: T352747: Google is not listed as an option for Norwegian
T333969: Enable Opus models for languages lacking other Machine Translation options
T355686: Configure mesh listeners to allow IPv6 localhost (::) as well as IPv4 (127.0.0.1)
T276323: Restbase cannot connect to ipv6-only service
Mentioned Here: T333969: Enable Opus models for languages lacking other Machine Translation options
T352747: Google is not listed as an option for Norwegian
T355686: Configure mesh listeners to allow IPv6 localhost (::) as well as IPv4 (127.0.0.1)
T266509: Make testreduce web UI publicly accessible on the internet

Event Timeline


Ran into this issue today when working on T266509. I was wondering for some time why the envoy setup looked fine but things were not working,

until I eventually found in the logs that it was trying to connect via IPv6 (of course testreduce1001 has an IPv6 address, as I always try to set one up by default), but envoy was not listening on it.

Change 629343 merged by Effie Mouzeli:
[operations/puppet@production] service_proxy: add ipv6 config option on services_proxy config

https://gerrit.wikimedia.org/r/629343

Maintenance_bot removed a project: Patch-For-Review. Jan 27 2021, 8:10 AM

I've left a comment in the merged change; duplicating it here for visibility (since the change is already merged).

IPv4 compatible addresses are deprecated (yet still widely in use). See https://tools.ietf.org/html/rfc4291#section-2.5.5. There are also a set of issues with those, see https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02.

BSD systems never adopted ipv4 compat addresses, keeping the 2 stacks entirely separated. OpenBSD alludes to the issues in https://man.openbsd.org/inet6.4

"For security reasons, OpenBSD does not route IPv4 traffic to an AF_INET6 socket, and does not support IPv4 mapped addresses, where IPv4 traffic is seen as if it comes from an IPv6 address like “::ffff:10.1.1.1”. Where both IPv4 and IPv6 traffic need to be accepted, bind and listen on two sockets."

That being said, to my understanding, properly separating the two in the envoy config will require duplicating most of the generated config. At the puppet level, perhaps, we can get away with a for loop over the two protocols.

So, before enabling this throughout the fleet, maybe we can look into solving it using 2 separate stacks instead.
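The "for loop over the two protocols" idea can be sketched outside Puppet. This Python snippet is a hypothetical illustration: the build_listeners helper, the listener names and the filter-chain contents are placeholders, not the real puppet template output. It stamps one Envoy listener per address family from a single template, with the IPv6 copy explicitly setting ipv4_compat to false so each family gets its own socket.

```python
import copy
from typing import Any

# One entry per address family: (listener name suffix, wildcard address).
FAMILIES = [("ipv4", "0.0.0.0"), ("ipv6", "::")]


def build_listeners(name: str, port: int,
                    filter_chains: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return one Envoy-style listener dict per address family,
    all sharing the same (copied) filter chains."""
    listeners = []
    for suffix, address in FAMILIES:
        socket_address: dict[str, Any] = {"address": address, "port_value": port}
        if address == "::":
            # Keep the IPv6 socket IPv6-only; IPv4 gets its own listener.
            socket_address["ipv4_compat"] = False
        listeners.append({
            "name": f"{name}_{suffix}",
            "address": {"socket_address": socket_address},
            # Deep copy so the generated listeners never share mutable state.
            "filter_chains": copy.deepcopy(filter_chains),
        })
    return listeners
```

Only the template is shared; the generated output is still a near-full copy per address family.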

Joe added a project: envoy. Jan 27 2021, 10:42 AM

Change 659051 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] parsoid::testreduce: let envoy listen on IPv6 as well

https://gerrit.wikimedia.org/r/659051

gerritbot added a project: Patch-For-Review. Jan 27 2021, 6:22 PM

Change 659051 merged by Dzahn:
[operations/puppet@production] parsoid::testreduce: let envoy listen on IPv6 as well

https://gerrit.wikimedia.org/r/659051

Maintenance_bot removed a project: Patch-For-Review. Jan 27 2021, 7:10 PM

In T255568#6779477, @akosiaris wrote:

I've left a comment in the merged change; duplicating it here for visibility (since the change is already merged).

IPv4 compatible addresses are deprecated (yet still widely in use). See https://tools.ietf.org/html/rfc4291#section-2.5.5. There are also a set of issues with those, see https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02.

<snip>

So, before enabling this throughout the fleet, maybe we can look into solving it using 2 separate stacks instead.

Reading https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02, to my understanding, there are threats when using IPv4 compatible addresses on the wire. The problem I am trying to solve relates to local traffic towards the services_proxy, when applications are trying to resolve "localhost". My suggestion is:

Keep the admin listener on 0.0.0.0 (IPv4 only)
Keep tls_terminator on 0.0.0.0 (IPv4 only)
Set all services_proxy listeners to listen on ::1, with ipv4_compat: true

Hi serviceops - I've run into some of the effects of this recently and tracked down this ticket, which seems a relevant/recent reference point.

The current puppet repo has this envoy ipv4_compat mechanism available optionally via a parameter, listen_ipv6, in the services_proxy listener and the tls_terminator configurations, which is used in a few cases already:

hieradata/role/common/idp.yaml:profile::tlsproxy::envoy::listen_ipv6: true
hieradata/role/common/idp_test.yaml:profile::tlsproxy::envoy::listen_ipv6: true
hieradata/role/common/parsoid/testreduce.yaml:profile::tlsproxy::envoy::listen_ipv6: true
hieradata/role/common/restbase/dev_cluster.yaml:profile::services_proxy::envoy::listen_ipv6: true

Either way, I think this is probably causing some harm. We're seeing logstash entries with these v4-mapped IPs recorded in XFF headers and X-Client-IP, as in this logstash entry:

https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-restbase-2021.02.24?id=H0tT1XcBsCn0xdb8Djvf

These invented v4-mapped IPs are not legitimate in these and many other contexts, and are bound to cause some subtle (or not-so-subtle) issues. All over various parts of our services and infra, we have network ACLs and ratelimiter configs and such that won't recognize these IPs as being parts of our internal spaces. Basically, ipv4_compat (or in more generic terms, any service daemon listening on the IPv6 ANY-address without IPV6_V6ONLY) seems like a problematic idea, at least here. The alternative answer would be to go around trying to ensure everything else in our stack, our puppetization, our analytics, etc. all understands how to un-translate v4-mapped addresses, but that seems like an unnecessarily Sisyphean task compared to just reconfiguring listeners to not make up non-existent source addresses.
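The ACL mismatch is easy to reproduce with Python's standard ipaddress module. In this sketch, INTERNAL_NETS and the sample addresses are made-up stand-ins for a real internal range and real hosts; the point is that an IPv4-mapped address never matches an IPv4 network, because membership checks across IP versions simply return False.

```python
import ipaddress

# Hypothetical internal range standing in for a real ACL entry.
INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]


def is_internal(raw: str) -> bool:
    """Naive ACL check: does the address fall inside an internal network?"""
    addr = ipaddress.ip_address(raw)
    # Membership across IP versions is always False, so a v4-mapped
    # IPv6 address never matches an IPv4 network here.
    return any(addr in net for net in INTERNAL_NETS)


print(is_internal("10.64.0.5"))         # True: plain IPv4 source
print(is_internal("::ffff:10.64.0.5"))  # False: same host via a dual-stack socket
```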

As best I can tell, it's also not possible to configure a dual-addressed listener in envoy (open issue: https://github.com/envoyproxy/envoy/issues/11184 ), which means you'd have to duplicate the entire listener config just to give it separate ipv4 and ipv6 :/

Vgutierrez subscribed. Feb 24 2021, 8:57 PM

Just for the record, the restbase cluster that has ipv4_compat activated is the dev cluster. Nothing serving production traffic.

@Joe yeah I'm not sure which layer is causing the logstash appearance there. It's from restbase1019 as a client towards something, maybe parsoid?

No, that entry is for testreduce, so another test instance too. So I doubt that what you're seeing in the logs has anything to do with this setting.

In fact, the log you reported above is generated by restbase itself, and seems not to have anything to do with envoy proxying, nor with restbase calling another service. That's restbase throttling a specific client that was making too many requests before making a backend request.

I removed that setting from the testreduce1001 envoy, just to make sure.

-        address: '::'
-        ipv4_compat: true
+        address: 0.0.0.0

https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-restbase-2020.12.24?id=L_3hlnYBjr5R1RLC5PlW

shows that this predates the envoy change (it's from Dec 24 2020). In this case the UA is mobileapps/WMF, but just looking at the restbase sockets on a box I see

nodejs    127896    restbase   13u  IPv6  109896146      0t0  TCP *:7233 (LISTEN)
nodejs    127896    restbase   14u  IPv6  109896147      0t0  TCP *:7231 (LISTEN)

so restbase is listening on IPv6 sockets only and relying on the default net.ipv6.bindv6only value of 0 to also serve IPv4 requests over them (with the obvious issues mentioned above). I am not sure what that request is, btw. Does x-client-ip point to the restbase host itself?

I don't think this is related to this task, as we are tracking envoy specifically here, but it's a clear indication of how wrong IPv4-mapped IPv6 addresses can go.

As best I can tell, it's also not possible to configure a dual-addressed listener in envoy (open issue: https://github.com/envoyproxy/envoy/issues/11184 ), which means you'd have to duplicate the entire listener config just to give it separate ipv4 and ipv6 :/

Yes, that's an alternative.

In T255568#6781621, @jijiki wrote:

In T255568#6779477, @akosiaris wrote:

I've left a comment in the merged change; duplicating it here for visibility (since the change is already merged).

IPv4 compatible addresses are deprecated (yet still widely in use). See https://tools.ietf.org/html/rfc4291#section-2.5.5. There are also a set of issues with those, see https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02.

<snip>

So, before enabling this throughout the fleet, maybe we can look into solving it using 2 separate stacks instead.

Reading https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02, to my understanding, there are threats when using IPv4 compatible addresses on the wire. The problem I am trying to solve relates to local traffic towards the services_proxy, when applications are trying to resolve "localhost". My suggestion is:

It's not just "on the wire", it can and will happen even over localhost. e.g. https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-restbase-2021.02.24?id=lPI_1ncBfVMx58vqDTcc

Keep the admin listener on 0.0.0.0 (IPv4 only)

Keep tls_terminator on 0.0.0.0 (IPv4 only)

Set all services_proxy listeners to listen on ::1, with ipv4_compat: true

Why not ipv4_compat: false, that is, going IPv6-only for that last bullet point? All apps connect to localhost, and that's guaranteed to have a working IPv6 setup. It might be cleaner to do that instead.

In T255568#6858884, @akosiaris wrote:

In T255568#6781621, @jijiki wrote:

It's not just "on the wire", it can and will happen even over localhost. e.g. https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-restbase-2021.02.24?id=lPI_1ncBfVMx58vqDTcc

I was referring to the "threats" part of the document.

Keep the admin listener on 0.0.0.0 (IPv4 only)

Keep tls_terminator on 0.0.0.0 (IPv4 only)

Set all services_proxy listeners to listen on ::1, with ipv4_compat: true

Why not ipv4_compat: false, that is, going IPv6-only for that last bullet point? All apps connect to localhost, and that's guaranteed to have a working IPv6 setup. It might be cleaner to do that instead.

I can have a go on mwdebug and possibly on one of the canaries and see what happens. If there are no related errors, I see no reason not to do it. Thanks!

Just a note: this was added to the main envoy config in July 2020 and to the service proxy on January 27th. The main justification for adding these to envoy is that having envoy listen on specific sockets requires a lot of duplication in puppet. When I first introduced these for the idp use case, I pointed to the fact that with Apache 2.4, --enable-v4-mapped is the default on all platforms except FreeBSD, NetBSD, and OpenBSD, in order to convince Alex.

@akosiaris

IPv4 compatible addresses are deprecated (yet still widely in use). See https://tools.ietf.org/html/rfc4291#section-2.5.5. There are also a set of issues with those, see https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02.

In relation to the first point, my reading of RFC 4291 section 2.5 indicates that only the site-local unicast addresses of 2.5.7 are deprecated, not 2.5.5.

There are several types of unicast addresses in IPv6, in particular,
Global Unicast, site-local unicast (deprecated, see Section 2.5.7),
and Link-Local unicast.  There are also some special-purpose subtypes
of Global Unicast, such as IPv6 addresses with embedded IPv4
addresses.

In relation to draft-itojun-v6ops-v4mapped-harmful-02, it's worth noting that the document is a draft that expired, i.e. it didn't reach consensus, and a quick scan of the archives suggests this issue may have been one of the sticking points:

We cannot deprecate usage of IPv4 Mapped Addresses for the API (not
speaking of the wire) they are deployed and part of RFC 3493 and IEEE
standard, and the API is widely deployed on all product platforms that
require APIs

https://vaninst.ca/lists/v6ops/v6ops.2003/msg01097.html (sorry for the bad link; I couldn't find the ipv6-ops archives on https://mailarchive.ietf.org/arch/advsearch/)

jbond triaged this task as Medium priority. Feb 25 2021, 10:00 AM

I ran a quick (and likely error-prone) script to see which other daemons listen with mapped addresses:

$ sudo cumin -o json 'A:all' "ss -ltp6 | awk '$ 4 ~ /^\*/ {print $ NF}'"  > ss.json

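import json
import re
from pathlib import Path

# cumin -o json output: hostname -> command output, one line per listening
# socket, e.g. users:(("envoy",pid=1234,fd=5))
ss = json.loads(Path('ss.json').read_text())
daemons = set()
for host, output in ss.items():
    # Collect every quoted process name, not just the first one per host.
    daemons.update(re.findall(r'"([^"]+)"', output))
print('\n'.join(sorted(daemons)))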

Burrow
alertmanager-ir
apache2
cadvisor
ceph-mgr
envoy
gobgpd
java
kube-apiserver
mtail
mysqld
nfacctd
nginx
nodejs
poolcounter-pro
prometheus-apac
prometheus-blac
prometheus-elas
prometheus-hapr
prometheus-ipse
prometheus-logs
prometheus-mcro
prometheus-memc
prometheus-mysq
prometheus-node
prometheus-post
prometheus-squi
prometheus-stat
prometheus-varn
statsite
systemd
thanos
uwsgi_python37

In T255568#6860123, @jbond wrote:

Just a note: this was added to the main envoy config in July 2020 and to the service proxy on January 27th. The main justification for adding these to envoy is that having envoy listen on specific sockets requires a lot of duplication in puppet. When I first introduced these for the idp use case, I pointed to the fact that with Apache 2.4, --enable-v4-mapped is the default on all platforms except FreeBSD, NetBSD, and OpenBSD, in order to convince Alex.

Indeed. From what I see, I had raised the same concerns back then (logs, figuring out which ports a daemon is listening on). At least I am consistent, that's a relief.

@akosiaris

IPv4 compatible addresses are deprecated (yet still widely in use). See https://tools.ietf.org/html/rfc4291#section-2.5.5. There are also a set of issues with those, see https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02.

In relation to the first point, my reading of RFC 4291 section 2.5 indicates that only the site-local unicast addresses of 2.5.7 are deprecated, not 2.5.5.
There are several types of unicast addresses in IPv6, in particular,
Global Unicast, site-local unicast (deprecated, see Section 2.5.7),
and Link-Local unicast.  There are also some special-purpose subtypes
of Global Unicast, such as IPv6 addresses with embedded IPv4
addresses.

2.5.5.1 is what I was referring to. It's unfortunately confusingly named too. Seems like I managed to misread it at least.

2.5.5.1.  IPv4-Compatible IPv6 Address

   The "IPv4-Compatible IPv6 address" was defined to assist in the IPv6
   transition.  The format of the "IPv4-Compatible IPv6 address" is as
   follows:

   |                80 bits               | 16 |      32 bits        |
   +--------------------------------------+--------------------------+
   |0000..............................0000|0000|    IPv4 address     |
   +--------------------------------------+----+---------------------+

   Note: The IPv4 address used in the "IPv4-Compatible IPv6 address"
   must be a globally-unique IPv4 unicast address.

   The "IPv4-Compatible IPv6 address" is now deprecated because the
   current IPv6 transition mechanisms no longer use these addresses.
   New or updated implementations are not required to support this
   address type.

Section 2.5.5.2, IPv4-Mapped IPv6 Address, is not deprecated, however.
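The two RFC 4291 layouts differ only in the 16 bits just above the embedded IPv4 address: all zeros for the deprecated IPv4-compatible form, all ones (0xffff) for the IPv4-mapped form. A small Python sketch with the standard ipaddress module makes the difference concrete; the two helper names are made up for this example. Note that ipv4_mapped only recognises the second form.

```python
import ipaddress


def make_compatible(v4: str) -> ipaddress.IPv6Address:
    """Deprecated form (RFC 4291 2.5.5.1): 96 zero bits, then the IPv4 address."""
    return ipaddress.IPv6Address(int(ipaddress.IPv4Address(v4)))


def make_mapped(v4: str) -> ipaddress.IPv6Address:
    """Mapped form (RFC 4291 2.5.5.2): 80 zero bits, 16 one bits, then the IPv4 address."""
    return ipaddress.IPv6Address((0xFFFF << 32) | int(ipaddress.IPv4Address(v4)))


compat = make_compatible("10.1.1.1")
mapped = make_mapped("10.1.1.1")
print(compat.ipv4_mapped)  # None: not recognised as a mapped address
print(mapped.ipv4_mapped)  # 10.1.1.1
```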

In relation to draft-itojun-v6ops-v4mapped-harmful-02, it's worth noting that the document is a draft that expired, i.e. it didn't reach consensus, and a quick scan of the archives suggests this issue may have been one of the sticking points:
We cannot deprecate usage of IPv4 Mapped Addresses for the API (not
speaking of the wire) they are deployed and part of RFC 3493 and IEEE
standard, and the API is widely deployed on all product platforms that
require APIs

Sure, but the sheer existence of the draft, as well as the stated reasoning for not deprecating IPv4-mapped addresses, can be summarized as "expect problems if you use it, but we can't tell everyone not to use it". We are witnessing some already (e.g. log parsing, as Brandon mentioned).

https://vaninst.ca/lists/v6ops/v6ops.2003/msg01097.html (sorry for the bad link; I couldn't find the ipv6-ops archives on https://mailarchive.ietf.org/arch/advsearch/)

Many thanks for spending time to dig that up. TIL.

2.5.5.1 is what I was referring to. It's unfortunately confusingly named too. Seems like I managed to misread it at least.

Ah thanks, i had only skimmed myself :)

Sure, but the sheer existence of the draft, as well as the stated reasoning for not deprecating IPv4-mapped addresses, can be summarized as "expect problems if you use it, but we can't tell everyone not to use it". We are witnessing some already (e.g. log parsing, as Brandon mentioned).

Completely agree; my reply was not intended to imply a preference.

e.g. log parsing as Brandon mentioned)

In relation to logging specifically, it would seem that at least some of the other daemons from the list above (definitely Apache) DTRT and log an IPv4 address. Somewhat dodging the main issue, but logging ::FFFF:<IPv4-address> instead of the IPv4 address is arguably a bug (although I couldn't find any specific advice in RFC 3493 related to logging, and I'm not that familiar with that specific IETF track).

Also specific to logging issues it may be possible to fix this up in the logging pipeline?
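If the fix does land in the logging pipeline, its core could look like this Python sketch. normalize_client_ip is a hypothetical helper, not part of any existing pipeline; it un-maps ::ffff:a.b.c.d back to the plain IPv4 address and passes every other value through.

```python
import ipaddress


def normalize_client_ip(raw: str) -> str:
    """Return the IPv4 form of an IPv4-mapped IPv6 address; pass others through."""
    try:
        addr = ipaddress.ip_address(raw.strip())
    except ValueError:
        # Not an IP address (e.g. an empty or garbled header value): leave as-is.
        return raw
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        return str(addr.ipv4_mapped)
    return str(addr)
```

Returning non-IP strings unchanged keeps the helper safe to apply to any log field that might carry an address.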

In T255568#6860521, @jbond wrote:

2.5.5.1 is what I was referring to. It's unfortunately confusingly named too. Seems like I managed to misread it at least.

Ah thanks, i had only skimmed myself :)

Sure, but the sheer existence of the draft, as well as the stated reasoning for not deprecating IPv4-mapped addresses, can be summarized as "expect problems if you use it, but we can't tell everyone not to use it". We are witnessing some already (e.g. log parsing, as Brandon mentioned).

Completely agree; my reply was not intended to imply a preference.

e.g. log parsing as Brandon mentioned)

In relation to logging specifically, it would seem that at least some of the other daemons from the list above (definitely Apache) DTRT and log an IPv4 address. Somewhat dodging the main issue, but logging ::FFFF:<IPv4-address> instead of the IPv4 address is arguably a bug (although I couldn't find any specific advice in RFC 3493 related to logging, and I'm not that familiar with that specific IETF track).

True, I think it's a bug as well. But one that needs to be solved multiple times across multiple languages/frameworks (e.g. https://stackoverflow.com/questions/29411551/express-js-req-ip-is-returning-ffff127-0-0-1). We can adapt service-runner for sure; not sure about other applications.

Also specific to logging issues it may be possible to fix this up in the logging pipeline?

Could be, I have no idea tbh.

True, I think it's a bug as well. But one that needs to be solved multiple times across multiple languages/frameworks

Agreed, and it doesn't help with other issues like the XFF header.

Change 667713 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] profile::templates::services_proxy: switch to ::1 when listen_ipv6 is true

https://gerrit.wikimedia.org/r/667713

gerritbot added a project: Patch-For-Review. Mar 1 2021, 9:40 PM

Change 667714 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] hieradata: enable ipv6 on envoy services proxy on mwdebug1001

https://gerrit.wikimedia.org/r/667714

Change 667713 merged by Effie Mouzeli:
[operations/puppet@production] profile::templates::services_proxy: switch to ::1 when listen_ipv6 is true

https://gerrit.wikimedia.org/r/667713

Change 667714 merged by Effie Mouzeli:
[operations/puppet@production] hieradata: enable ipv6 on envoy services proxy on 2 servers

https://gerrit.wikimedia.org/r/667714

hnowlan mentioned this in T276323: Restbase cannot connect to ipv6-only service. Mar 3 2021, 1:06 PM

I think we can call it success and roll it out to app and api next week (as we have more visibility there), and then on jobrunners + parsoid.

Change 669878 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] hieradata: enable ipv6 on envoy services mw canaries

https://gerrit.wikimedia.org/r/669878

Change 669878 merged by Effie Mouzeli:
[operations/puppet@production] hieradata: enable ipv6 on envoy services mw canaries

https://gerrit.wikimedia.org/r/669878

Change 673061 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] hieradata: enable ipv6 on envoy services on all mw servers

https://gerrit.wikimedia.org/r/673061

Change 673061 merged by Effie Mouzeli:
[operations/puppet@production] hieradata: enable ipv6 on envoy services on all mw servers

https://gerrit.wikimedia.org/r/673061

All service proxies on mediawiki hosts (app, api, jobrunner, parsoid) are listening on ::1 :)

Change 815959 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/puppet@production] services_proxy: Listen on :: and not ::1

https://gerrit.wikimedia.org/r/815959

I noticed recently that for example envoy on otrs1001 does not listen on IPv6 yet (when prometheus monitoring failed because it tries IPv6 by default).

But I am not sure if that should be part of this ticket or another. Technically this wasn't restricted to just mw appservers I guess.

I am probably gonna regret reopening this, but here goes anyway.

Trying to recap the current situation (before my latest set of patches got uploaded):

We currently use envoy in 4 different situations:

TLS termination across a variety of standalone services. Configured via profile::tlsproxy::envoy and included in 41 different roles.
TLS termination at the cache layer. Configured via profile::cache::envoy. By necessity and the different specialized requirements, it differs from the above one by shipping various operational overrides and different configuration. Included in just 2 roles, role::cache::text_envoy and role::cache::upload_envoy.
TLS demarcation (not just termination, but initiation too) for the services proxy infrastructure. Essentially services using an envoy mesh to talk to each other. This last one is split in 2 parts
1. Services proxy managed via puppet. Configured via profile::services_proxy::envoy. Included in 4 different roles (+1 dev role).
2. Services proxy for services residing in kubernetes. To avoid duplication, configuration for those is a mix of helm templates and puppet code (helm templates including fragments generated by puppet and populated on the deployment servers).

With that high level recap out of the way, it's important to point out that all of the above profiles reuse 1 main profile, profile::envoy which uses the envoyproxy module. It's a pretty good example of reusing puppet code to achieve slightly different objectives using the same versatile base software.

Couple of interesting points:

profile::tlsproxy::envoy relies on the tls_terminator/listener.yaml.erb puppet template which sets ipv4_compat: true. Enabling IPv6 is tunable via a hiera variable, namely profile::tlsproxy::envoy::listen_ipv6. This is turned on only on the following services: idp, phabricator.
profile::cache::envoy listens on IPv6 already via a hardcoded true in puppet configuration. However, it relies on the same tls_terminator/listener.yaml.erb puppet template which sets ipv4_compat: true, thus re-using that approach. That profile isn't used by any node in the infrastructure (haproxy won) and can presumably be removed from the puppet repo. I'll ping the corresponding people to make sure and submit patches.
profile::services_proxy::envoy has a different template, services_proxy/envoy_service_listener.yaml.erb, to control listeners. It also uses a different hiera key, namely profile::services_proxy::envoy::listen_ipv6. Again, this profile is used in only 4+1 dev roles: deployment_server, mediawiki::common, ores, restbase::production and restbase::dev_cluster. Of those, only roles including mediawiki::common (that is appserver, api, canary_api, canary_appserver, jobrunner and parsoid) set this to true. It's also important to point out that this profile only listens on ::1 and does not use ipv4_compat: true, thus defaulting to false. Listening on ::1 only was a hack to scratch a specific itch we had on the mw clusters. It was never great, as we ended up NOT listening on IPv4 and thus diverged the IPv4 behavior from the IPv6 behavior, but it allowed us to address the issue of seeing TCP connection errors until we could come up with a better solution.

Now, as to what my latest patch series, that is:

does:

It targets JUST profile::services_proxy::envoy. Nothing else of the items listed above is touched. Thus, the only nodes that will see any kind of diff in our environments are mw nodes.
It uses a puppet template trick, namely including a template in another template, to avoid duplicating configuration at the puppet level. In this way, at least at the puppet level, the duplicated content is barely 2 lines of code. The generated configuration is an almost full copy, but I've seen no way of avoiding this.
It fixes that ugly hack of listening only on ::1 and aligns the IPv6 behavior to the IPv4 by listening properly on both address families and on all interfaces.
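At the socket level, listening properly on both address families comes down to two sockets on the same port: an IPv4 one on 0.0.0.0 and an IPv6-only one on ::. This Python sketch shows that idea with plain sockets; it is not how Envoy is implemented internally, the open_listeners helper is hypothetical, and it assumes a Linux-like host with working IPv6 sockets.

```python
import socket


def open_listeners(port: int = 0) -> tuple[socket.socket, socket.socket]:
    """Open an IPv4 listener and an IPv6-only listener on the same TCP port."""
    v4 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        v4.bind(("0.0.0.0", port))
        v4.listen()
        port = v4.getsockname()[1]  # reuse the (possibly ephemeral) port
        v6 = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        try:
            # IPv6-only: IPv4 clients never reach this socket, so no peer
            # is ever reported as ::ffff:<IPv4-address>.
            v6.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
            v6.bind(("::", port))
            v6.listen()
        except OSError:
            v6.close()
            raise
    except OSError:
        v4.close()
        raise
    return v4, v6


if __name__ == "__main__":
    # Rough check that IPv6 sockets are usable on this host.
    if socket.has_dualstack_ipv6():
        v4, v6 = open_listeners()
        with v4, v6:
            v4.settimeout(5)
            port = v4.getsockname()[1]
            with socket.create_connection(("127.0.0.1", port), timeout=5):
                conn, peer = v4.accept()
                conn.close()
            print(peer[0])  # the IPv4 listener sees a plain IPv4 address
```

Because the IPv6 socket sets IPV6_V6ONLY explicitly, the result does not depend on the host's net.ipv6.bindv6only default.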

The approach might also be applicable to profile::tlsproxy::envoy but that will take a bit of investigation first.

In T255568#8094931, @Dzahn wrote:

I noticed recently that for example envoy on otrs1001 does not listen on IPv6 yet (when prometheus monitoring failed because it tries IPv6 by default).

But I am not sure if that should be part of this ticket or another. Technically this wasn't restricted to just mw appservers I guess.

otrs1001 uses profile::tlsproxy::envoy, that is class #1 from the ones listed above. Listening on IPv6 for vrts is rather simple, just set profile::tlsproxy::envoy::listen_ipv6: true in the respective role hiera. This has long been enabled on idp and phabricator without major issues. It does sound like a good match for VRTS from my PoV, which granted is a bit more distant these days.

Thanks for the write-up, Alex; the patches look good to me.

It uses a puppet template trick, namely including a template in another template, to avoid duplicating configuration at the puppet level. In this way, at least at the puppet level, the duplicated content is barely 2 lines of code. The generated configuration is an almost full copy, but I've seen no way of avoiding this.

FTR I couldn't think of a better way either. Possibly using puppetlabs-concat, but I don't see that as enough of a win (if any) to change.

This has long been enabled on idp and phabricator without major issues.

Just to confirm, we have seen no issues with this.

Thanks John!

I'll avoid merging those on a Friday and will do so on Monday.

In T255568#8097145, @akosiaris wrote:

otrs1001 uses profile::tlsproxy::envoy, that is class #1 from the ones listed above. Listening on IPv6 for vrts is rather simple, just set profile::tlsproxy::envoy::listen_ipv6: true in the respective role hiera. This has long been enabled on idp and phabricator without major issues. It does sound like a good match for VRTS from my PoV, which granted is a bit more distant these days.

Is there any reason not to do this globally?

In T255568#8097580, @taavi wrote:

In T255568#8097145, @akosiaris wrote:

otrs1001 uses profile::tlsproxy::envoy, that is class #1 from the ones listed above. Listening on IPv6 for vrts is rather simple, just set profile::tlsproxy::envoy::listen_ipv6: true in the respective role hiera. This has long been enabled on idp and phabricator without major issues. It does sound like a good match for VRTS from my PoV, which granted is a bit more distant these days.

Is there any reason not to do this globally?

Yes. As pointed out above, it is not without repercussions, at least at the logging and debugging levels, due to ipv4_compat. Depending on the service, there might also be some security considerations. As such, it is better left as a conscious choice by the team that manages that service than as a surprise coming from a global default.

Furthermore, if the patches for services_proxy pan out, the same approach can probably be used for the tlsproxy and avoid the ipv4_compat pitfall overall.

Dzahn awarded a token. Jul 22 2022, 5:50 PM

jijiki moved this task from 🔦Unused2 to 🙈🙉🙊Backlog on the serviceops board. Oct 17 2022, 4:28 PM

Aklapper edited projects, added Patch-Needs-Improvement; removed Patch-For-Review. Sep 17 2023, 3:58 PM

fgiunchedi removed a project: User-fgiunchedi. Oct 31 2023, 8:11 AM

@jijiki: Per emails from Sep18 and Oct20 and https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup , I am resetting the assignee of this task because there has not been progress lately (please correct me if I am wrong!). Resetting the assignee avoids the impression that somebody is already working on this task. It also allows others to potentially work towards fixing this task. Please claim this task again when you plan to work on it (via Add Action... → Assign / Claim in the dropdown menu) - it would be welcome. Thanks for your understanding!

per comments on https://gerrit.wikimedia.org/r/c/operations/puppet/+/983893/ "grepping thru the Puppet repo shows 11 instances of profile::tlsproxy::envoy::listen_ipv6: true already in use:"

Does this mean this ticket is actually resolved but was not updated?

In T255568#9413725, @Dzahn wrote:

per comments on https://gerrit.wikimedia.org/r/c/operations/puppet/+/983893/ "grepping thru the Puppet repo shows 11 instances of profile::tlsproxy::envoy::listen_ipv6: true already in use:"

Does this mean this ticket is actually resolved but was not updated?

I think that if we want to call it "nicely" resolved, we need a patch to alter the default as @taavi suggested in T255568#8097580. 1.5 years later, adoption has increased enough (as witnessed by the 11 instances in the puppet repo) to justify at least trying that out.

Otherwise, it's old enough and the statement at the beginning is vague enough and misrepresenting the rest of the task that resolving it won't hurt.

bking subscribed. (Dec 18 2023, 6:32 PM)

Change 815959 merged by Alexandros Kosiaris:

[operations/puppet@production] services_proxy: Listen on :: and not ::1

https://gerrit.wikimedia.org/r/815959
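For context on this change: `::1` is the IPv6 loopback address, reachable only from the host itself, while `::` is the unspecified address, which binds the listener on every IPv6 address of the host. IPv4 clients of a dual-stack socket show up as IPv4-mapped addresses such as `::ffff:127.0.0.1`, which is a different address from `::1`, so a listener bound to `::1` does not receive them. The Python sketch below classifies bind addresses with the standard `ipaddress` module; the helper `describe_bind` is hypothetical.

```python
import ipaddress


def describe_bind(addr: str) -> str:
    """Roughly describe which local addresses a listener bound to `addr` covers."""
    ip = ipaddress.ip_address(addr)
    if ip.is_unspecified:
        # '::' or '0.0.0.0': every address of that family on the host.
        return "all interfaces"
    if ip.is_loopback:
        # '::1' or 127.0.0.0/8: local connections only.
        return "loopback only"
    return "single address"


if __name__ == "__main__":
    for addr in ["::", "::1", "0.0.0.0", "127.0.0.1"]:
        print(addr, "->", describe_bind(addr))
    # The IPv4-mapped form of 127.0.0.1 is not the IPv6 loopback address.
    print(ipaddress.ip_address("::ffff:127.0.0.1") == ipaddress.ip_address("::1"))
```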

Change 984104 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/puppet@production] Fix for services_proxy listen

https://gerrit.wikimedia.org/r/984104

gerritbot added a project: Patch-For-Review. (Dec 19 2023, 9:22 AM)

Restricted Application removed a project: Patch-Needs-Improvement. (Dec 19 2023, 9:22 AM)

Change 984105 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/puppet@production] services_proxy: Switch listen_ipv6 to true by default

https://gerrit.wikimedia.org/r/984105

Change 984104 merged by Alexandros Kosiaris:

[operations/puppet@production] Fix for services_proxy listen

https://gerrit.wikimedia.org/r/984104

Change 984133 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/puppet@production] envoy-build-config: call extend() instead of append() if passed a list

https://gerrit.wikimedia.org/r/984133
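The bug class this change addresses is standard Python list behavior: `append()` adds its argument as a single element, so appending a list of listeners nests it inside the outer list, while `extend()` adds each element individually. A minimal sketch follows; `add_listeners` and the listener dicts are hypothetical and not taken from the actual envoy-build-config script.

```python
from typing import Any, Dict, List, Union

Listener = Dict[str, Any]


def add_listeners(listeners: List[Listener], new: Union[Listener, List[Listener]]) -> None:
    """Add one listener or a list of listeners to `listeners`, in place."""
    if isinstance(new, list):
        # extend() adds every listener as its own top-level element.
        listeners.extend(new)
    else:
        # A single listener dict is added as one element.
        listeners.append(new)


if __name__ == "__main__":
    a = {"name": "listener-a"}  # hypothetical listener definitions
    b = {"name": "listener-b"}

    nested: List[Any] = []
    nested.append([a, b])  # one element that is itself a list: [[a, b]]

    flat: List[Listener] = []
    add_listeners(flat, [a, b])  # two elements: [a, b]
    print(len(nested), len(flat))  # prints "1 2"
```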

Change 984134 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/puppet@production] Don't override the IPv6 stanza in services_proxy/envoy_service_listener.yaml.erb

https://gerrit.wikimedia.org/r/984134

Change 984133 merged by Alexandros Kosiaris:

[operations/puppet@production] envoy-build-config: call extend() instead of append() if passed a list

https://gerrit.wikimedia.org/r/984133

Change 984134 merged by Alexandros Kosiaris:

[operations/puppet@production] Don't override the IPv6 stanza in services_proxy/envoy_service_listener.yaml.erb

https://gerrit.wikimedia.org/r/984134

I've tried to change the default in one of the users above, namely services_proxy (I haven't yet looked at the tlsproxy profile, which does TLS termination).

Unfortunately, the diff is still too large for me to just flip the switch. The change is at https://gerrit.wikimedia.org/r/c/operations/puppet/+/984105; for posterity's sake, the diff lists the following hosts:

cloudweb1004.wikimedia.org
cloudweb2002-dev.wikimedia.org
deploy1002.eqiad.wmnet
mwmaint1002.eqiad.wmnet
scandium.eqiad.wmnet
snapshot1008.eqiad.wmnet
snapshot1010.eqiad.wmnet
snapshot1011.eqiad.wmnet
snapshot1016.eqiad.wmnet

It's probably prudent not to surprise service owners by merging my change.

akosiaris mentioned this in T355686: Configure mesh listeners to allow IPv6 localhost (::) as well as IPv4 (127.0.0.1). (Jan 26 2024, 3:19 PM)

Clement_Goubert merged a task: T355686: Configure mesh listeners to allow IPv6 localhost (::) as well as IPv4 (127.0.0.1). (Jan 29 2024, 11:37 AM)

Clement_Goubert added subscribers: Jdforrester-WMF, Clement_Goubert, Lucas_Werkmeister_WMDE.

Change 993673 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] hieradata: cloudweb: enable envoy services_proxy on ipv6

https://gerrit.wikimedia.org/r/993673

Jdforrester-WMF awarded a token. (Jan 29 2024, 1:51 PM)

Change 993673 merged by Majavah:

[operations/puppet@production] hieradata: cloudweb: enable envoy services_proxy on ipv6

https://gerrit.wikimedia.org/r/993673

Change 999867 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/deployment-charts@master] service mesh: Listen unconditionally on IPv6/IPv4

https://gerrit.wikimedia.org/r/999867
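For reference, the listener address from the task description, `socket_address: {address: '::', port_value: 443, ipv4_compat: true}`, is shown below as a Python structure that could be serialized into an Envoy YAML/JSON config. This is a sketch only: the helper `dual_stack_address` is hypothetical, and it does not reflect how the deployment-charts Helm templates actually render their configuration.

```python
import json
from typing import Any, Dict


def dual_stack_address(port: int) -> Dict[str, Any]:
    """Build an Envoy `address` block that accepts both IPv6 and IPv4 peers.

    Binding to '::' with ipv4_compat enabled lets IPv4 clients connect as
    well; Envoy then reports them as IPv4-mapped addresses (::ffff:a.b.c.d).
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return {
        "socket_address": {
            "address": "::",
            "port_value": port,
            "ipv4_compat": True,
        }
    }


if __name__ == "__main__":
    print(json.dumps(dual_stack_address(443), indent=2))
```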

Change 999882 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/deployment-charts@master] termbox: Bump module dependencies

https://gerrit.wikimedia.org/r/999882

Change 999867 merged by jenkins-bot:

[operations/deployment-charts@master] service mesh: Listen unconditionally on IPv6/IPv4

https://gerrit.wikimedia.org/r/999867

Change 999882 merged by jenkins-bot:

[operations/deployment-charts@master] termbox: Bump module dependencies

https://gerrit.wikimedia.org/r/999882

Change 1003368 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/deployment-charts@master] eventstreams: Bump mesh.configuration to 1.7

https://gerrit.wikimedia.org/r/1003368

Change 1003369 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/deployment-charts@master] cxserver: Bump mesh.configuration to 1.7

https://gerrit.wikimedia.org/r/1003369

Change 1003376 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/deployment-charts@master] rec-api: Bump mesh.configuration to 1.7

https://gerrit.wikimedia.org/r/1003376

Change 1003377 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/deployment-charts@master] wikifunctions: Add mesh.configuration in package.json

https://gerrit.wikimedia.org/r/1003377

Change 1003368 merged by jenkins-bot:

[operations/deployment-charts@master] eventstreams: Bump mesh.configuration to 1.7

https://gerrit.wikimedia.org/r/1003368

Change 1003376 merged by jenkins-bot:

[operations/deployment-charts@master] rec-api: Bump mesh.configuration to 1.7

https://gerrit.wikimedia.org/r/1003376

Change 1003377 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Add mesh.configuration in package.json

https://gerrit.wikimedia.org/r/1003377

Change 1003369 merged by jenkins-bot:

[operations/deployment-charts@master] cxserver: Bump mesh.configuration to 1.7

https://gerrit.wikimedia.org/r/1003369

Mentioned in SAL (#wikimedia-operations) [2024-02-20T12:04:12Z] <kart_> cxserver: Update to 2024-02-15-085232-production + Bump mesh.configuration to 1.7 (T333969, T352747, T355686, T255568)

Stashbot mentioned this in T333969: Enable Opus models for languages lacking other Machine Translation options. (Feb 20 2024, 12:04 PM)

Stashbot mentioned this in T352747: Google is not listed as an option for Norwegian.

Vgutierrez unsubscribed. (Feb 20 2024, 12:56 PM)

Lucas_Werkmeister_WMDE unsubscribed. (Feb 20 2024, 2:02 PM)
