This compares the existing Envoy config on a main-DC Prometheus host (2005) with the one on a POP Prometheus host (3003).
The change in the certificate chain and key paths is obviously expected, but the rest is not.
One has "non-SNI support" and the other does not.
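One way to check the "non-SNI support" difference on each host: in Envoy, a filter chain whose `filter_chain_match` has no `server_names` (or that has no `filter_chain_match` at all) matches regardless of SNI, so it also serves clients that send no SNI. If every chain requires a server name and there is no `default_filter_chain`, a client without SNI matches nothing and Envoy closes the connection. The sketch below assumes the listener file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`, not shown). `classify_filter_chains` and the sample listeners are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def classify_filter_chains(listener: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Label each filter chain of a parsed Envoy listener by how it matches SNI.

    A chain whose filter_chain_match has no server_names matches regardless of
    SNI, so it also serves clients that send no SNI ("non-SNI" support).
    Other match criteria (ports, source IPs, ...) are deliberately ignored here.
    """
    labels = []
    for index, chain in enumerate(listener.get("filter_chains", [])):
        match = chain.get("filter_chain_match") or {}
        server_names = match.get("server_names") or []
        if server_names:
            labels.append((index, "SNI: " + ", ".join(server_names)))
        else:
            labels.append((index, "non-SNI"))
    return labels


# Hypothetical, trimmed-down listeners with the two shapes being compared.
sni_only = {
    "filter_chains": [
        {"filter_chain_match": {"server_names": ["prometheus", "prometheus-eqiad.wikimedia.org"]}},
    ]
}
with_non_sni = {
    "filter_chains": [
        {"transport_socket": {"name": "envoy.transport_sockets.tls"}},
    ]
}

print(classify_filter_chains(sni_only))      # [(0, 'SNI: prometheus, prometheus-eqiad.wikimedia.org')]
print(classify_filter_chains(with_non_sni))  # [(0, 'non-SNI')]
```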
I think that is somehow the root of the issue with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1023917
FWIW, this looks quite similar to the diff from the experimental patch in which the global cert name was changed to a discovery.wmnet domain:
https://puppet-compiler.wmflabs.org/output/1019066/2133/prometheus6002.drmrs.wmnet/index.html
Resources modified: File[/etc/envoy/listeners.d/00-tls_terminator_443.yaml]
Content differences:
```
--- /etc/envoy/listeners.d/00-tls_terminator_443.yaml.orig
+++ /etc/envoy/listeners.d/00-tls_terminator_443.yaml
@@ -8,28 +8,25 @@
     "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
 tcp_fast_open_queue_length: 150
 filter_chains:
-- filter_chain_match:
-    server_names: ["prometheus", "prometheus-eqiad.wikimedia.org", "prometheus-codfw.wikimedia.org", "prometheus-esams.wikimedia.org", "prometheus-ulsfo.wikimedia.org", "prometheus-eqsin.wikimedia.org", "prometheus-drmrs.wikimedia.org"]
-  transport_socket:
+# Non-SNI support
+- transport_socket:
     name: envoy.transport_sockets.tls
     typed_config:
       '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
       common_tls_context:
         tls_certificates:
-        - certificate_chain: { filename: "/etc/ssl/localcerts/prometheus.wikimedia.org.crt" }
-          private_key: { filename: "/etc/ssl/private/prometheus.wikimedia.org.key" }
+        - certificate_chain: { filename: "/etc/envoy/ssl/discovery__prometheus_discovery_wmnet_server.chained.pem" }
+          private_key: { filename: "/etc/envoy/ssl/discovery__prometheus_discovery_wmnet_server-key.pem" }
   filters:
   - name: envoy.http_connection_manager
     typed_config:
       "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
+      stat_prefix: ingress_http
       use_remote_address: true
       skip_xff_append: false
-      http_protocol_options:
-        accept_http_10: true
-      stat_prefix: ingress_http
       route_config:
         virtual_hosts:
-        - name: default
+        - name: non_sni_port_80
           domains: ["*"]
           routes:
           - match: { prefix: "/" }
@@ -40,4 +37,6 @@
       - name: envoy.filters.http.router
         typed_config:
           "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
+      http_protocol_options:
+        accept_http_10: true
       server_header_transformation: APPEND_IF_ABSENT
```
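Because the certificate and key paths are expected to differ, a diff like the one above is easier to read once those values are masked. The following sketch (Python, standard library only) replaces the `filename` values under `tls_certificates` with a placeholder and diffs the rest. It assumes both listener files are already parsed into dicts shaped like the listener in the diff; `mask_cert_paths`, `unexpected_diff` and `make_listener` are hypothetical helpers, not part of puppet-compiler.

```python
import copy
import difflib
import json
from typing import Any, Dict, List, Optional

MASK = {"filename": "<masked>"}


def mask_cert_paths(listener: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of the listener with TLS certificate/key filenames masked."""
    masked = copy.deepcopy(listener)
    for chain in masked.get("filter_chains", []):
        tls = chain.get("transport_socket", {}).get("typed_config", {})
        for cert in tls.get("common_tls_context", {}).get("tls_certificates", []):
            for key in ("certificate_chain", "private_key"):
                if key in cert:
                    cert[key] = dict(MASK)
    return masked


def unexpected_diff(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """Unified diff of two listeners, ignoring the (expected) certificate path changes."""
    def render(listener: Dict[str, Any]) -> List[str]:
        # sort_keys=True also hides differences that are only in key order.
        return json.dumps(mask_cert_paths(listener), indent=2, sort_keys=True).splitlines()

    return list(difflib.unified_diff(render(old), render(new), "old", "new", lineterm=""))


def make_listener(server_names: Optional[List[str]], chain_file: str, key_file: str) -> Dict[str, Any]:
    """Build a hypothetical, trimmed-down listener with a single filter chain."""
    chain: Dict[str, Any] = {
        "transport_socket": {
            "typed_config": {
                "common_tls_context": {
                    "tls_certificates": [
                        {"certificate_chain": {"filename": chain_file},
                         "private_key": {"filename": key_file}},
                    ]
                }
            }
        }
    }
    if server_names:
        chain["filter_chain_match"] = {"server_names": server_names}
    return {"filter_chains": [chain]}


old = make_listener(["prometheus"],
                    "/etc/ssl/localcerts/prometheus.wikimedia.org.crt",
                    "/etc/ssl/private/prometheus.wikimedia.org.key")
new = make_listener(None,
                    "/etc/envoy/ssl/discovery__prometheus_discovery_wmnet_server.chained.pem",
                    "/etc/envoy/ssl/discovery__prometheus_discovery_wmnet_server-key.pem")

# Only the dropped filter_chain_match should show up; the cert paths are masked.
for line in unexpected_diff(old, new):
    print(line)
```

With real hosts, the same comparison would be run on the two parsed `00-tls_terminator_443.yaml` files instead of the hand-built sample listeners.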
Your experimental patch (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1019066/1/hieradata/role/common/prometheus/pop.yaml, right?) also removes the "sni_support: strict" line.