
Upgrade Envoy to v1.26.8 and drop buster
Closed, Resolved, Public

Description

For T380211 we're upgrading from 1.23 (July 2022) all the way to something from this year. Because we've gotten so far behind, I'll be making the upgrade in a few steps, starting with a move to 1.26.

We were also unable to build 1.24 or later against buster's version of libc. Now that no buster hosts are left running Envoy, this upgrade moves the build (and the minimum supported distribution) to bullseye.

This update will also fix the following security issues:

Crash in proxy protocol when command type of LOCAL (CVE-2024-23327)
https://github.com/envoyproxy/envoy/security/advisories/GHSA-4h5x-x9vh-m29j
https://github.com/envoyproxy/envoy/commit/63895ea8e3cca9c5d3ab4c5c128ed1369969d54a

Envoy crashes when using an address type that isn’t supported by the OS (CVE-2024-23325)
https://github.com/envoyproxy/envoy/security/advisories/GHSA-5m7c-mrwr-pm26
https://github.com/envoyproxy/envoy/commit/bacd3107455b8d387889467725eb72aa0d5b5237

Ext_authz can be bypassed when Proxy protocol filter sets invalid UTF-8 metadata (CVE-2024-23324)
https://github.com/envoyproxy/envoy/security/advisories/GHSA-gq3v-vvhj-96j6
https://github.com/envoyproxy/envoy/commit/29989f6cc8bfd8cd2ffcb7c42711eb02c7a5168a

Excessive CPU usage when URI template matcher is configured using regex (CVE-2024-23323)
https://github.com/envoyproxy/envoy/security/advisories/GHSA-x278-4w4x-r7ch
https://github.com/envoyproxy/envoy/commit/71eeee8f0f0132f39e402b0ee23b361ee2f4e645

Envoy crashes when idle and request per try timeout occur within the backoff interval (CVE-2024-23322)
https://github.com/envoyproxy/envoy/security/advisories/GHSA-6p83-mfmh-qv38
https://github.com/envoyproxy/envoy/commit/843f9e6a123ed47ce139b421c14e7126f2ac685e

Abnormal termination when using auto_sni with :authority header longer than 255 characters (CVE-2024-32475)
https://github.com/envoyproxy/envoy/security/advisories/GHSA-3mh5-6q8v-25wj
https://github.com/envoyproxy/envoy/commit/b47fc6648d7c2dfe0093a601d44cb704b7bad382

Details

Related Changes in Gerrit:
Repo                                        Branch   Lines +/-
operations/deployment-charts                master   +0 -2
operations/deployment-charts                master   +0 -9
operations/deployment-charts                master   +4 -2
operations/deployment-charts                master   +2 -4
operations/deployment-charts                master   +7 -14
operations/deployment-charts                master   +2 -0
operations/docker-images/production-images  master   +6 -0
operations/dns                              master   +1 -1
operations/deployment-charts                master   +1 -1
operations/docker-images/production-images  master   +12 -2
operations/deployment-charts                master   +2 -0
integration/config                          master   +1 -1
integration/config                          master   +8 -1
operations/docker-images/production-images  master   +7 -0
operations/docker-images/production-images  master   +7 -1
operations/debs/envoyproxy                  v1.26    +4 -1
operations/debs/envoyproxy                  v1.26    +19 -10

Event Timeline


Change #1182209 merged by RLazarus:

[operations/docker-images/production-images@master] envoy-future: Sync dockerfile changes from envoy image to envoy-future

https://gerrit.wikimedia.org/r/1182209

Change #1182210 merged by jenkins-bot:

[operations/deployment-charts@master] mathoid: Upgrade to envoy-future:1.26.8-3 for validation

https://gerrit.wikimedia.org/r/1182210

Validated on mathoid and mw-debug (mathoid still on envoy-future, mw-debug back on 1.23 for now).

One config warning in the logs from mw-debug:

[2025-08-26 22:40:20.622][1][warning][misc] [source/common/protobuf/message_validator_impl.cc:21] Deprecated field: type envoy.config.core.v3.HeaderValueOption Using deprecated option 'envoy.config.core.v3.HeaderValueOption.append' from file base.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/version_history/version_history for details. If continued use of this field is absolutely necessary, see https://www.envoyproxy.io/docs/envoy/latest/configuration/operations/runtime#using-runtime-overrides-for-deprecated-features for how to apply a temporary and highly discouraged override.

That doesn't block us from deploying 1.26, but I'll update the config templates before upgrading further. No complaints in the mathoid envoy logs.
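The warning above concerns `HeaderValueOption.append`, which Envoy's API reference marks as deprecated in favor of `append_action`. `append: true` (the old default) corresponds to `APPEND_IF_EXISTS_OR_ADD`, and `append: false` corresponds to `OVERWRITE_IF_EXISTS_OR_ADD`. A minimal sketch of that rewrite follows, assuming the config is available as parsed YAML/JSON (nested dicts and lists) rather than as the Helm templates themselves. The function name `migrate_append` is hypothetical, and treating any dict that has both `header` and `append` keys as a `HeaderValueOption` is a heuristic assumption that could misfire on unrelated structures.

```python
"""Hypothetical sketch: replace deprecated HeaderValueOption.append in parsed Envoy config."""
from typing import Any


def migrate_append(node: Any) -> Any:
    """Return a copy of `node` with `append: <bool>` rewritten as `append_action`.

    Assumption: a HeaderValueOption is any dict with both a `header` key and an
    `append` key, as in Envoy's YAML/JSON config form.
    """
    if isinstance(node, list):
        return [migrate_append(item) for item in node]
    if not isinstance(node, dict):
        return node
    # Recurse first so nested header lists are migrated too.
    out = {key: migrate_append(value) for key, value in node.items()}
    if "header" in out and "append" in out:
        append = out.pop("append")
        # true (the old default) appends; false replaces existing values.
        out["append_action"] = (
            "APPEND_IF_EXISTS_OR_ADD" if append else "OVERWRITE_IF_EXISTS_OR_ADD"
        )
    return out


config = {
    "request_headers_to_add": [
        {"header": {"key": "x-foo", "value": "bar"}, "append": False}
    ]
}
print(migrate_append(config))
# {'request_headers_to_add': [{'header': {'key': 'x-foo', 'value': 'bar'}, 'append_action': 'OVERWRITE_IF_EXISTS_OR_ADD'}]}
```

Building a new structure instead of mutating in place leaves the input untouched, so the migrated output can be diffed against the original rendering before any template is edited. Entries with no `append` key are left alone, since `append_action` already defaults to `APPEND_IF_EXISTS_OR_ADD`.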

Mentioned in SAL (#wikimedia-operations) [2025-08-26T23:01:15Z] <rzl> reprepro -C main includedeb bullseye-wikimedia /srv/wikimedia/pool/component/envoy-future/e/envoyproxy/envoyproxy_1.26.8-1_amd64.deb # T402584

More deprecation warnings from the API Gateway (started locally after modifying charts/api-gateway/values-devel.yaml to use envoy-future):

[source/common/protobuf/message_validator_impl.cc:21] Deprecated field: type envoy.config.bootstrap.v3.Admin Using deprecated option 'envoy.config.bootstrap.v3.Admin.access_log_path' from file bootstrap.proto.  [...]
[source/common/protobuf/message_validator_impl.cc:21] Deprecated field: type envoy.config.cluster.v3.Cluster Using deprecated option 'envoy.config.cluster.v3.Cluster.common_http_protocol_options' from file cluster.proto.  [...]
[source/common/protobuf/message_validator_impl.cc:21] Deprecated field: type envoy.config.cluster.v3.Cluster Using deprecated option 'envoy.config.cluster.v3.Cluster.http2_protocol_options' from file cluster.proto.  [...]
[source/common/protobuf/message_validator_impl.cc:21] Deprecated field: type envoy.config.cluster.v3.Cluster Using deprecated option 'envoy.config.cluster.v3.Cluster.max_requests_per_connection' from file cluster.proto.  [...]
[source/common/protobuf/message_validator_impl.cc:21] Deprecated field: type envoy.config.core.v3.HeaderValueOption Using deprecated option 'envoy.config.core.v3.HeaderValueOption.append' from file base.proto.  [...]
[source/common/protobuf/message_validator_impl.cc:21] Deprecated field: type envoy.config.route.v3.HeaderMatcher Using deprecated option 'envoy.config.route.v3.HeaderMatcher.exact_match' from file route_components.proto.  [...]
[source/common/protobuf/message_validator_impl.cc:21] Deprecated field: type envoy.config.route.v3.HeaderMatcher Using deprecated option 'envoy.config.route.v3.HeaderMatcher.safe_regex_match' from file route_components.proto.  [...]
[source/common/protobuf/message_validator_impl.cc:21] Deprecated field: type envoy.config.route.v3.RouteAction Using deprecated option 'envoy.config.route.v3.RouteAction.cors' from file route_components.proto.  [...]
[source/common/protobuf/message_validator_impl.cc:21] Deprecated field: type envoy.type.matcher.v3.RegexMatcher Using deprecated option 'envoy.type.matcher.v3.RegexMatcher.google_re2' from file regex.proto.  [...]

But nothing blocking here either.
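Before updating the config templates for later upgrades, it can help to turn warnings like these into a checklist. The sketch below assumes only the message shape shown above (`Using deprecated option '<name>' from file ...`); the function name `deprecated_options` is hypothetical, and it counts how often each deprecated option name appears in an iterable of log lines.

```python
"""Hypothetical sketch: inventory deprecated Envoy options from log lines."""
import re
from collections import Counter
from typing import Iterable

# Matches the quoted option name in warnings such as:
#   Using deprecated option 'envoy.config.core.v3.HeaderValueOption.append' from file base.proto.
DEPRECATED_RE = re.compile(r"Using deprecated option '([^']+)'")


def deprecated_options(lines: Iterable[str]) -> Counter:
    """Count each deprecated option name found in `lines`; other lines are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = DEPRECATED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, `deprecated_options(open("envoy.log"))` on a hypothetical saved log file returns a `Counter` keyed by fully qualified option name (such as `envoy.config.cluster.v3.Cluster.http2_protocol_options`), which can be sorted and worked through one field at a time.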

Mentioned in SAL (#wikimedia-operations) [2025-08-27T10:49:32Z] <slyngs> idm2001.wikimedia.org - Update EnvoyProxy to version 1.26.8 - https://phabricator.wikimedia.org/T402584

Mentioned in SAL (#wikimedia-operations) [2025-08-27T10:54:03Z] <slyngs> idm1001.wikimedia.org - Update EnvoyProxy to version 1.26.8 - https://phabricator.wikimedia.org/T402584

Change #1182521 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/dns@master] Failover idp.w.o to idp2004

https://gerrit.wikimedia.org/r/1182521

Change #1182521 merged by Muehlenhoff:

[operations/dns@master] Failover idp.w.o to idp2004

https://gerrit.wikimedia.org/r/1182521

Mentioned in SAL (#wikimedia-operations) [2025-08-27T12:58:49Z] <moritzm> upgrading envoy on testreduce T402584

Mentioned in SAL (#wikimedia-operations) [2025-08-27T16:58:46Z] <mutante> upgrading envoyproxy on people* hosts T402584

Mentioned in SAL (#wikimedia-operations) [2025-08-27T17:05:12Z] <mutante> upgrading envoyproxy on aphlict* and zuul* hosts T402584

Mentioned in SAL (#wikimedia-operations) [2025-08-27T17:21:02Z] <mutante> upgrading envoyproxy on releases* and planet* hosts T402584

Mentioned in SAL (#wikimedia-operations) [2025-08-27T17:46:15Z] <mutante> upgrading envoyproxy on doc* and etherpad* hosts T402584

Mentioned in SAL (#wikimedia-operations) [2025-08-27T18:05:23Z] <mutante> upgrading envoyproxy on phab2002, lists2001, contint2002 T402584

Mentioned in SAL (#wikimedia-operations) [2025-08-27T18:21:02Z] <arnoldokoth> Upgrade envoyproxy on vrts2002 T402584

Change #1182659 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/docker-images/production-images@master] envoy: Update to v1.26.8

https://gerrit.wikimedia.org/r/1182659

Change #1182659 merged by RLazarus:

[operations/docker-images/production-images@master] envoy: Update to v1.26.8

https://gerrit.wikimedia.org/r/1182659

Change #1182680 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/deployment-charts@master] api-gateway: Upgrade to Envoy 1.26.8 in staging

https://gerrit.wikimedia.org/r/1182680

Change #1182681 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/deployment-charts@master] api-gateway: Upgrade to Envoy 1.26.8 in production

https://gerrit.wikimedia.org/r/1182681

Mentioned in SAL (#wikimedia-operations) [2025-08-28T12:04:16Z] <moritzm> upgrading debmonitor to Envoy 1.26.8 T402584

Mentioned in SAL (#wikimedia-operations) [2025-08-28T13:45:25Z] <moritzm> upgrading puppetboard to Envoy 1.26.8 T402584

Change #1182680 merged by jenkins-bot:

[operations/deployment-charts@master] {api,rest}-gateway: Upgrade to Envoy 1.26.8 in staging

https://gerrit.wikimedia.org/r/1182680

Change #1182945 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/deployment-charts@master] mw-*: Upgrade to Envoy 1.26.8

https://gerrit.wikimedia.org/r/1182945

Change #1182945 merged by jenkins-bot:

[operations/deployment-charts@master] mw-*: Upgrade to Envoy 1.26.8

https://gerrit.wikimedia.org/r/1182945

Change #1182681 merged by jenkins-bot:

[operations/deployment-charts@master] {api,rest}-gateway: Upgrade to Envoy 1.26.8 in production

https://gerrit.wikimedia.org/r/1182681

Mentioned in SAL (#wikimedia-operations) [2025-09-03T13:35:50Z] <urandom> upgrading envoyproxy to 1.26.8-1, restbase/eqiad (cassandra) rack 'a' — T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-03T13:45:16Z] <urandom> upgrading envoyproxy to 1.26.8-1, restbase/eqiad (cassandra) rack 'b' — T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-03T13:49:36Z] <urandom> upgrading envoyproxy to 1.26.8-1, restbase/eqiad (cassandra) rack 'd' — T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-03T15:00:54Z] <urandom> upgrading envoyproxy to 1.26.8-1, restbase/codfw — T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-04T12:14:01Z] <arnoldokoth> Upgrade envoyproxy on vrts1003 T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-04T14:25:54Z] <moritzm> upgrade Envoyproxy on webperf* T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-04T14:51:34Z] <moritzm> upgrade Envoyproxy on Puppet servers T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-04T15:22:21Z] <moritzm> upgrade Envoyproxy on cloudweb servers T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-04T16:35:04Z] <btullis> upgrading and restarting envoyproxy on cephosd1001 for T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-04T16:39:50Z] <btullis> upgrading and restarting envoyproxy on cephosd100[2-5] for T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-04T16:44:49Z] <btullis> upgrading and restarting envoyproxy on cephosd200[1-3] for T402584

Change #1184918 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/deployment-charts@master] mw-videoscaler: Upgrade to envoy 1.26.8

https://gerrit.wikimedia.org/r/1184918

Mentioned in SAL (#wikimedia-operations) [2025-09-08T07:46:13Z] <moritzm> upgrading Envoy on an-web, an-tool1007 (turnilo), an-tool1008 (yarn) T402584

Change #1184918 merged by jenkins-bot:

[operations/deployment-charts@master] mw-videoscaler: Upgrade to envoy 1.26.8

https://gerrit.wikimedia.org/r/1184918

Mentioned in SAL (#wikimedia-operations) [2025-09-08T16:40:27Z] <denisse> Upgrade envoyproxy on grafana2001 - T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-08T16:41:21Z] <denisse> Upgrade envoyproxy on grafana1002 - T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-08T16:43:24Z] <denisse> Upgrade envoyproxy on prometheus1005 - T402584

Change #1185995 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/deployment-charts@master] cleanup: Remove Envoy 1.26.8 overrides now that it's the default

https://gerrit.wikimedia.org/r/1185995

Mentioned in SAL (#wikimedia-operations) [2025-09-08T17:17:22Z] <denisse> Upgrade envoyproxy on prometheus[1006-1008] and [2005-2008] - T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-08T17:21:48Z] <denisse> Upgrade envoyproxy on prometheus::pop hosts - T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-08T17:23:30Z] <denisse> Upgrade envoyproxy on titan1001 - T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-08T17:26:25Z] <denisse> Upgrade envoyproxy on titan hosts - T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-08T17:28:42Z] <denisse> Upgrade envoyproxy on graphite hosts - T402584

Change #1185995 merged by jenkins-bot:

[operations/deployment-charts@master] cleanup: Remove Envoy 1.26.8 overrides now that it's the default

https://gerrit.wikimedia.org/r/1185995

This comment was removed by elukey.

I noticed the following occurrences of Buster images:

dse-k8s-eqiad-152-namespace: datasets-config
dse-k8s-eqiad-157-namespace: datasets-config-next

eqiad: device-analytics
eqiad: edit-analytics
eqiad: editor-analytics
eqiad: geo-analytics
eqiad: image-suggestion
eqiad: media-analytics
eqiad: page-analytics

@RLazarus Hi! Is there a plan for those, or is it the usual "no owners" zone? I can help if it's the latter :)

@BTullis Hi! Could you take care of the DSE ones? It should just be a matter of doing a deploy :)

Mentioned in SAL (#wikimedia-operations) [2025-09-09T13:53:14Z] <moritzm> upgrading Envoy on config-master* T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-09T14:03:07Z] <moritzm> upgrading Envoy on schema* T402584

@elukey Thank you! Looks like an ownership issue, and yes please if you're comfortable deploying those, I'll take you up on it. (We were just talking in serviceops about the general problem of keeping the state of the world up to date with the state of the repo. In the general case it's hard and we'll need to figure it out; in the specific case your help would make a big difference!)

Change #1186676 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/deployment-charts@master] {api,rest}-gateway: Upgrade to Envoy 1.29.12 in staging

https://gerrit.wikimedia.org/r/1186676

Ack! Upgraded staging, and pinged the DSE SREs on Slack as well to gather their opinion about ownership, etc.

Mentioned in SAL (#wikimedia-operations) [2025-09-10T07:49:10Z] <moritzm> upgrading Envoy on chartmuseum* T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-10T07:50:18Z] <brouberol> upgraded envoy on dse-k8s-eqiad/dataset-config(-next) - T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-10T09:38:26Z] <moritzm> upgrading Envoy on Logstash T402584

Mentioned in SAL (#wikimedia-operations) [2025-09-10T09:47:42Z] <moritzm> upgrading Envoy on contint T402584

All bare-metal installations of Envoy have been upgraded.

Mentioned in SAL (#wikimedia-operations) [2025-09-10T10:06:42Z] <moritzm> upgrading Envoy on Phabricator T402584


I ended up deploying the namespaces and Santiago Faci tested them, all good!

Change #1190376 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/deployment-charts@master] {api,rest}-gateway: Upgrade to Envoy 1.29.12 in production

https://gerrit.wikimedia.org/r/1190376

Change #1191203 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/deployment-charts@master] wikifeeds: Remove envoy image_version override

https://gerrit.wikimedia.org/r/1191203

I deployed most services in wikikube (in part to test https://gerrit.wikimedia.org/r/1188456). Remaining services with an Envoy upgrade to go:

commons-impact-analytics: staging
eventstreams-internal: codfw
rdf-streaming-updater: eqiad
wikifeeds: staging, eqiad, codfw

The first three had some other outstanding diff that wasn't obviously impact-free (in each case, only in one cluster). I've pinged Data Platform SRE for all three of those. For wikifeeds see https://gerrit.wikimedia.org/r/1191203.

After that we can consider this complete.

Change #1191203 merged by jenkins-bot:

[operations/deployment-charts@master] wikifeeds: Remove envoy image_version override

https://gerrit.wikimedia.org/r/1191203

1.23 is gone. 🎉