
envoyproxy: CVE-2020-8664 CVE-2020-8661 CVE-2020-8660 CVE-2020-8659
Closed, Resolved, Public

Description

The new Envoy 1.13.1 fixes four security issues:
https://www.envoyproxy.io/docs/envoy/v1.13.1/intro/version_history

It's not yet clear whether they affect the 1.12 release we're running, but it seems likely:

CVE-2020-8664:
A vulnerability was found in Envoy when using SDS with a combined validation context. Using the same secret (e.g. trusted CA) across many resources together with the combined validation context could lead to the “static” part of the validation context not being applied, even though it was visible in the active config dump.

CVE-2020-8661:
A vulnerability was found in Envoy: version 1.13.0 or earlier may consume excessive amounts of memory when responding internally to pipelined requests.

CVE-2020-8660:
A vulnerability was found in Envoy, where the TLS inspector could have been bypassed (the client not recognized as a TLS client) by a client using only TLS 1.3. Because TLS extensions (SNI, ALPN) were not inspected, those connections might have been matched to the wrong filter chain, possibly bypassing some security restrictions in the process.

CVE-2020-8659:
A vulnerability was found in Envoy: version 1.13.0 or earlier may consume excessive amounts of memory when proxying HTTP/1.1 requests or responses with many small (i.e. 1 byte) chunks.
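
For anyone unfamiliar with the message shape CVE-2020-8659 refers to, here is a minimal sketch of HTTP/1.1 chunked transfer encoding. It only shows how a body split into 1-byte chunks is framed on the wire; it says nothing about how Envoy buffers such chunks internally, which is where the memory cost in the advisory arises. The function names `encode_chunked` and `chunk_count` are made up for this illustration.

```python
def encode_chunked(payload: bytes, chunk_size: int) -> bytes:
    """Frame payload using HTTP/1.1 chunked transfer encoding."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    parts = []
    for i in range(0, len(payload), chunk_size):
        chunk = payload[i:i + chunk_size]
        # Each chunk is: hex length, CRLF, data, CRLF.
        parts.append(b"%x\r\n" % len(chunk) + chunk + b"\r\n")
    # A zero-length chunk followed by an empty trailer section ends the body.
    parts.append(b"0\r\n\r\n")
    return b"".join(parts)


def chunk_count(payload: bytes, chunk_size: int) -> int:
    """Number of data chunks, excluding the terminating zero-length chunk."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return -(-len(payload) // chunk_size)  # ceiling division


if __name__ == "__main__":
    body = b"x" * 1024
    for size in (1, 1024):
        framed = encode_chunked(body, size)
        print(f"chunk_size={size}: {chunk_count(body, size)} chunks, "
              f"{len(framed)} bytes on the wire")
```

With 1-byte chunks, the receiver has to track as many chunks as there are payload bytes, so any fixed per-chunk cost is multiplied by the body length.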

Event Timeline

Yes, they do. They also released 1.12.3.

I think we can move to 1.13 and slowly roll out the change.

Assigning to our envoy-build expert in residence :P

I'm pretty sure our envoy-build expert in residence just assigned this to me, but I'm happy to give this a shot anyway.

Belated update: we decided to upgrade to 1.13.1 (not 1.12.3). So far it's deployed to all MW hosts in codfw, plus the MW canaries in eqiad. Monitoring for impact, then we'll proceed to the rest of MW, then Kubernetes hosts.

Change 580906 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/docker-images/production-images@master] Update envoy, add ability to define an idle timeout

https://gerrit.wikimedia.org/r/580906

Deployed 1.13.1 to all hosts where we're using envoy as a TLS proxy, that is, C:profile::tlsproxy::envoy. Exception: mendelevium.eqiad.wmnet is still running Jessie for now, and Envoy 1.11.1. (fyi @Dzahn)
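
For tracking stragglers like this, a rough sketch of a version check follows. It assumes, based only on this thread, that 1.12.3 and 1.13.1 are the fixed releases and that anything older on either line is unpatched. `parse_version` and `is_patched` are hypothetical helpers, not part of any existing tooling, and the host-to-version mapping is purely illustrative.

```python
def parse_version(text: str) -> tuple:
    """Turn a version string such as '1.13.1' or 'v1.13.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_patched(version: str) -> bool:
    """Return True if the Envoy version includes the fixes discussed here.

    Assumption (from this task, not from an authoritative advisory):
    the fixes shipped in 1.12.3 on the 1.12 line and in 1.13.1 otherwise.
    """
    v = parse_version(version)
    if v[:2] == (1, 12):
        return v >= (1, 12, 3)
    return v >= (1, 13, 1)


if __name__ == "__main__":
    # Illustrative inventory only; versions taken from the comment above.
    hosts = {
        "mendelevium.eqiad.wmnet": "1.11.1",
        "example-upgraded-host": "1.13.1",  # hypothetical host name
    }
    for host, version in sorted(hosts.items()):
        status = "patched" if is_patched(version) else "NOT patched"
        print(f"{host}: envoy {version} is {status}")
```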

ACK, that's T224590 and I just made T248028 to keep that moving.

All remaining MW hosts are updated. That leaves parsoid, snapshot hosts, and a few other odds and ends.

Change 581747 had a related patch set uploaded (by RLazarus; owner: RLazarus):
[operations/docker-images/production-images@master] Bump versions for envoy and envoy-tls-local-proxy to 1.13.1.

https://gerrit.wikimedia.org/r/581747

Change 581747 merged by RLazarus:
[operations/docker-images/production-images@master] Bump versions for envoy and envoy-tls-local-proxy to 1.13.1.

https://gerrit.wikimedia.org/r/581747

All hosts updated except snapshot1006, held back until later this week per @ArielGlenn.

Change 582880 had a related patch set uploaded (by RLazarus; owner: RLazarus):
[operations/deployment-charts@master] Bump tls.image_version for envoy update to 1.13.1.

https://gerrit.wikimedia.org/r/582880

Change 582880 merged by jenkins-bot:
[operations/deployment-charts@master] Bump tls.image_version for envoy update to 1.13.1.

https://gerrit.wikimedia.org/r/582880

Mentioned in SAL (#wikimedia-operations) [2020-03-25T08:14:14Z] <_joe_> upgrading all eventgate-main to envoy 1.13.1 T246868

Mentioned in SAL (#wikimedia-operations) [2020-03-25T16:07:18Z] <rlazarus> updating blubberoid to envoy 1.13.1 T246868

Mentioned in SAL (#wikimedia-operations) [2020-03-25T19:29:28Z] <rlazarus> updating eventstreams to envoy 1.13.1 T246868

Mentioned in SAL (#wikimedia-operations) [2020-03-25T20:19:48Z] <rlazarus> updating citoid to envoy 1.13.1 T246868

Mentioned in SAL (#wikimedia-operations) [2020-03-25T20:22:38Z] <rlazarus> updating cxserver to envoy 1.13.1 T246868

Mentioned in SAL (#wikimedia-operations) [2020-03-25T21:07:00Z] <rlazarus> updating eventgate-analytics to envoy 1.13.1 T246868

Mentioned in SAL (#wikimedia-operations) [2020-03-25T21:16:32Z] <rlazarus> holding off on updating eventgate-analytics until EU time, to check on unexpected helmfile diffs T246868

Mentioned in SAL (#wikimedia-operations) [2020-03-25T21:44:36Z] <rlazarus> updating eventgate-analytics-external to envoy 1.13.1 T246868

Mentioned in SAL (#wikimedia-operations) [2020-03-25T22:05:13Z] <rlazarus> updating eventgate-logging-external to envoy 1.13.1 T246868

All Kubernetes services are updated in all clusters. (T246868#6000068 turned out to be operator error; there were no unexpected diffs.)

Thanks for the pointer, but it looks like the timestamps don't line up -- that's been alerting since a couple hours before I touched it here. (Alert at 18:39, my deploy at 20:22.) I wish I'd spotted it before I touched production, but otherwise it shouldn't be related to this change.

It looks like @akosiaris filed T248578 and there's a cause identified.

@RLazarus this is done, right? Is there anything left to do?

Just snapshot1006 left, it's on my list for this morning.