We've been spotting some issues on the upload cluster that aren't happening on text. One of the big differences between text and upload is that almost all origin servers in text use envoy for TLS termination, while swift uses nginx. That nginx puppetization is the one the traffic team used to rely on to perform TLS termination for untrusted clients.
One of these issues is a FetchError logged by varnish-frontend stating "Timed out reusing backend connection". According to logstash, all occurrences of this error during the last month are limited to the upload cluster.
Progress on migration to envoy:
- ms-fe1009
- ms-fe1010
- ms-fe1011
- ms-fe1012
- ms-fe1013
- ms-fe1014
- moss-fe1001
- ms-fe2009
- ms-fe2010
- ms-fe2011
- ms-fe2012
- ms-fe2013
- ms-fe2014
- moss-fe2001
What is still outstanding is changing the value of profile::swift::proxy::use_envoy for the ms-* clusters (or maybe globally, but that's likely to upset beta).
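
As a rough illustration of that outstanding change, the hiera override might look something like the fragment below. The value `true` is an assumption (the key is presumably a boolean), and which hieradata file it belongs in for the ms-* clusters is not stated here, so treat this as a sketch rather than the actual patch.

```yaml
# Hypothetical hiera override; the value and where it is placed are assumptions.
profile::swift::proxy::use_envoy: true
```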