The recent on-demand-api performance tests showed high CPU time spent in net/http TLS handshakes. While the tests were running, it was also observed that a single API instance was establishing more than 50k connections to S3, which clearly illustrates the issue.
This is because the existing s3.go library uses the default AWS SDK v1 config, and the SDK in turn uses a default HTTP client with default configuration values:
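```go
func New(env *env.Environment) s3iface.S3API {
	cfg := &aws.Config{
		Region: aws.String(env.AWSRegion),
		// HTTPClient is not set, so the SDK falls back to its default HTTP client.
	}
	// ... rest of New unchanged
}
```
The default transport (http.DefaultTransport):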
- Supports keep-alive, but
- Has no TLS session resumption (ClientSessionCache is nil), and
- Uses very conservative idle connection settings.
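To see the effect of the missing session cache in isolation, here is a small sketch that runs against a local httptest TLS server rather than S3. It forces a new connection per request with DisableKeepAlives and checks ConnectionState.DidResume on the second handshake. The helper name didSecondRequestResume is hypothetical, and the sketch assumes the server issues session tickets, as Go's TLS server does by default.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// didSecondRequestResume (hypothetical helper) sends two HTTPS requests to a
// local test server over separate connections and reports whether the second
// TLS handshake resumed the session established by the first one.
func didSecondRequestResume(withCache bool) bool {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	// Start from the test server's client transport, which trusts the test certificate.
	tr := srv.Client().Transport.(*http.Transport).Clone()
	tr.DisableKeepAlives = true // force a new connection, and a new handshake, per request
	if withCache {
		tr.TLSClientConfig.ClientSessionCache = tls.NewLRUClientSessionCache(128)
	}
	client := &http.Client{Transport: tr}

	resumed := false
	for i := 0; i < 2; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			panic(err)
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		resumed = resp.TLS.DidResume // only the last (second) request's value is kept
	}
	return resumed
}

func main() {
	fmt.Println("without ClientSessionCache, resumed:", didSecondRequestResume(false))
	fmt.Println("with ClientSessionCache, resumed:   ", didSecondRequestResume(true))
}
```

Without a cache every new connection pays for a full handshake; with one, the client can present the ticket it received on the earlier connection.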
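Detailed settings of the default HTTP client's transport:
- Idle connections:
  - MaxIdleConns: 100
  - MaxIdleConnsPerHost: 2 (the field is unset, so http.DefaultMaxIdleConnsPerHost applies). At most 2 idle keep-alive connections are kept per S3 endpoint. When more than 2 requests to the same endpoint run concurrently, the extra connections are closed once their request finishes instead of being returned to the pool, so later requests keep opening new connections and doing full TLS handshakes.
  - MaxConnsPerHost: 0 (no limit on the total number of connections per endpoint)
- Idle timeout:
  - IdleConnTimeout: 90 seconds. Idle connections are closed after 90s.
- TLS:
  - TLSHandshakeTimeout: 10s
  - TLSClientConfig: nil (so no ClientSessionCache)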
In the on-demand API, NewGetLargeEntities fans out multiple parallel GetObjectWithContext calls. Some requests reuse connections, but many still go through full TLS handshakes.
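The churn can be reproduced locally with a rough model of the fan-out. This sketch is an assumption, not the on-demand API code: a local httptest TLS server stands in for S3, waves of parallel GETs stand in for the parallel GetObjectWithContext calls, and new server-side connections (each one a full TLS handshake) are counted via the ConnState hook. The function countNewConns and its parameters are hypothetical, and exact counts vary between runs.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

// countNewConns (hypothetical helper) fires `waves` rounds of `parallel`
// concurrent HTTPS GETs at a local TLS server and returns how many new
// connections the server accepted.
func countNewConns(maxIdlePerHost, parallel, waves int) int64 {
	var newConns int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(5 * time.Millisecond) // keep requests in flight at the same time
		fmt.Fprint(w, "ok")
	}))
	srv.Config.ConnState = func(_ net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&newConns, 1)
		}
	}
	srv.StartTLS()
	defer srv.Close()

	tr := srv.Client().Transport.(*http.Transport).Clone()
	tr.MaxIdleConns = 100
	tr.MaxIdleConnsPerHost = maxIdlePerHost
	defer tr.CloseIdleConnections() // runs before srv.Close
	client := &http.Client{Transport: tr}

	for w := 0; w < waves; w++ {
		var wg sync.WaitGroup
		for i := 0; i < parallel; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				resp, err := client.Get(srv.URL)
				if err != nil {
					return
				}
				io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
				resp.Body.Close()
			}()
		}
		wg.Wait()
	}
	return atomic.LoadInt64(&newConns)
}

func main() {
	fmt.Println("MaxIdleConnsPerHost=2:  ", countNewConns(2, 20, 10), "new connections")
	fmt.Println("MaxIdleConnsPerHost=100:", countNewConns(100, 20, 10), "new connections")
}
```

With a per-host idle limit of 2, most connections from one wave are closed and redialed in the next; with 100, the connections from the first wave are largely reused.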
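Hence the recommendation is to use a custom HTTP client with the settings below. tls.NewLRUClientSessionCache enables TLS session resumption (when the server accepts the resumed session), and raising MaxIdleConnsPerHost lets the parallel calls reuse more keep-alive connections.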
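```go
import (
	"crypto/tls"
	"net/http"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3/s3iface"
	// plus the project's env package
)

func New(env *env.Environment) s3iface.S3API {
	cfg := &aws.Config{
		Region: aws.String(env.AWSRegion),
		HTTPClient: &http.Client{
			Transport: &http.Transport{
				TLSClientConfig: &tls.Config{
					MinVersion: tls.VersionTLS12,
					// Caches session tickets so new connections can resume
					// instead of doing a full handshake.
					ClientSessionCache: tls.NewLRUClientSessionCache(128),
				},
				MaxIdleConns:        100, // idle connections across all hosts
				MaxIdleConnsPerHost: 100, // idle connections kept per S3 endpoint (default 2)
				IdleConnTimeout:     90 * time.Second,
				// Keep the default handshake timeout; the zero value means no timeout.
				TLSHandshakeTimeout: 10 * time.Second,
			},
		},
	}
	// ... rest of New unchanged
}
```
This can be validated by rerunning the current perf tests and checking the continuous profiler graph for a drop in TLS handshake activity, along with the number of connections each instance opens to S3.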
