
cache_upload: uncompressed images with Content-Encoding: gzip cause content decoding issues
Closed, ResolvedPublic

Description

It has been reported that at least one PNG image fails to load in esams with ERR_CONTENT_DECODING_FAILED: https://upload.wikimedia.org/wikipedia/commons/thumb/4/49/Relief_map_of_Serbia.png/272px-Relief_map_of_Serbia.png. All esams frontends are affected; the object is cached on cp3037's varnish-be.

The error occurs because the image body is not compressed, yet the response carries a Content-Encoding: gzip header.

curl --compressed fails with the following message:

curl: (61) Error while processing content unencoding: invalid stored block lengths
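
For reference, a check along these lines reproduces the symptom (the URL is from the report; the exact flags and the use of file(1) are illustrative):

  # Ask for gzip, dump the response headers to stderr, and let file(1) identify
  # the body. On an affected frontend the headers show "Content-Encoding: gzip"
  # while the body is identified as plain PNG image data.
  curl -s -H 'Accept-Encoding: gzip' -D /dev/stderr \
    'https://upload.wikimedia.org/wikipedia/commons/thumb/4/49/Relief_map_of_Serbia.png/272px-Relief_map_of_Serbia.png' \
    | file -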

Most likely, the problem is not related to esams-specific network conditions. esams is probably just the DC that happened to cache a bad copy.

It is not clear why we would try to gzip PNGs at all, even when the browser accepts gzip.

Using curl --compressed directly against varnish-be works fine.
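
A sketch of such a direct check; the backend address and port below are placeholders, not actual production values:

  # Hypothetical direct request to a backend cache node. With --compressed,
  # curl exits with code 61 if the body cannot be decoded as advertised by
  # Content-Encoding, and 0 if the response is handled cleanly.
  curl --compressed -sS -o /dev/null \
    -H 'Host: upload.wikimedia.org' \
    'http://BACKEND_HOST:BACKEND_PORT/wikipedia/commons/thumb/4/49/Relief_map_of_Serbia.png/272px-Relief_map_of_Serbia.png'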

Event Timeline

ema triaged this task as Medium priority. Oct 21 2016, 1:10 PM
ema moved this task from Backlog to Caching on the Traffic board.

After further investigation we noticed that the problem is not reproducible when forcing a cache miss by adding a random query parameter. We also tried purging the affected object from a varnish frontend and fetching it again; the issue could not be reproduced either.
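
Roughly what those two tests look like as commands (a sketch; the query parameter name is arbitrary, and whether an HTTP PURGE is accepted, and from which clients, depends on the VCL in place):

  # 1) Force a cache miss by appending a throwaway query parameter.
  curl --compressed -sS -o /dev/null \
    'https://upload.wikimedia.org/wikipedia/commons/thumb/4/49/Relief_map_of_Serbia.png/272px-Relief_map_of_Serbia.png?cachebust=1'

  # 2) Purge the object from a frontend, then fetch it again.
  curl -s -X PURGE \
    'https://upload.wikimedia.org/wikipedia/commons/thumb/4/49/Relief_map_of_Serbia.png/272px-Relief_map_of_Serbia.png'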

The two varnish backends in the path from esams to eqiad are cp1072 and cp3037. These are the scenarios we think might have triggered the problem:

  1. cp1072 temporarily emitted CE:gzip with this object and later stopped doing so (for the same cache object)
  2. cp3037 temporarily added CE:gzip on reception, and the object was later evicted/purged, cleaning things up
  3. cp3037 (whether or not it has evicted/purged the object since) temporarily added CE:gzip to its responses to several frontends
  4. several frontends all added CE:gzip to their cache objects on reception from cp3037, and then whatever triggered that went away

Scenario 4) seems unlikely, as it would imply multiple temporary issues across all frontends. Scenario 1) is also not the most likely one: we know for sure (because of the Age header) that cp1072 is still serving the same unevicted/unexpired cache object it was at the time the problem started, and its output is not bad now.
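
The Age check referred to above amounts to comparing the Age response header across requests; as a sketch (run against whichever cache layer is being examined):

  # Age reports how long the object has been in cache. If it keeps growing past
  # the point at which the problem started, the cache is still serving the same
  # unevicted/unexpired object.
  curl -s -o /dev/null -D - \
    'https://upload.wikimedia.org/wikipedia/commons/thumb/4/49/Relief_map_of_Serbia.png/272px-Relief_map_of_Serbia.png' \
    | grep -i '^Age:'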

ema renamed this task from ERR_CONTENT_DECODING_FAILED on certain png images from varnish-fe to cache_upload: uncompressed images with Content-Encoding: gzip cause content decoding issues. Oct 21 2016, 2:04 PM
ema updated the task description.

The specific repro URL for the Serbia map has been PURGEd now to clear up the issue for users, since we're not getting much debug value out of keeping it broken.

To be clearer about what was debugged on IRC: this was not a case of actual bad gzip encoding. The object contents in all affected caches were always the correct, uncompressed PNG data. The issue was simply that a Content-Encoding: gzip header was present on the objects (and thus on the outputs) in the affected frontends for unknown reasons, which caused clients to interpret the otherwise-correct PNG data as gzipped content and fail to decode what was never gzipped.
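
One way to confirm that the body was never actually gzipped, despite the header, is to look at the first bytes of the payload (a sketch):

  # A gzip stream starts with the magic bytes 1f 8b; a PNG starts with
  # 89 50 4e 47 0d 0a 1a 0a ("\x89PNG\r\n\x1a\n"). In this case the body began
  # with the PNG signature, i.e. it was the correct uncompressed image all along.
  curl -s -H 'Accept-Encoding: gzip' \
    'https://upload.wikimedia.org/wikipedia/commons/thumb/4/49/Relief_map_of_Serbia.png/272px-Relief_map_of_Serbia.png' \
    | head -c 8 | xxd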

BBlack claimed this task.

Closing for now as we haven't seen any further complaints about this. It may have been some temporary error condition we'll never reproduce. Will re-open if it happens again!

Reopening, another instance of this bug has been reported in T162035#3168304.

AFAIK we haven't had further reports since the resolution of T162035.