Some PNG thumbnails and JPEG originals delivered as [text/html] content-type and hence not rendered in browser
Closed, ResolvedPublic

Description

Hi,

Please fix the missing 200px size of a picture; see https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/200px-Status_iucn3.1_LC_cs.svg.png Other sizes work correctly; see https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/201px-Status_iucn3.1_LC_cs.svg.png for example.

This image is used very frequently in articles. I therefore triaged this as Unbreak Now! and posted a notice in #wikimedia-tech.

I get ERR_CONTENT_DECODING_FAILED as the error code (Chromium). The last time I saw this kind of error, it was because the server was lying about the compression used. It seems to affect only one or a few clusters, as not all users get the error (@elukey can't reproduce it, but @ema can; according to information on IRC he is located close to me and is probably using the same cluster).

See https://cs.wikipedia.org/w/index.php?title=Wikipedie:Pod_l%C3%ADpou_(technika)&oldid=14873106#Dal.C5.A1.C3.AD_problematick.C3.BD_obr.C3.A1zek for what is shown now (just a placeholder for the image plus its alt text).

Tried browsers

  • Chromium 56.0.2924.76
  • Firefox 52.0.1

Other ways tried to access the URL

  • wget 1.18 (no problem)
  • curl 7.50.1 (no problem)
  • curl 7.50.1 with spoofed user-agent (no problem)

It seems to affect only users, not bots.

Martin Urbanec

Restricted Application added a project: User-Urbanecm. Mon, Apr 3, 11:54 AM
Restricted Application added a subscriber: Aklapper.
Urbanecm triaged this task as "Unbreak Now!" priority.Mon, Apr 3, 11:55 AM

Breaking change => UBN!

Restricted Application added subscribers: Jay8g, TerraCodes, Dereckson. Mon, Apr 3, 11:55 AM
Urbanecm edited the task description. Mon, Apr 3, 11:59 AM
Urbanecm edited the task description. Mon, Apr 3, 12:13 PM
Urbanecm added subscribers: elukey, ema.
Urbanecm edited the task description. Mon, Apr 3, 12:16 PM
Aklapper lowered the priority of this task from "Unbreak Now!" to "High".Mon, Apr 3, 12:22 PM

The file is not missing, it just has a wrong type and cannot be rendered ([text/html] vs [image/png]):

$:acko\> wget -v https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/200px-Status_iucn3.1_LC_cs.svg.png
--2017-04-03 14:20:28--  https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/200px-Status_iucn3.1_LC_cs.svg.png
Resolving upload.wikimedia.org (upload.wikimedia.org)... 91.198.174.208, 2620:0:862:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|91.198.174.208|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8100 (7.9K) [text/html]
Saving to: ‘200px-Status_iucn3.1_LC_cs.svg.png’

200px-Status_iucn3.1_LC_cs.svg.png                   100%[=====================================================================================================================>]   7.91K  --.-KB/s    in 0s      

2017-04-03 14:20:28 (15.9 MB/s) - ‘200px-Status_iucn3.1_LC_cs.svg.png’ saved [8100/8100]
$:acko\> wget -v https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/201px-Status_iucn3.1_LC_cs.svg.png
--2017-04-03 14:21:37--  https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/201px-Status_iucn3.1_LC_cs.svg.png
Resolving upload.wikimedia.org (upload.wikimedia.org)... 91.198.174.208, 2620:0:862:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|91.198.174.208|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8699 (8.5K) [image/png]
Saving to: ‘201px-Status_iucn3.1_LC_cs.svg.png’

201px-Status_iucn3.1_LC_cs.svg.png                   100%[=====================================================================================================================>]   8.50K  --.-KB/s    in 0s      

2017-04-03 14:21:38 (68.6 MB/s) - ‘201px-Status_iucn3.1_LC_cs.svg.png’ saved [8699/8699]
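
The comparison in the two wget runs above can be scripted. The following is a minimal sketch, not a tool used in this task: the helper names (`expected_type`, `is_mislabelled`, `check`), the extension table, the User-Agent string, and the use of a HEAD request are all assumptions made for illustration.

```python
import urllib.request

# Hypothetical table: the media type we expect for each file extension.
EXPECTED_TYPES = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}


def expected_type(url):
    """Return the media type implied by the URL's extension, or None."""
    lowered = url.lower()
    for extension, media_type in EXPECTED_TYPES.items():
        if lowered.endswith(extension):
            return media_type
    return None


def is_mislabelled(url, content_type_header):
    """True when the served Content-Type disagrees with the URL's extension."""
    want = expected_type(url)
    if want is None or content_type_header is None:
        return False
    # Drop parameters such as "; charset=UTF-8" before comparing.
    got = content_type_header.split(";", 1)[0].strip().lower()
    return got != want


def check(url):
    """Fetch only the headers (HEAD request) and test the Content-Type."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "ct-check-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return is_mislabelled(url, response.headers.get("Content-Type"))
```

Calling `check()` on the 200px URL at the time of the report would presumably have returned `True`, and `False` for the 201px URL; `is_mislabelled()` can be exercised without network access by passing the header values shown in the wget output.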
Aklapper changed the title from "Solve missing 200px size of File:Status_iucn3.1_LC_cs.svg" to "Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser".Mon, Apr 3, 12:23 PM

Mentioned in SAL (#wikimedia-operations) [2017-04-03T12:28:51Z] <ema> banning 200px-Status_iucn3.1_LC_cs.svg.png from esams frontends T162035

It seems like it.

ema added a comment.Mon, Apr 3, 1:19 PM

Note that the issue is pretty widespread, I'm seeing lots of requests affected by this.

It is the application itself returning the wrong Content-Type:

*   << Request  >> 389680515 
-   Begin          req 390692906 rxreq
-   Timestamp      Start: 1491225372.327099 0.000000 0.000000
-   Timestamp      Req: 1491225372.327099 0.000000 0.000000
-   ReqStart       10.64.48.105 52585
-   ReqMethod      GET
-   ReqURL         /wikipedia/commons/thumb/6/6a/1710_J._F._Reimmann%2C_Versuch_einer_Einleitung_in_die_Historiam_Literariam%2C_vol.3%2C2.png/220px-1710_J._F._Reimmann%2C_Versuch_einer_Einleitung_in_die_Historiam_Literariam%2C_vol.3%2C2.png
-   ReqProtocol    HTTP/1.1
-   ReqHeader      X-Client-IP: XXX.XXX.XXX.XXX
-   ReqHeader      X-Forwarded-Proto: https
-   ReqHeader      X-Connection-Properties: H2=1; SSR=0; SSL=TLSv1.2; C=ECDHE-ECDSA-CHACHA20-POLY1305; EC=X25519;
-   ReqHeader      user-agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Dragon/52.15.25.664 Chrome/52.0.2743.82 Safari/537.36
-   ReqHeader      accept: image/webp,image/*,*/*;q=0.8
-   ReqHeader      referer: https://de.wikipedia.org/
-   ReqHeader      accept-language: de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4
-   ReqHeader      cookie: GeoIP=XXX
-   ReqHeader      Host: upload.wikimedia.org
-   ReqHeader      X-Forwarded-For: XXX.XXX.XXX.XXX, 10.64.48.105
-   ReqHeader      via-nginx: 1
-   ReqHeader      X-WMF-LastGlobalStamp: 03-Apr-2017
-   ReqHeader      Accept-Encoding: gzip
-   ReqHeader      X-CDIS: miss
-   ReqHeader      X-Varnish: 229770791
-   ReqUnset       X-Forwarded-For: XXX.XXX.XXX.XXX, 10.64.48.105
-   ReqHeader      X-Forwarded-For: XXX.XXX.XXX.XXX, 10.64.48.105, 10.64.48.105
-   VCL_call       RECV
-   VCL_acl        MATCH wikimedia_trust "10.0.0.0"/8
-   ReqUnset       X-CDIS: miss
-   ReqHeader      X-DCPath: eqiad
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   Hit            172327145
-   VCL_call       HIT
-   ReqHeader      X-CDIS: hit
-   VCL_return     miss
-   VCL_call       MISS
-   ReqUnset       X-CDIS: hit
-   ReqHeader      X-CDIS: miss
-   VCL_return     fetch
-   Link           bereq 389680516 fetch
-   Timestamp      Fetch: 1491225372.412188 0.085090 0.085090
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     Content-Type: text/html; charset=UTF-8
-   RespHeader     Access-Control-Allow-Origin: *
-   RespHeader     X-Trans-Id: txbb40584c2ed44a4ca5699-0058e24b1c
-   RespHeader     Date: Mon, 03 Apr 2017 13:16:12 GMT
-   RespHeader     Content-Length: 34591
-   RespHeader     Last-Modified: Fri, 25 Oct 2013 04:52:03 GMT
-   RespHeader     Etag: 5dcdfb14ebace391a4ab7eb006dcee37
-   RespHeader     X-Timestamp: 1382676722.00947
-   RespHeader     X-MediaWiki-Original: /wikipedia/commons/6/6a/1710_J._F._Reimmann%2C_Versuch
-   RespHeader     X-Varnish: 389680515
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 varnish-v4
-   VCL_call       DELIVER
-   RespHeader     X-Cache-Int: cp1073 miss
-   VCL_return     deliver
-   Timestamp      Process: 1491225372.412209 0.085110 0.000021
-   RespHeader     Accept-Ranges: bytes
-   Debug          "RES_MODE 2"
-   RespHeader     Connection: keep-alive
-   Timestamp      Resp: 1491225372.474162 0.147063 0.061953
-   ReqAcct        921 0 921 579 34591 35170
-   End
Urbanecm moved this task from Backlog to Watching on the User-Urbanecm board.Mon, Apr 3, 1:51 PM

Change 346157 had a related patch set uploaded (by Ema):
[operations/puppet@production] cache_upload: unset Content-Type on 304 responses

https://gerrit.wikimedia.org/r/346157

Change 346157 merged by Ema:
[operations/puppet@production] cache_upload: unset Content-Type on 304 responses

https://gerrit.wikimedia.org/r/346157

Mentioned in SAL (#wikimedia-operations) [2017-04-03T14:49:32Z] <ema> cache_upload: ban all objects with content-type: text/html T162035

Paladox added a subscriber: Paladox.Mon, Apr 3, 2:53 PM
Joe added a subscriber: Joe.Mon, Apr 3, 3:51 PM

@stjn I cannot reproduce your case and we should've fixed the largest underlying reason for such problems. Do you still experience the same issue?

No problem for me. Try clearing your browser cache. There should be an active workaround now.

stjn added a comment.Mon, Apr 3, 4:02 PM

Yes, the same ERR_CONTENT_DECODING_FAILED even with disabled cache (via Devtools) at https://ru.wikipedia.org/wiki/Шаблон:Медаль_Партизану_Отечественной_войны_1_степени

Tested in Firefox / Chromium.

I can confirm ERR_CONTENT_DECODING_FAILED for https://upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Partizan-Medal-1-ribbon.png/40px-Partizan-Medal-1-ribbon.png (Firefox 52.0.2). Other above mentioned pictures are working fine for me now, though.

ema added a subscriber: Volans.Mon, Apr 3, 6:56 PM

Our understanding of the problem so far is that some of our swift servers set the wrong Content-Type header when responding with 304 to If-Modified-Since conditional requests:

< HTTP/1.1 304 Not Modified
< Accept-Ranges: bytes
< Content-Type: text/html; charset=UTF-8
< Content-Length: 0
< Access-Control-Allow-Origin: *
< X-Trans-Id: txea1a584c36164922af4b4-0058e284a2
< Date: Mon, 03 Apr 2017 17:21:38 GMT

The right Content-Type value would have been image/png in this case. According to RFC 7234, caches should "use other header fields provided in the 304 (Not Modified) response to replace all instances of the corresponding header fields in the stored response", which means that varnish is behaving correctly here, replacing Content-Type: image/png with Content-Type: text/html; charset=UTF-8 in the cached object.
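
To make the header replacement concrete, here is a toy model of that update rule. It is a simplified sketch, not Varnish's implementation: `apply_304` is a hypothetical helper, and the header values are illustrative ones borrowed from the responses quoted in this task.

```python
def apply_304(stored_headers, not_modified_headers):
    """Return the stored headers after merging in a 304 response.

    Every header field present in the 304 replaces the stored field of the
    same name; fields absent from the 304 are kept. Names are compared
    case-insensitively by lowercasing them.
    """
    updated = {name.lower(): value for name, value in stored_headers.items()}
    for name, value in not_modified_headers.items():
        updated[name.lower()] = value
    return updated


# Cached object as stored after the initial 200 response.
stored = {
    "Content-Type": "image/png",
    "Last-Modified": "Fri, 25 Oct 2013 04:52:03 GMT",
}

# Headers of the faulty 304 response.
revalidation = {
    "Content-Type": "text/html; charset=UTF-8",
    "Date": "Mon, 03 Apr 2017 17:21:38 GMT",
}

merged = apply_304(stored, revalidation)
print(merged["content-type"])  # text/html; charset=UTF-8
```

The cached PNG body is untouched, but from this point on it is served with the wrong type, which matches the symptom in the report.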

As a temporary workaround while we come up with a solution for the swift issue, we've deployed a VCL hack to unset Content-Type on Not Modified responses. We then banned all objects with Content-Type: text/html; charset=UTF-8 on cache_upload, and that seems to have solved most of the problems.

@Volans found that the bug is not reproducible on all our swift backends but rather depends on the version of the software: 1.13.1 is affected, while 2.2.0 works fine.

ema added a comment.Mon, Apr 3, 7:23 PM

There is probably another type of bug responsible for the ERR_CONTENT_DECODING_FAILED errors affecting https://upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Partizan-Medal-1-ribbon.png/40px-Partizan-Medal-1-ribbon.png.

The issue seems to be due to the object being cached with Content-Encoding: gzip on a bunch of servers: cp[3035,3037-3038,3044-3046,3048-3049].esams.wmnet. I didn't manage to get a response from the applayer with CE: gzip though.
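
One way such a mismatch could surface as a decoding error in the browser is sketched below. It rests on an unconfirmed assumption: that the cached body was plain PNG data labelled as gzip. `decodes_as_gzip` is a hypothetical helper, and the byte strings are stand-ins rather than the real thumbnail.

```python
import gzip
import zlib

# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decodes_as_gzip(body):
    """Return True if the body can be gunzipped, as a client would attempt
    when the response carries Content-Encoding: gzip."""
    try:
        gzip.decompress(body)
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile is a subclass of OSError.
        return False
    return True


plain_png = PNG_SIGNATURE + b"stand-in for the rest of the image"

# A plain PNG body labelled as gzip cannot be decoded.
print(decodes_as_gzip(plain_png))

# The same body, actually compressed, decodes without trouble.
print(decodes_as_gzip(gzip.compress(plain_png)))
```

This would also be consistent with wget reporting the correct type while browsers fail, assuming wget does not request or decode a compressed response.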

Nemo_bis changed the title from "Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser" to "Some PNG thumbnails and JPEG originals delivered as [text/html] content-type and hence not rendered in browser".Tue, Apr 4, 7:07 AM

(Fixed summary to reflect the "original" bug report, T162033.)

TheDJ added a subscriber: TheDJ.Tue, Apr 4, 11:01 AM

I don't think that the purge was complete. This one https://upload.wikimedia.org/wikipedia/commons/thumb/c/c8/Egypt_location_map.svg/300px-Egypt_location_map.svg.png still shows content-type text/html

Mentioned in SAL (#wikimedia-operations) [2017-04-04T13:00:14Z] <ema> cache_upload: ban all objects with content-type ~ "^text" T162035

Change 346304 had a related patch set uploaded (by Ema):
[operations/puppet@production] cache_upload: properly detect 304s when unsetting CT

https://gerrit.wikimedia.org/r/346304

Pokefan95 added a subscriber: Pokefan95.EditedWed, Apr 5, 5:44 AM

All files listed here work for me except https://upload.wikimedia.org/wikipedia/commons/thumb/f/f5/Flag_of_Cross_of_Burgundy.svg/23px-Flag_of_Cross_of_Burgundy.svg.png. I am using Firefox 7.0 (iOS 9), and when I try to access that link, it returns a "cannot decode raw data" error.

Change 346304 merged by Ema:
[operations/puppet@production] cache_upload: override CT updates on 304s

https://gerrit.wikimedia.org/r/346304

Change 346510 had a related patch set uploaded (by Ema):
[operations/puppet@production] cache_upload: lower keep from 3d to 1d on upload backends

https://gerrit.wikimedia.org/r/346510

Change 346510 merged by Ema:
[operations/puppet@production] cache_upload: lower keep from 3d to 1d on upload backends

https://gerrit.wikimedia.org/r/346510

Mentioned in SAL (#wikimedia-releng) [2017-04-05T13:34:39Z] <ema> testing possible fix for T162035 on deployment-ms-fe01

ema added a comment.Thu, Apr 6, 7:31 AM

We're currently running with a VCL patch that keeps track of the proper Content-Type returned by swift on the initial 200 response in an additional, internal header: X-Original-Content-Type. Whenever swift responds with a 304 Not Modified and the wrong CT, we re-set CT to the proper value in the cached object.

However, this is not enough: we've got various objects that were cached before the patch was deployed. Those objects do not know what the proper, original content-type was. Their CT will unfortunately be overwritten as soon as swift responds with a 304 and Content-Type: text.
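
The logic of the workaround, and the reason objects cached before the patch stay vulnerable, can be modelled roughly as follows. This is a Python sketch of the idea, not the deployed VCL: `store_200` and `revalidate_304` are hypothetical names, and only the X-Original-Content-Type header comes from the description above.

```python
ORIGINAL_CT = "x-original-content-type"


def store_200(response_headers):
    """Cache a 200 response, remembering its Content-Type in an extra header."""
    cached = {name.lower(): value for name, value in response_headers.items()}
    if "content-type" in cached:
        cached[ORIGINAL_CT] = cached["content-type"]
    return cached


def revalidate_304(cached_headers, not_modified_headers):
    """Merge a 304 into the cached headers, then restore the original CT."""
    updated = dict(cached_headers)
    for name, value in not_modified_headers.items():
        updated[name.lower()] = value
    original = cached_headers.get(ORIGINAL_CT)
    if original:
        # The object remembers its proper type, so undo the overwrite.
        updated["content-type"] = original
    return updated


bad_304 = {"Content-Type": "text/html; charset=UTF-8"}

# Object cached after the patch: the proper type survives the bad 304.
patched = store_200({"Content-Type": "image/png"})
print(revalidate_304(patched, bad_304)["content-type"])  # image/png

# Object cached before the patch: no remembered type, so it is overwritten.
legacy = {"content-type": "image/png"}
print(revalidate_304(legacy, bad_304)["content-type"])  # text/html; charset=UTF-8
```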

Our current strategy to mitigate user-facing impact is:

  • Ban all objects with CT: text once or twice a day
  • In a few days, ban all objects without X-Original-Content-Type
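
The two ban criteria can be expressed as a single predicate over a cached object's headers. This is only a sketch of the selection logic under assumed names (`should_ban`, `require_original_ct`); the actual bans were issued on the cache servers, and only the `^text` pattern and the X-Original-Content-Type header are taken from this task.

```python
import re

# Pattern quoted in the SAL entries: content-type ~ "^text"
TEXT_CONTENT_TYPE = re.compile(r"^text")
ORIGINAL_CT = "x-original-content-type"


def should_ban(cached_headers, require_original_ct=False):
    """Decide whether a cached object matches the ban criteria.

    Step 1 (periodic): ban objects whose Content-Type starts with "text".
    Step 2 (later, when require_original_ct is True): also ban objects that
    lack the X-Original-Content-Type header.
    """
    headers = {name.lower(): value for name, value in cached_headers.items()}
    if TEXT_CONTENT_TYPE.search(headers.get("content-type", "")):
        return True
    return require_original_ct and ORIGINAL_CT not in headers
```

With `require_original_ct=False` only already-corrupted objects are evicted; setting it to `True` additionally evicts healthy objects that predate the patch and could still be corrupted by a later 304.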

Mentioned in SAL (#wikimedia-operations) [2017-04-06T07:32:21Z] <ema> cache_upload: ban all objects with content-type ~ "^text" T162035

ema moved this task from Triage to Caching on the Traffic board.Thu, Apr 6, 10:14 AM

For me every day new images are broken and those broken yesterday work today.

Indeed. Potential duplicates: T162130, T161917, T160867, T162333, T162483

BBlack added a subscriber: BBlack.Sat, Apr 8, 12:53 PM

The continued reports above were expected, as detailed when the Varnish-level workaround was applied above in T162035#3159658 . I've done another of the periodic bans this morning. After giving some time for that impact to settle, I'll start later today on executing and monitoring the more-complete ban on "all objects without X-Original-Content-Type". After that ban, we should be able to get the workaround to take complete effect with one last ban on CT ~ text/html across the fleet. Next week we'll sort out the plans for solving the underlying issue with Swift so that we can eventually revert the Varnish-level hacks and restore our storage keep-time, etc.

BBlack added a comment.Sat, Apr 8, 4:03 PM

The last round of bans mentioned above is complete now. If all of our theories and workarounds are completely valid (and there aren't other bugs or behaviors in play), this issue should be resolved now with no remaining examples (or new ones being created).

I fail to load https://upload.wikimedia.org/wikipedia/commons/thumb/3/31/Stockholm_stadshuset.jpg/1800px-Stockholm_stadshuset.jpg in Firefox 52 ("Content Encoding Error") however wget does correctly say [image/jpeg]. Hence not sure it's the same issue.

ema added a comment.Mon, Apr 10, 1:35 PM

What you're seeing is likely another problem (see T162035#3151985).

Change 348698 had a related patch set uploaded (by Ema):
[operations/puppet@production] Revert "cache_upload: lower keep from 3d to 1d on upload backends"

https://gerrit.wikimedia.org/r/348698

Change 348699 had a related patch set uploaded (by Ema):
[operations/puppet@production] Revert "cache_upload: override CT updates on 304s"

https://gerrit.wikimedia.org/r/348699

Change 348699 merged by Ema:
[operations/puppet@production] Revert "cache_upload: override CT updates on 304s"

https://gerrit.wikimedia.org/r/348699

ema closed this task as "Resolved".Thu, Apr 20, 3:25 PM
ema claimed this task.

The user-facing issue has been solved for a while by our VCL workaround + regular bans. Closing, now that the swift problem has also been fixed by upgrading to 2.2.0 T162348#3189319 and that we've reverted the VCL workaround.

The problem is still in place

@AlexRus Can you give an example of a file that is still broken?

Yes exactly. I had to clear the browser cache. Everything works as it should.

Change 348698 merged by Ema:
[operations/puppet@production] Revert "cache_upload: lower keep from 3d to 1d on upload backends"

https://gerrit.wikimedia.org/r/348698