
Some PNG thumbnails and JPEG originals delivered as [text/html] content-type and hence not rendered in browser
Closed, Resolved (Public)

Description

Hi,

Please fix the missing 200px size of a picture; see https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/200px-Status_iucn3.1_LC_cs.svg.png. Other sizes work correctly; see https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/201px-Status_iucn3.1_LC_cs.svg.png for example.

This image is used very frequently in articles. I therefore triaged this as Unbreak Now and put a notice in #wikimedia-tech.

I get ERR_CONTENT_DECODING_FAILED as the error code (Chromium). The last time I saw this kind of error, it was because the server was lying about the compression used. It seems to be a problem with only one or a few clusters, as not all users get the error (@elukey can't reproduce it, but @ema, who according to IRC is located close to me and is probably using the same cluster, can).

See https://cs.wikipedia.org/w/index.php?title=Wikipedie:Pod_l%C3%ADpou_(technika)&oldid=14873106#Dal.C5.A1.C3.AD_problematick.C3.BD_obr.C3.A1zek for what is shown now (just a placeholder for the image plus its alt text).

Browsers tried

  • Chromium 56.0.2924.76
  • Firefox 52.0.1

Other ways tried to access the URL

  • wget 1.18 (no problem)
  • curl 7.50.1 (no problem)
  • curl 7.50.1 with spoofed user-agent (no problem)

It seems to affect only users, not bots.

Martin Urbanec

Event Timeline


Mentioned in SAL (#wikimedia-operations) [2017-04-03T12:28:51Z] <ema> banning 200px-Status_iucn3.1_LC_cs.svg.png from esams frontends T162035

Note that the issue is pretty widespread, I'm seeing lots of requests affected by this.

It is the application itself returning the wrong Content-Type:

*   << Request  >> 389680515 
-   Begin          req 390692906 rxreq
-   Timestamp      Start: 1491225372.327099 0.000000 0.000000
-   Timestamp      Req: 1491225372.327099 0.000000 0.000000
-   ReqStart       10.64.48.105 52585
-   ReqMethod      GET
-   ReqURL         /wikipedia/commons/thumb/6/6a/1710_J._F._Reimmann%2C_Versuch_einer_Einleitung_in_die_Historiam_Literariam%2C_vol.3%2C2.png/220px-1710_J._F._Reimmann%2C_Versuch_einer_Einleitung_in_die_Historiam_Literariam%2C_vol.3%2C2.png
-   ReqProtocol    HTTP/1.1
-   ReqHeader      X-Client-IP: XXX.XXX.XXX.XXX
-   ReqHeader      X-Forwarded-Proto: https
-   ReqHeader      X-Connection-Properties: H2=1; SSR=0; SSL=TLSv1.2; C=ECDHE-ECDSA-CHACHA20-POLY1305; EC=X25519;
-   ReqHeader      user-agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Dragon/52.15.25.664 Chrome/52.0.2743.82 Safari/537.36
-   ReqHeader      accept: image/webp,image/*,*/*;q=0.8
-   ReqHeader      referer: https://de.wikipedia.org/
-   ReqHeader      accept-language: de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4
-   ReqHeader      cookie: GeoIP=XXX
-   ReqHeader      Host: upload.wikimedia.org
-   ReqHeader      X-Forwarded-For: XXX.XXX.XXX.XXX, 10.64.48.105
-   ReqHeader      via-nginx: 1
-   ReqHeader      X-WMF-LastGlobalStamp: 03-Apr-2017
-   ReqHeader      Accept-Encoding: gzip
-   ReqHeader      X-CDIS: miss
-   ReqHeader      X-Varnish: 229770791
-   ReqUnset       X-Forwarded-For: XXX.XXX.XXX.XXX, 10.64.48.105
-   ReqHeader      X-Forwarded-For: XXX.XXX.XXX.XXX, 10.64.48.105, 10.64.48.105
-   VCL_call       RECV
-   VCL_acl        MATCH wikimedia_trust "10.0.0.0"/8
-   ReqUnset       X-CDIS: miss
-   ReqHeader      X-DCPath: eqiad
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   Hit            172327145
-   VCL_call       HIT
-   ReqHeader      X-CDIS: hit
-   VCL_return     miss
-   VCL_call       MISS
-   ReqUnset       X-CDIS: hit
-   ReqHeader      X-CDIS: miss
-   VCL_return     fetch
-   Link           bereq 389680516 fetch
-   Timestamp      Fetch: 1491225372.412188 0.085090 0.085090
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     Content-Type: text/html; charset=UTF-8
-   RespHeader     Access-Control-Allow-Origin: *
-   RespHeader     X-Trans-Id: txbb40584c2ed44a4ca5699-0058e24b1c
-   RespHeader     Date: Mon, 03 Apr 2017 13:16:12 GMT
-   RespHeader     Content-Length: 34591
-   RespHeader     Last-Modified: Fri, 25 Oct 2013 04:52:03 GMT
-   RespHeader     Etag: 5dcdfb14ebace391a4ab7eb006dcee37
-   RespHeader     X-Timestamp: 1382676722.00947
-   RespHeader     X-MediaWiki-Original: /wikipedia/commons/6/6a/1710_J._F._Reimmann%2C_Versuch
-   RespHeader     X-Varnish: 389680515
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 varnish-v4
-   VCL_call       DELIVER
-   RespHeader     X-Cache-Int: cp1073 miss
-   VCL_return     deliver
-   Timestamp      Process: 1491225372.412209 0.085110 0.000021
-   RespHeader     Accept-Ranges: bytes
-   Debug          "RES_MODE 2"
-   RespHeader     Connection: keep-alive
-   Timestamp      Resp: 1491225372.474162 0.147063 0.061953
-   ReqAcct        921 0 921 579 34591 35170
-   End

Change 346157 had a related patch set uploaded (by Ema):
[operations/puppet@production] cache_upload: unset Content-Type on 304 responses

https://gerrit.wikimedia.org/r/346157

Change 346157 merged by Ema:
[operations/puppet@production] cache_upload: unset Content-Type on 304 responses

https://gerrit.wikimedia.org/r/346157

Mentioned in SAL (#wikimedia-operations) [2017-04-03T14:49:32Z] <ema> cache_upload: ban all objects with content-type: text/html T162035

@stjn I cannot reproduce your case and we should've fixed the largest underlying reason for such problems. Do you still experience the same issue?

No problem for me. Try clearing your browser cache. There should be an active workaround now.

Yes, the same ERR_CONTENT_DECODING_FAILED, even with the cache disabled (via Devtools), at https://ru.wikipedia.org/wiki/Шаблон:Медаль_Партизану_Отечественной_войны_1_степени

Tested in Firefox / Chromium.

I can confirm ERR_CONTENT_DECODING_FAILED for https://upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Partizan-Medal-1-ribbon.png/40px-Partizan-Medal-1-ribbon.png (Firefox 52.0.2). The other pictures mentioned above are working fine for me now, though.

Our understanding of the problem so far is that some of our swift servers set the wrong Content-Type header when responding with 304 to If-Modified-Since conditional requests:

< HTTP/1.1 304 Not Modified
< Accept-Ranges: bytes
< Content-Type: text/html; charset=UTF-8
< Content-Length: 0
< Access-Control-Allow-Origin: *
< X-Trans-Id: txea1a584c36164922af4b4-0058e284a2
< Date: Mon, 03 Apr 2017 17:21:38 GMT

The right Content-Type value would have been image/png in this case. According to RFC 7234, caches should "use other header fields provided in the 304 (Not Modified) response to replace all instances of the corresponding header fields in the stored response", which means that varnish is behaving correctly here, replacing Content-Type: image/png with Content-Type: text/html; charset=UTF-8 in the cached object.
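To make the mechanism concrete, here is a minimal Python sketch of the header update a cache performs on a 304. It is a toy model, not Varnish code: the function name `apply_304` and the dict representation of a cached object are assumptions made for illustration, and the simulated 304 carries only Content-Type and Date to keep the example small.

```python
def apply_304(stored_headers, not_modified_headers):
    """Return the stored headers updated with every header present in the 304.

    Header names are lowercased because HTTP header names are case-insensitive.
    """
    updated = {name.lower(): value for name, value in stored_headers.items()}
    for name, value in not_modified_headers.items():
        updated[name.lower()] = value
    return updated


# Object as cached after the initial 200 response.
stored = {"Content-Type": "image/png", "Content-Length": "34591"}

# Simplified 304 as returned by an affected backend.
from_swift = {
    "Content-Type": "text/html; charset=UTF-8",
    "Date": "Mon, 03 Apr 2017 17:21:38 GMT",
}

refreshed = apply_304(stored, from_swift)
print(refreshed["content-type"])  # text/html; charset=UTF-8
```

The image bytes in the cache are untouched; only the stored metadata changes, which is why the object is then served as text/html.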

As a temporary workaround while we come up with a solution for the swift issue, we've deployed a VCL hack to unset Content-Type on Not Modified responses. Then we've banned all objects with Content-Type: text/html; charset=UTF-8 on cache_upload and that seems to have solved most of the problems.

@Volans found that the bug is not reproducible on all our swift backends but rather depends on the version of the software: 1.13.1 is affected, while 2.2.0 works fine.

There is probably another type of bug responsible for the ERR_CONTENT_DECODING_FAILED errors affecting https://upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Partizan-Medal-1-ribbon.png/40px-Partizan-Medal-1-ribbon.png.

The issue seems to be due to the object being cached with Content-Encoding: gzip on a bunch of servers: cp[3035,3037-3038,3044-3046,3048-3049].esams.wmnet. I didn't manage to get a response from the applayer with CE: gzip though.
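One way such a cached object could produce a decoding error in browsers is sketched below in Python. It assumes, which the thread does not confirm, that the cached body is a plain PNG while the headers claim gzip. The helper names `decode_body` and `encoding_matches_body` and the sample bytes are hypothetical.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"         # first two bytes of every gzip stream
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # PNG file signature


def decode_body(content_encoding, body):
    """Decode a body the way a client honouring Content-Encoding would."""
    if (content_encoding or "").lower() == "gzip":
        return gzip.decompress(body)
    return body


def encoding_matches_body(content_encoding, body):
    """True when the declared encoding agrees with the body's magic bytes."""
    declared_gzip = (content_encoding or "").lower() == "gzip"
    return declared_gzip == body.startswith(GZIP_MAGIC)


fake_png = PNG_MAGIC + b"rest of the image data"

try:
    decode_body("gzip", fake_png)
    outcome = "decoded"
except (OSError, EOFError, zlib.error):
    # A browser would report this as a content decoding failure.
    outcome = "decoding failed"

print(outcome)                                  # decoding failed
print(encoding_matches_body("gzip", fake_png))  # False
```

A client that ignores Content-Encoding would save the bytes as-is, so it would not surface this kind of mismatch.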

Nemo_bis renamed this task from Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser to Some PNG thumbnails and JPEG originals delivered as [text/html] content-type and hence not rendered in browser. Apr 4 2017, 7:07 AM

(Fixed summary to reflect the "original" bug report, T162033.)

Mentioned in SAL (#wikimedia-operations) [2017-04-04T13:00:14Z] <ema> cache_upload: ban all objects with content-type ~ "^text" T162035

Change 346304 had a related patch set uploaded (by Ema):
[operations/puppet@production] cache_upload: properly detect 304s when unsetting CT

https://gerrit.wikimedia.org/r/346304

All files listed here work for me except https://upload.wikimedia.org/wikipedia/commons/thumb/f/f5/Flag_of_Cross_of_Burgundy.svg/23px-Flag_of_Cross_of_Burgundy.svg.png. I am using Firefox 7.0 (iOS 9), and when I tried to access that link, it returned a "cannot decode raw data" error.

Change 346304 merged by Ema:
[operations/puppet@production] cache_upload: override CT updates on 304s

https://gerrit.wikimedia.org/r/346304

Change 346510 had a related patch set uploaded (by Ema):
[operations/puppet@production] cache_upload: lower keep from 3d to 1d on upload backends

https://gerrit.wikimedia.org/r/346510

Change 346510 merged by Ema:
[operations/puppet@production] cache_upload: lower keep from 3d to 1d on upload backends

https://gerrit.wikimedia.org/r/346510

Mentioned in SAL (#wikimedia-releng) [2017-04-05T13:34:39Z] <ema> testing possible fix for T162035 on deployment-ms-fe01

We're currently running with a VCL patch that keeps track of the proper Content-Type returned by swift on the initial 200 response in an additional, internal header: X-Original-Content-Type. Whenever swift responds with a 304 Not Modified and the wrong CT, we re-set CT to the proper value in the cached object.

However, this is not enough: we've got various objects that were cached before the patch was deployed. Those objects do not know what the proper, original content-type was. Unfortunately, their CT will be overwritten as soon as swift responds with a 304 and Content-Type: text.
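The behaviour described in the two paragraphs above can be modelled in a few lines of Python. This is only a sketch of the idea, not the deployed VCL: `store_200` and `refresh_304` are hypothetical helper names, and cached objects are represented as plain dicts of lowercased header names.

```python
ORIGINAL_CT = "x-original-content-type"  # internal header named in the comment above


def store_200(headers):
    """Cache a 200 response and remember its Content-Type in the internal header."""
    obj = {name.lower(): value for name, value in headers.items()}
    if "content-type" in obj:
        obj[ORIGINAL_CT] = obj["content-type"]
    return obj


def refresh_304(obj, headers_304):
    """Apply a 304 to a cached object, restoring the remembered Content-Type."""
    refreshed = dict(obj)
    refreshed.update({name.lower(): value for name, value in headers_304.items()})
    if ORIGINAL_CT in obj:
        # Objects cached after the patch: re-set CT to the proper value.
        refreshed[ORIGINAL_CT] = obj[ORIGINAL_CT]
        refreshed["content-type"] = obj[ORIGINAL_CT]
    return refreshed


swift_200 = {"Content-Type": "image/png", "Content-Length": "34591"}
swift_304 = {"Content-Type": "text/html; charset=UTF-8"}

new_object = store_200(swift_200)
# Cached before the patch: no record of the original Content-Type.
legacy_object = {"content-type": "image/png", "content-length": "34591"}

print(refresh_304(new_object, swift_304)["content-type"])     # image/png
print(refresh_304(legacy_object, swift_304)["content-type"])  # text/html; charset=UTF-8
```

The second call shows the gap: an object without the internal header has nothing to restore from, so the wrong value from the 304 wins.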

Our current strategy to mitigate user-facing impact is:

  • Ban all objects with CT: text once or twice a day
  • In a few days, ban all objects without X-Original-Content-Type
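As a rough illustration of the two bans listed above, the following Python sketch treats a ban as a filter over cached objects. Real Varnish bans are evaluated differently and the matching objects are simply refetched on the next request. The `ban` helper, the predicates, and the three sample URLs are all hypothetical.

```python
import re


def ban(cache, predicate):
    """Return the cache without the objects that match the ban predicate."""
    return {url: obj for url, obj in cache.items() if not predicate(obj)}


def has_text_ct(obj):
    """Matches objects whose Content-Type starts with "text"."""
    return re.search(r"^text", obj.get("content-type", "")) is not None


def lacks_original_ct(obj):
    """Matches objects cached before the X-Original-Content-Type patch."""
    return "x-original-content-type" not in obj


cache = {
    "/thumb/a.png": {"content-type": "text/html; charset=UTF-8"},
    "/thumb/b.png": {"content-type": "image/png"},
    "/thumb/c.png": {
        "content-type": "image/png",
        "x-original-content-type": "image/png",
    },
}

after_text_ban = ban(cache, has_text_ct)
after_full_ban = ban(after_text_ban, lacks_original_ct)

print(sorted(after_text_ban))  # ['/thumb/b.png', '/thumb/c.png']
print(sorted(after_full_ban))  # ['/thumb/c.png']
```

The first ban only removes objects that are already broken, which is why it has to be repeated; the second removes every object that could still break later.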

Mentioned in SAL (#wikimedia-operations) [2017-04-06T07:32:21Z] <ema> cache_upload: ban all objects with content-type ~ "^text" T162035

For me, new images are broken every day, and those that were broken yesterday work today.


Indeed. Potential duplicates: T162130, T161917, T160867, T162333, T162483

The continued reports above were expected, as detailed when the Varnish-level workaround was applied above in T162035#3159658. I've done another of the periodic bans this morning. After giving some time for that impact to settle, I'll start later today on executing and monitoring the more-complete ban on "all objects without X-Original-Content-Type". After that ban, we should be able to get the workaround to take complete effect with one last ban on CT ~ text/html across the fleet. Next week we'll sort out the plans for solving the underlying issue with Swift so that we can eventually revert the Varnish-level hacks and restore our storage keep-time, etc.

The last round of bans mentioned above is complete now. If all of our theories and workarounds are completely valid (and there aren't other bugs or behaviors in play), this issue should be resolved now with no remaining examples (or new ones being created).

I fail to load https://upload.wikimedia.org/wikipedia/commons/thumb/3/31/Stockholm_stadshuset.jpg/1800px-Stockholm_stadshuset.jpg in Firefox 52 ("Content Encoding Error"); however, wget correctly reports [image/jpeg]. Hence I'm not sure it's the same issue.


What you're seeing is likely another problem (see T162035#3151985).

Change 348698 had a related patch set uploaded (by Ema):
[operations/puppet@production] Revert "cache_upload: lower keep from 3d to 1d on upload backends"

https://gerrit.wikimedia.org/r/348698

Change 348699 had a related patch set uploaded (by Ema):
[operations/puppet@production] Revert "cache_upload: override CT updates on 304s"

https://gerrit.wikimedia.org/r/348699

Change 348699 merged by Ema:
[operations/puppet@production] Revert "cache_upload: override CT updates on 304s"

https://gerrit.wikimedia.org/r/348699

ema claimed this task.

The user-facing issue has been solved for a while by our VCL workaround + regular bans. Closing, now that the swift problem has also been fixed by upgrading to 2.2.0 T162348#3189319 and that we've reverted the VCL workaround.


The problem is still present.

@AlexRus Can you give an example of a file that is still broken?


Yes exactly. I had to clear the browser cache. Everything works as it should.

Change 348698 merged by Ema:
[operations/puppet@production] Revert "cache_upload: lower keep from 3d to 1d on upload backends"

https://gerrit.wikimedia.org/r/348698

Is this really fixed? I still get a strange Content-Type for the thumbnail from the description (although it is no longer [text/html]):

$ wget -S https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/200px-Status_iucn3.1_LC_cs.svg.png 2>&1 | grep Content-Type
  Content-Type: application/x-www-form-urlencoded

Full headers:

HTTP/1.1 200 OK
Date: Wed, 15 May 2019 07:04:15 GMT
Content-Type: application/x-www-form-urlencoded
Content-Length: 8100
Connection: keep-alive
Etag: b485920910bc973c3fad353a9b809944
Server: ATS/8.0.3
X-Object-Meta-Sha1Base36: s73fklaf49dfygd9oug7c9kbjy5wu6h
Last-Modified: Mon, 03 Apr 2017 12:50:01 GMT
X-Timestamp: 1491223800.69668
X-Trans-Id: txc11c43e2ea804120a6f6f-005cd50dde
X-Varnish: 265855185 200635104, 101450013, 151146220 113949253
Via: 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1)
Age: 8574
X-Cache: cp3038 hit, cp3038 hit/77
X-Cache-Status: hit-front
Server-Timing: cache;desc="hit-front"
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
X-Analytics: https=1;nocookies=1
X-Client-IP: 2a00:6d47:10:b95:dad:beef:baba:dead
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache, X-Varnish
Timing-Allow-Origin: *
Accept-Ranges: bytes

Maybe a separate issue? There are a few tasks that mention 'application/x-www-form-urlencoded': T188831 T190701 T191306 (are these all duplicates of each other?)

The issue is indeed reproducible again, affecting ATS hosts.

Swift is still returning the wrong Content-Type:

$ curl -v -H "If-Modified-Since: Mon, 03 Apr 2017 17:21:38 GMT" -H "Host: upload.wikimedia.org" http://swift.discovery.wmnet/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/200px-Status_iucn3.1_LC_cs.svg.png
< HTTP/1.1 304 Not Modified
< Content-Type: application/x-www-form-urlencoded

And we forgot to include the workaround in ATS (unsetting Content-Type on 304 responses).

Actually no, we did fix the issue at the Swift layer (T162348), hence we removed the workaround from Varnish: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/348699/. That means that there is nothing wrong with ATS.

The problem here has nothing to do with conditional requests. The Content-Type is indeed "application/x-www-form-urlencoded" in Swift:

$ curl -v -H "Host: upload.wikimedia.org" http://swift.discovery.wmnet/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/200px-Status_iucn3.1_LC_cs.svg.png -o /dev/null
< HTTP/1.1 200 OK
< Content-Type: application/x-www-form-urlencoded

Another data point: the 250px thumb has the correct Content-Type (image/png), although that thumb reports a Last-Modified date in 2014, as opposed to 2017 for the 200px version:

$ wget -S https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/250px-Status_iucn3.1_LC_cs.svg.png 
--2019-05-15 12:40:55--  https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/250px-Status_iucn3.1_LC_cs.svg.png
Resolving upload.wikimedia.org (upload.wikimedia.org)... 91.198.174.208, 2620:0:862:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|91.198.174.208|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Wed, 15 May 2019 10:40:56 GMT
  Content-Type: image/png
  Content-Length: 11243
  Connection: keep-alive
  X-Object-Meta-Sha1Base36: byiq0mmhzj2toixc2gjcsbdano1iwti
  Content-Disposition: inline;filename*=UTF-8''Status_iucn3.1_LC_cs.svg.png
  Last-Modified: Thu, 09 Oct 2014 01:57:58 GMT
  Etag: 94812381213e2814e6b30d05347aa65e
  X-Timestamp: 1412819877.36190
  X-Trans-Id: txd5d4e068002c4ed99aace-005cdbebc4
  X-Varnish: 171908316, 274641796, 579875391 567344254
  Via: 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1)
  Age: 243
  X-Cache: cp1076 pass, cp3043 miss, cp3045 hit/1
  X-Cache-Status: hit-front
  Server-Timing: cache;desc="hit-front"
  Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
  X-Analytics: https=1;nocookies=1
  X-Client-IP: 79.157.148.92
  Access-Control-Allow-Origin: *
  Access-Control-Expose-Headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache, X-Varnish
  Timing-Allow-Origin: *
  Accept-Ranges: bytes
Length: 11243 (11K) [image/png]
Saving to: ‘250px-Status_iucn3.1_LC_cs.svg.png.1’

250px-Status_iucn3.1_LC_cs.s 100%[==============================================>]  10.98K  --.-KB/s    in 0.001s  

2019-05-15 12:40:55 (15.0 MB/s) - ‘250px-Status_iucn3.1_LC_cs.svg.png.1’ saved [11243/11243]

See T188831 for the application/x-www-form-urlencoded variation of this.
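The manual comparisons above (file extension in the URL versus the Content-Type that was returned) can be expressed as a small Python check. This is a sketch only: `content_type_mismatch` is a hypothetical helper, and `EXPECTED_TYPES` is a deliberately small, illustrative mapping rather than a complete list of media types.

```python
from posixpath import splitext
from urllib.parse import urlparse

# Illustrative, incomplete mapping from file extension to expected media type.
EXPECTED_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
}


def content_type_mismatch(url, content_type):
    """Return (expected, actual) if the header contradicts the URL extension.

    Returns None when the types agree or the extension is not in the mapping.
    """
    extension = splitext(urlparse(url).path)[1].lower()
    expected = EXPECTED_TYPES.get(extension)
    if expected is None:
        return None
    # Drop parameters such as "; charset=UTF-8" before comparing.
    actual = content_type.split(";", 1)[0].strip().lower()
    return None if actual == expected else (expected, actual)


BASE = "https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/"

print(content_type_mismatch(BASE + "200px-Status_iucn3.1_LC_cs.svg.png",
                            "application/x-www-form-urlencoded"))
# ('image/png', 'application/x-www-form-urlencoded')
print(content_type_mismatch(BASE + "250px-Status_iucn3.1_LC_cs.svg.png", "image/png"))
# None
```

The header values fed to the function here are the ones shown in the transcripts above; fetching them over the network is left out so the sketch stays self-contained.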

Change 530338 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] VCL: workaround for images delivered with CT:x-www-form-urlencoded

https://gerrit.wikimedia.org/r/530338

Change 530338 merged by Ema:
[operations/puppet@production] VCL: workaround for images delivered with CT:x-www-form-urlencoded

https://gerrit.wikimedia.org/r/530338


This fixed the issue but I forgot to close the task when the patch got merged. Closing now!