Some thumbnail images delivered with wrong application/x-www-form-urlencoded mime-type
Open, High, Public

Description

This happens on various thumbnail sizes of https://www.mediawiki.org/wiki/File:Page_Schemas_edit_schema_screenshot.png (not all of them).

When clicking on some of them, the browser prompts to download the file instead of displaying it. The cause is that they are delivered with an application/x-www-form-urlencoded Content-Type instead of image/png.

$ curl -s -D - -o /dev/null "https://upload.wikimedia.org/wikipedia/mediawiki/4/43/Page_Schemas_edit_schema_screenshot.png"
HTTP/1.1 200 OK
Date: Sat, 03 Mar 2018 17:15:19 GMT
Content-Type: application/x-www-form-urlencoded
Content-Length: 59510
Connection: keep-alive
X-Object-Meta-Sha1Base36: 3lunouryyip6gpflvztb2p20l0nfs5k
Last-Modified: Tue, 30 May 2017 11:07:47 GMT
Etag: 5db3c2a1fee3897a8325c76b7d46ecd9
X-Timestamp: 1496142466.35998
X-Content-Dimensions: 1206x1349:1
X-Trans-Id: tx1ffbc88de8d04285bddbe-005a9ad71b
X-Varnish: 721112451, 61187040 65427943, 466571330
Via: 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1)
Accept-Ranges: bytes
Age: 268
X-Cache: cp1073 pass, cp3034 hit/2, cp3039 miss
X-Cache-Status: hit-local
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
X-Analytics: https=1;nocookies=1
X-Client-IP: 83.39.35.95
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache, X-Varnish
Timing-Allow-Origin: *
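
The manual curl check above could be automated along these lines. This is only a sketch: the helper names (`expected_type`, `is_mislabelled`, `fetch_content_type`) and the User-Agent string are made up for illustration, and the expected type is guessed purely from the file extension in the URL.

```python
import mimetypes
from urllib.parse import unquote, urlsplit
from urllib.request import Request, urlopen

# The wrong type reported in this ticket.
BAD_TYPE = "application/x-www-form-urlencoded"


def expected_type(url):
    """Guess the Content-Type implied by the URL's file extension (None if unknown)."""
    path = unquote(urlsplit(url).path)
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed


def is_mislabelled(url, content_type):
    """True if an image URL is served with the form-encoding type from this ticket."""
    expected = expected_type(url)
    served = (content_type or "").split(";")[0].strip().lower()
    return bool(expected) and expected.startswith("image/") and served == BAD_TYPE


def fetch_content_type(url):
    """Send a HEAD request and return the Content-Type header (needs network access)."""
    req = Request(url, method="HEAD",
                  headers={"User-Agent": "content-type-check/0.1 (example)"})
    with urlopen(req, timeout=10) as resp:
        return resp.headers.get("Content-Type")
```

For a single URL, `is_mislabelled(url, fetch_content_type(url))` would report whether it is affected. Note that a cached response may differ between edge nodes, so one check per URL is not conclusive.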

Event Timeline

Ramsey-WMF added subscribers: Cparle, Ramsey-WMF.

Temporarily assigning to @Cparle since he resolved T173276 , which may be the same issue.

Fixed for the image in the ticket description, but not for the Swedish flag ...

Ramsey-WMF moved this task from Untriaged to Next up on the Multimedia board.

@Peter, I've moved the thumbnails problem you raised into T190701. It's a bit different: for the file mentioned in this ticket, it seems the original was stored incorrectly, whereas in your case it's the thumbnails. In any case, the repair script that works for the file raised in this ticket doesn't work for the Swedish flag.

@Ciencia_Al_Poder can you confirm that the file you raised this ticket about works ok now?

It works now, thanks! However, clicking on the first version of that file gives the same problem. A minor thing, I guess.

I raised a separate ticket for the revisions, see T191306

Ok to close this one?

I came across another one: https://www.mediawiki.org/wiki/File:Mscatselect_1.jpg

The current image (original size) has the same problem: it is served with content-type: application/x-www-form-urlencoded

https://upload.wikimedia.org/wikipedia/mediawiki/a/a1/Mscatselect_1.jpg

Time to run the script again, I guess

Aklapper renamed this task from Some images delivered with wrong application/x-www-form-urlencoded mime-type to Some thumbnail images delivered with wrong application/x-www-form-urlencoded mime-type. Jul 31 2019, 1:11 AM
Aklapper removed Cparle as the assignee of this task.
Aklapper edited projects, added Thumbor; removed Multimedia-Team-Working-Board.
Aklapper added subscribers: Vort, EdJoPaTo, MBH, IKhitron.

This thumbnail has the same problem: https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/Shin-marunouchi.Building-2007-01.jpg/100px-Shin-marunouchi.Building-2007-01.jpg

curl -s -D - -o /dev/null https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/Shin-marunouchi.Building-2007-01.jpg/100px-Shin-marunouchi.Building-2007-01.jpg
HTTP/2 200
date: Thu, 01 Aug 2019 03:06:57 GMT
content-type: application/x-www-form-urlencoded
content-length: 5017
x-object-meta-sha1base36: ml1bi0n9rwgybi6gge8be7ggo6no1mb
last-modified: Wed, 09 Mar 2016 03:01:45 GMT
x-timestamp: 1457492504.86118
x-trans-id: tx00b1bb24df2d41d794b96-005d4209f1
etag: 8d255785cf5f17a60578d9accec77f2a
server: ATS/8.0.3
x-varnish: 478659828 365867875
age: 19808
x-cache: cp1076 hit, cp1076 hit/3
x-cache-status: hit-front
server-timing: cache;desc="hit-front"
strict-transport-security: max-age=106384710; includeSubDomains; preload
x-analytics: https=1;nocookies=1
x-client-ip: 172.16.7.167
access-control-allow-origin: *
access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache, X-Varnish
timing-allow-origin: *
accept-ranges: bytes

This bug caused some images to be blocked by Cross-Origin Read Blocking (CORB) in Chrome 76 - see https://bugs.chromium.org/p/chromium/issues/detail?id=990853#c2

How easy or difficult would it be to fix Wikimedia to send the correct Content-Type response header for images (e.g. image/png rather than application/x-www-form-urlencoded)?

FWIW, in my previous comment I was hoping for a systematic fix (rather than having wikimedia editors have to fix images one-by-one). Do we know why some images are served with this weird application/x-www-form-urlencoded content type? I would normally associate application/x-www-form-urlencoded with http POST *requests* rather than with http *responses*...

ema raised the priority of this task from Low to High. Aug 10 2019, 9:31 AM
ema subscribed.

Priority set to High as images are not displayed correctly due to this. I see the bug happening right now on https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/Logo_European_Central_Bank.svg/150px-Logo_European_Central_Bank.svg.png

< HTTP/2 200 
< date: Sat, 10 Aug 2019 09:27:13 GMT
< content-type: application/x-www-form-urlencoded
< content-length: 7145
< x-object-meta-sha1base36: 76at8zaf4c2ioc8j1jehlxl8med6u6j
< last-modified: Mon, 09 Oct 2017 15:19:26 GMT
< etag: ab7bd990981ebb7b3c8faa7ba460ace4
< x-timestamp: 1507562365.00999
< x-trans-id: tx9fd74dae875c482396356-005d4db8b5
< server: ATS/8.0.3
< x-varnish: 179129756 179942128
< age: 54587
< x-cache: cp3047 hit, cp3038 hit/8
< x-cache-status: hit-front
< server-timing: cache;desc="hit-front"
< strict-transport-security: max-age=106384710; includeSubDomains; preload
< x-analytics: https=1;nocookies=1
< x-client-ip: 89.14.184.242
< access-control-allow-origin: *
< access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache, X-Varnish
< timing-allow-origin: *
< accept-ranges: bytes

Hi, the problem shows up with the following request:

URL: https://upload.wikimedia.org/wikipedia/commons/thumb/f/fa/Flag_of_the_People%27s_Republic_of_China.svg/120px-Flag_of_the_People%27s_Republic_of_China.svg.png

Response:

accept-ranges: bytes
access-control-allow-origin: *
access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache, X-Varnish
age: 82000
content-length: 1053
content-type: application/x-www-form-urlencoded
date: Wed, 14 Aug 2019 14:03:36 GMT
etag: 80084a88fb8ce92d3bb8b4e95464e165
last-modified: Sun, 22 Jul 2018 06:42:06 GMT
server: ATS/8.0.3
server-timing: cache;desc="hit-front"
status: 200
strict-transport-security: max-age=106384710; includeSubDomains; preload
timing-allow-origin: *
x-analytics: WMF-Last-Access=25-Jul-2019;https=1
x-cache: cp2008 hit, cp2018 hit/76
x-cache-status: hit-front
x-client-ip: 45.35.251.212
x-object-meta-sha1base36: djawj4omfiqzf94c2lse08opy52vnnm
x-timestamp: 1532241725.64215
x-trans-id: txd4435aa1711f4505964a7-005d2f0ffa
x-varnish: 876628493 843349676

Request

:authority: upload.wikimedia.org
:method: GET
:path: /wikipedia/commons/thumb/f/fa/Flag_of_the_People%27s_Republic_of_China.svg/120px-Flag_of_the_People%27s_Republic_of_China.svg.png
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9,zh-TW;q=0.8,en;q=0.7,ja;q=0.6
cache-control: no-cache
cookie: WMF-Last-Access=25-Jul-2019; GeoIP=<Removed>; WMF-Last-Access-Global=14-Aug-2019
dnt: 1
pragma: no-cache
sec-fetch-mode: navigate
sec-fetch-site: none
upgrade-insecure-requests: 1
user-agent: <Removed>

@aaron @brion given nobody on our team has much expertise with the areas this bug may touch, I wondered if you might chime in with any thoughts?

Hmm, I notice this in SwiftFileBackend.php:

	/**
	 * Sanitize and filter the custom headers from a $params array.
	 * Only allows certain "standard" Content- and X-Content- headers.
	 *
	 * When POSTing data, libcurl adds Content-Type: application/x-www-form-urlencoded
	 * if Content-Type is not set, which overwrites the stored Content-Type header
	 * in Swift - therefore for POSTing data do not strip the Content-Type header (the
	 * previously-stored header that has been already read back from swift is sent)
	 *
	 * @param array $params
	 * @return array Sanitized value of 'headers' field in $params
	 */
	protected function sanitizeHdrs( array $params ) {
		return isset( $params['headers'] )
			? $this->getCustomHeaders( $params['headers'] )
			: [];
	}

Could the correct Content-Type be missing for some reason sometimes?

The only POSTs I see in there are in setContainerAccess, addMissingHashMetadata, and doDescribeInternal, which all update existing files. Could be there's a hole in the logic of one of them that's dropping Content-Type, or that some files were missing it to begin with but it was being filled in on high-level fetches, or something, so they worked until something updated the data?
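
The failure mode described in the docblock, and the hypothesis above, can be modelled without Swift or libcurl. The following is a toy model only: the function names are hypothetical, the "store" is a plain dict, and the overwrite behaviour is assumed from the docblock rather than verified against Swift.

```python
# Default that, per the docblock, libcurl adds to a POST with no Content-Type.
LIBCURL_POST_DEFAULT = "application/x-www-form-urlencoded"


def build_post_headers(stored, updates, keep_content_type=True):
    """Build headers for a metadata POST on an existing object.

    stored:  headers previously read back from the object
    updates: new metadata headers to set
    """
    headers = dict(updates)
    if keep_content_type and stored.get("Content-Type"):
        # Resend the previously stored type so it is not replaced by a default.
        headers.setdefault("Content-Type", stored["Content-Type"])
    return headers


def send_post(headers):
    """Model the HTTP client: fill in the default Content-Type if none is set."""
    sent = dict(headers)
    sent.setdefault("Content-Type", LIBCURL_POST_DEFAULT)
    return sent


def apply_post(stored, sent):
    """Model the object store: POSTed headers overwrite the stored ones (assumed)."""
    result = dict(stored)
    result.update(sent)
    return result


stored = {"Content-Type": "image/png"}
updates = {"X-Object-Meta-Sha1Base36": "example"}

# Content-Type resent: the stored type survives the update.
kept = apply_post(stored, send_post(build_post_headers(stored, updates)))

# Content-Type stripped before the POST: the default takes over.
stripped = apply_post(
    stored, send_post(build_post_headers(stored, updates, keep_content_type=False)))

# Content-Type missing to begin with: the default takes over even when "kept".
missing = apply_post({}, send_post(build_post_headers({}, updates)))
```

In this model both the "stripped" and the "missing to begin with" paths end up with application/x-www-form-urlencoded, which matches the two possibilities raised in the comment above.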

Change 530338 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] VCL: workaround for images delivered with CT:x-www-form-urlencoded

https://gerrit.wikimedia.org/r/530338

Change 530338 merged by Ema:
[operations/puppet@production] VCL: workaround for images delivered with CT:x-www-form-urlencoded

https://gerrit.wikimedia.org/r/530338

@Ciencia_Al_Poder, @Wang_Qiliang: I have added a workaround at the CDN level which replaces the wrong Content-Type based on file extension. Can you please check if the issue is still reproducible on your side?
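
For readers who don't want to open the Gerrit change, the idea of such a workaround can be sketched as follows. This is not the VCL that was merged: the function name and the extension table are assumptions for illustration, and the real rule set may cover different extensions.

```python
import posixpath

# The wrong type reported in this ticket.
BAD_TYPE = "application/x-www-form-urlencoded"

# Hypothetical extension table; the real workaround may differ.
EXT_TO_TYPE = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def fix_content_type(path, content_type):
    """Replace the known-bad Content-Type based on the file extension in the path.

    Any other Content-Type is passed through untouched, so responses that are
    legitimately served as a different type (e.g. image/webp) are not rewritten.
    """
    if content_type.split(";")[0].strip().lower() != BAD_TYPE:
        return content_type
    ext = posixpath.splitext(path)[1].lower()
    return EXT_TO_TYPE.get(ext, content_type)
```

Rewriting only when the served type is exactly the bad one keeps the workaround narrow; it papers over the symptom at the edge and leaves the metadata stored in Swift unchanged.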

Issue solved for the examples provided

@Wang_Qiliang I don't see application/x-www-form-urlencoded there.
The only noticeable thing is that .png is downloaded as image/webp.

It returns content-type: image/png here (using both curl and browser).

In T188831#5414614, @brion wrote:

The only POSTs I see in there are in setContainerAccess, addMissingHashMetadata, and doDescribeInternal, which all update existing files. Could be there's a hole in the logic of one of them that's dropping Content-Type, or that some files were missing it to begin with but it was being filled in on high-level fetches, or something, so they worked until something updated the data?

I think that's likely what's happening yes.

From swift's perspective here's the status as I understand it:

  • we're running with the default post_as_copy = true option (the default for 2.10, which we're running; changed to false in swift 2.13), thus POST does allow changing c-t, at the cost of incurring a full object copy. post_as_copy defaults to false in 2.13 because the previous buggy behavior has been fixed, so POSTing to change c-t on an existing object no longer incurs a COPY and does the right thing. See also https://specs.openstack.org/openstack/swift-specs/specs/in_progress/fastpostupdates.html and https://wiki.openstack.org/wiki/Swift/FastPost
  • third party folks might be running with post_as_copy = false and swift releases < 2.13, and we should be mindful of those and provide instructions/documentation as needed when changing swiftfilebackend, see also two comments from @aaron at https://phabricator.wikimedia.org/T178849#3768032
  • Thumbor also uploads to swift nowadays, although TTBOMK content-type isn't changed and gets copied from the original at thumbnail generation time
  • swift has custom middleware to e.g. send thumbnail requests to the inactive datacenter, though that shouldn't be involved in changing c-t

I won't have a lot of time to further dig into this but happy to help as I can and field questions!

BBlack subscribed.

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox . Thank you!