
Problem loading thumbnail images due to Envoy (HTTP/1.0 clients getting '426 Upgrade Required')
Closed, ResolvedPublicBUG REPORT

Description

My MediaWiki 1.31.1 installation has a problem loading thumbnail images, which started today.

What happens?:

Instead of thumbnails, it shows a box with " Error creating thumbnail: "
In mwdebug.txt I see messages like these:
ForeignAPIRepo: HTTP GET: https://upload.wikimedia.org/wikipedia/commons/thumb/c/cc/Timoth%C3%A9e_Chalamet_in_2018_%28cropped%29.jpg/180px-Timoth%C3%A9e_Chalamet_in_2018_%28cropped%29.jpg
[http] There was a problem during the HTTP request: 426 Upgrade Required
ForeignAPIRepo::getThumbUrlFromCache Could not download thumb

I am aware that 1.31.1 is an LTS version for which support has expired.
But I cannot upgrade so quickly.
Is there a patch I could apply so that thumbnail requests for images work again?

What should have happened instead?:

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:

Event Timeline

Don't let the text mislead you: 426 Upgrade Required refers to HTTPS, not to the version of MediaWiki (regardless, updating is a good idea, but I doubt it will fix this issue). Looking at https://wikitech.wikimedia.org/wiki/HTTPS/Browser_Recommendations, TLS version 1.2 or higher should be used when accessing Wikimedia websites. That suggests that the server you're running MediaWiki on doesn't use TLS version 1.2 or higher. The page suggests that this is an operating system issue. What operating system (and web server) are you using?
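
As a quick check of whether the PHP build on the server can even speak TLS 1.2, one option (purely a diagnostic sketch, assuming the openssl extension is loaded) is to print the OpenSSL version PHP is linked against; TLS 1.2 needs OpenSSL 1.0.1 or newer:

<?php
// Sketch: print the OpenSSL build PHP links against (TLS 1.2 needs >= 1.0.1).
echo OPENSSL_VERSION_TEXT, "\n"; // e.g. "OpenSSL 1.1.1k  25 Mar 2021"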

Hi @Aloist, thanks for taking the time to report this! As you wrote, you are using a version which is too old and not supported anymore. Please bring this up on https://www.mediawiki.org/wiki/Project:Support_desk instead. Thanks.

It is possible the problem is related to the PHP function file_get_contents(), which defaults to protocol_version 1.0 (HTTP/1.0).

I have this in the file mwdebug.txt:
ForeignAPIRepo: HTTP GET: https://commons.wikimedia.org/w/api.php?titles=File%3ABundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Csha1%7Cmetadata%7Cmime%7Cmediatype%7Cextmetadata&prop=imageinfo&iimetadataversion=2&iiextmetadatamultilang=1&format=json&action=query&redirects=true&uselang=en
ForeignAPIRepo: HTTP GET: https://commons.wikimedia.org/w/api.php?titles=File%3ABundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg&iiprop=url%7Ctimestamp&iiurlwidth=180&iiurlheight=270&iiurlparam=180px&prop=imageinfo&format=json&action=query&redirects=true&uselang=en
ForeignAPIRepo::getThumbUrl got remote thumb https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg/180px-Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg
ForeignAPIRepo::getThumbUrlFromCache Thumbnail was already downloaded before
ForeignAPIRepo: HTTP GET: https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg/180px-Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg

[http] There was a problem during the HTTP request: 426 Upgrade Required

ForeignAPIRepo::getThumbUrlFromCache Could not download thumb
ForeignAPIRepo: HTTP GET: https://commons.wikimedia.org/w/api.php?titles=File%3ABundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg&iiprop=url%7Ctimestamp&iiurlwidth=180&iiurlheight=270&iiurlparam=180px&prop=imageinfo&uselang=en&format=json&action=query&redirects=true

Is there a way to debug deeper, so that I can log exactly which PHP functions are called?

It seems that it is possible to add a context to the call of file_get_contents, so that it overrides the default protocol version.
But I would need to know where exactly in the MediaWiki code I need to make this change.
The PHP version is 7.2.34. An upgrade to 7.3.33 did not solve the issue so far.
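
A minimal sketch of such a stream context, assuming the goal is just to force HTTP/1.1 for a single file_get_contents() call (where exactly to hook this into the MediaWiki code is not shown here, and the test URL is only a stand-in):

<?php
// Sketch: pass a stream context so the 'http' wrapper sends HTTP/1.1 instead
// of its PHP 7.x default of HTTP/1.0.
$context = stream_context_create( array(
	'http' => array(
		'method'           => 'GET',
		'protocol_version' => '1.1',
	),
) );
$data = file_get_contents( 'https://upload.wikimedia.org/favicon.ico', false, $context );
var_dump( $data !== false );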

Using curl, I had no problem requesting those files or making the API call with the HTTP/1.0 protocol:

user@host:~> curl -s -D - -o /dev/null --http1.0 'https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg/180px-Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg'
HTTP/1.0 200 OK
Date: Sat, 29 Jan 2022 10:36:14 GMT
Content-Type: image/jpeg
Content-Length: 12203
Last-Modified: Sat, 12 Aug 2017 06:05:00 GMT
Etag: 2ca16108f326f7916e09bb928e5096ce
X-Timestamp: 1502517899.13806
Server: ATS/8.0.8
Age: 53
X-Cache: cp3063 miss, cp3059 hit/2
X-Cache-Status: hit-front
Server-Timing: cache;desc="hit-front", host;desc="cp3059"
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
Report-To: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
NEL: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
Permissions-Policy: interest-cohort=()
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache
Timing-Allow-Origin: *
Accept-Ranges: bytes

user@host:~> curl -s -D - -o /dev/null --http1.0 'https://commons.wikimedia.org/w/api.php?titles=File%3ABundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg&iiprop=url%7Ctimestamp&iiurlwidth=180&iiurlheight=270&iiurlparam=180px&prop=imageinfo&uselang=en&format=json&action=query&redirects=true'
HTTP/1.0 200 OK
Date: Sat, 29 Jan 2022 10:37:53 GMT
Server: mw1396.eqiad.wmnet
X-Content-Type-Options: nosniff
P3p: CP="See https://commons.wikimedia.org/wiki/Special:CentralAutoLogin/P3P for more info."
X-Frame-Options: DENY
Content-Disposition: inline; filename=api-result.json
Vary: Accept-Encoding,Treat-as-Untrusted,X-Forwarded-Proto,Cookie,Authorization
Cache-Control: private, must-revalidate, max-age=0
Content-Type: application/json; charset=utf-8
Age: 0
X-Cache: cp3054 miss, cp3050 pass
X-Cache-Status: pass
Server-Timing: cache;desc="pass", host;desc="cp3050"
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
Report-To: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
NEL: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
Permissions-Policy: interest-cohort=()
Set-Cookie: WMF-Last-Access=29-Jan-2022;Path=/;HttpOnly;secure;Expires=Wed, 02 Mar 2022 00:00:00 GMT
Accept-Ranges: bytes

I have Mediawiki installations on two servers, the test server and the production server.

On the test server, which makes very few requests, the requests work.

On the production server (IP xx.242 and xx.241) they do not work.

Is it possible that it is blacklisted?

On my test server, I get:

curl -s -D - -o /dev/null --http1.0 'https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg/180px-Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg'
HTTP/1.0 200 OK
Date: Sat, 29 Jan 2022 10:36:14 GMT
Content-Type: image/jpeg
Content-Length: 12203
Last-Modified: Sat, 12 Aug 2017 06:05:00 GMT
Etag: 2ca16108f326f7916e09bb928e5096ce
X-Timestamp: 1502517899.13806
Server: ATS/8.0.8
Age: 1689
X-Cache: cp3063 hit, cp3061 hit/1
X-Cache-Status: hit-front
Server-Timing: cache;desc="hit-front", host;desc="cp3061"
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
Report-To: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
NEL: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
Permissions-Policy: interest-cohort=()
X-Client-IP: xxc
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache
Timing-Allow-Origin: *
Accept-Ranges: bytes

On the production server, I get:
curl -s -D - -o /dev/null --http1.0 'https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg/180px-Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg'
HTTP/1.1 426 Upgrade Required
date: Sat, 29 Jan 2022 11:04:33 GMT
server: envoy
connection: close
content-length: 0

On my Mac at home, where I am now, the request also works.

When I drop the --http1.0 option from the request, it also works from my production server:

curl -s -D - -o /dev/null 'https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg/180px-Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg'
HTTP/1.1 200 OK
date: Sat, 29 Jan 2022 10:36:14 GMT
content-type: image/jpeg
content-length: 12203
last-modified: Sat, 12 Aug 2017 06:05:00 GMT
etag: 2ca16108f326f7916e09bb928e5096ce
server: ATS/8.0.8
age: 2415
x-cache: cp3063 hit, cp3063 hit/1
x-cache-status: hit-front
server-timing: cache;desc="hit-front", host;desc="cp3063"
strict-transport-security: max-age=106384710; includeSubDomains; preload
report-to: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
nel: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
permissions-policy: interest-cohort=()
x-client-ip: xx.241
access-control-allow-origin: *
access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache
timing-allow-origin: *
accept-ranges: bytes

So I come back to the question: how can I change the protocol in the MediaWiki software so that it no longer uses HTTP/1.0?

That's a problem with your proxy software, which forces you to upgrade to HTTP/1.1.

Notice the "server: envoy" line in the response headers when using HTTP/1.0.

Aloist renamed this task from /upload.wikimedia.org replies There was a problem during the HTTP request: 426 Upgrade Required to Probem with WMF Envoy proxy. was: /upload.wikimedia.org replies There was a problem during the HTTP request: 426 Upgrade Required.Jan 29 2022, 12:07 PM

I think I solved it.

I made these changes to the MediaWiki 1.31.1 software:

file includes/http/Http.php:

/* static public $httpEngine = false; */
static public $httpEngine = 'curl';

file includes/http/CurlHttpRequest.php:

/* $this->curlOptions[CURLOPT_HTTP_VERSION] = CURL_HTTP_VERSION_1_0; */
$this->curlOptions[CURLOPT_HTTP_VERSION] = CURL_HTTP_VERSION_1_1;
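
For reference, the effect of that second change can be checked outside MediaWiki with a few lines of standalone PHP/cURL; this is only a sketch (the favicon URL is just a convenient test target), not MediaWiki code:

<?php
// Sketch: the same explicit CURLOPT_HTTP_VERSION override, outside MediaWiki.
$ch = curl_init( 'https://upload.wikimedia.org/favicon.ico' );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1 );
$body = curl_exec( $ch );
echo curl_getinfo( $ch, CURLINFO_HTTP_CODE ), "\n"; // expect 200
curl_close( $ch );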

I checked the code in MediaWiki 1.35.5, the current LTS version.
If one enables the curl httpEngine there, one will likely run into the same problem:
it still has $this->curlOptions[CURLOPT_HTTP_VERSION] = CURL_HTTP_VERSION_1_0;

If one uses the default httpEngine with PHP 7.3, the problem should also occur, because PHP 7.3 defaults to HTTP/1.0 for file_get_contents().
Only PHP 8 switches to HTTP/1.1.

I think the issue must really be resolved in the WMF envoy proxy. It should not give 426 replies.

See also https://github.com/envoyproxy/envoy/issues/5038

It looks like HTTP/1.0 is supported, but it needs explicit configuration; see envoy/source/common/http/conn_manager_impl.cc, line 568 at commit 02b2e79:

if (!connection_manager_.config_.http1Settings().accept_http_10_) { ….

Aklapper renamed this task from Probem with WMF Envoy proxy. was: /upload.wikimedia.org replies There was a problem during the HTTP request: 426 Upgrade Required to Problem loading thumbnail images due to Envoy (426 Upgrade Required).Jan 29 2022, 7:48 PM
Aklapper changed the task status from In Progress to Open.

This appears to be affecting Patch demo instances too: https://github.com/MatmaRex/patchdemo/issues/422

I am also seeing the issue locally running the master branch.

Here is another related Envoy ticket about 1.0 support that might be useful: https://github.com/envoyproxy/envoy/issues/170

19:07 < taavi> fallout from their TLS termination experiments, envoy does not support http 1.0

"TLS termination experiments" referring to T271421: Test envoyproxy as a WMF's CDN TLS terminator with real traffic

XTools was affected by this, since it uses file_get_contents to fetch an on-wiki config. In case it helps others, I managed to get around it by changing the code to use cURL:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://example.org');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

My RandomInCategory tool on Toolforge was affected by this as well, since it uses the standard PHP 7.3 installation on Toolforge. I fixed it by changing $jsonFile = file_get_contents($queryURL) to:

	$opts = array(
		'http'=>array(
			'method'=>"GET",
			'protocol_version'=>'1.1'
		)
	);

	$context = stream_context_create($opts);

	$jsonFile = file_get_contents( $queryURL, false, $context );

I imagine lots of other tools running on Toolforge would be similarly affected.

If other users find their way here: this affects you if you are an HTTP/1.0 client for one reason or another. If you can find a way to use HTTP/1.1 instead, you should be fine. See the summary at T271421#7672538.

Dzahn renamed this task from Problem loading thumbnail images due to Envoy (426 Upgrade Required) to Problem loading thumbnail images due to Envoy (HTTP/1.0 clients getting '426 Upgrade Required').Feb 2 2022, 11:11 PM
Vgutierrez changed the task status from Open to In Progress.Feb 3 2022, 9:16 AM
Vgutierrez triaged this task as Medium priority.
Vgutierrez moved this task from Backlog to Traffic team actively servicing on the Traffic board.

Change 759448 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] envoyproxy:tls_terminator: Accept HTTP/1.0 for SNI traffic

https://gerrit.wikimedia.org/r/759448

Change 759448 merged by Vgutierrez:

[operations/puppet@production] envoyproxy:tls_terminator: Accept HTTP/1.0 for SNI traffic

https://gerrit.wikimedia.org/r/759448

Vgutierrez claimed this task.
Vgutierrez subscribed.

Thanks for reporting this issue; it unveiled a bug in our envoyproxy configuration where HTTP/1.0 traffic was only being accepted on non-SNI connections. This has been fixed by https://gerrit.wikimedia.org/r/c/operations/puppet/+/759448/

vgutierrez@cp3063:~$ curl -s -D - -o /dev/null --http1.0 'https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg/180px-Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg'
HTTP/1.1 426 Upgrade Required
date: Thu, 03 Feb 2022 09:44:38 GMT
server: envoy
connection: close
content-length: 0

vgutierrez@cp3063:~$ sudo -i run-puppet-agent -q
vgutierrez@cp3063:~$ curl -s -D - -o /dev/null --http1.0 'https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg/180px-Bundesarchiv_B_145_Bild-F078072-0004%2C_Konrad_Adenauer.jpg'
HTTP/1.0 200 OK
date: Thu, 03 Feb 2022 06:23:44 GMT
content-type: image/jpeg
content-length: 12203
last-modified: Sat, 12 Aug 2017 06:05:00 GMT
etag: 2ca16108f326f7916e09bb928e5096ce
server: ATS/8.0.8
age: 12144
x-cache: cp3063 hit, cp3063 miss
x-cache-status: hit-local
server-timing: cache;desc="hit-local", host;desc="cp3063"
strict-transport-security: max-age=106384710; includeSubDomains; preload
report-to: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
nel: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
permissions-policy: interest-cohort=()
x-client-ip: 2620:0:862:ed1a::2:b
access-control-allow-origin: *
access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache
timing-allow-origin: *
accept-ranges: bytes
connection: close

I still can't see thumbnails locally or on Patch demo.

So... right now HTTP/1.0 requests from PHP 7.3 are technically working, but there is an obvious issue: those requests are really slow. This is the code I'm using to test it from a Docker container based on the php:7.3-cli image:

<?php

$favicon_md5 = '0860c1e63e95143ca82e59a1cf0cd23b';
$favicon = file_get_contents('https://upload.wikimedia.org/favicon.ico');
if (md5($favicon) === $favicon_md5) {
    echo "favicon fetched successfully\n";
}

I'm using an SSH tunnel (ssh -L 443:localhost:443 cp3063.esams.wmnet) to target cp3063 (upload_envoy node) or cp3065 (upload_haproxy node). Here are the results:

cp3065:
vgutierrez@carrot:$  time docker run --network host --add-host upload.wikimedia.org:127.0.0.1 -ti php-test
favicon fetched successfully

real    0m0,433s
user    0m0,022s
sys     0m0,011s

cp3063:
vgutierrez@carrot:~/Dockerfiles/php$ time docker run --network host --add-host upload.wikimedia.org:127.0.0.1 -ti php-test
favicon fetched successfully

real    0m20,457s
user    0m0,024s
sys     0m0,024s

So there is an obvious problem when using Envoy with HTTP/1.0 requests. This has also been reported by @Legoktm on #wikimedia-traffic:

<legoktm> A friend told me that today making requests using Python 2's `urllib` to the MW API became very slow, but using curl or urllib2 are fine
<legoktm> Not sure if it's related to some of the frontent cache changes or not
<legoktm>  But just wanted to note that since it was weird/hard to figure out
<legoktm> (by very slow I mean 20s for a request that responds near instantly otherwise)

This no longer seems to be related to HTTP/1.0, as the following code also triggers the issue:

<?php
$favicon_md5 = '0860c1e63e95143ca82e59a1cf0cd23b';
$opts = array('http' =>
    array(
        'protocol_version' => 1.1,
    )
);
$context = stream_context_create($opts);
$favicon = file_get_contents('https://upload.wikimedia.org/favicon.ico', false, $context);
if (md5($favicon) === $favicon_md5) {
    echo "favicon fetched successfully\n";
}

It's returning after 20 seconds because that happens to be the value of delayed_close_timeout on Envoy's side.
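
To see that stall without the Docker setup, a minimal timing sketch is enough (same favicon URL as above; this assumes upload.wikimedia.org resolves to an affected cache host, e.g. via the SSH tunnel and a hosts entry as described earlier):

<?php
// Sketch: time a plain file_get_contents() fetch; on an affected host the
// call only returns once envoy's delayed_close_timeout (~20 s) has expired.
$start = microtime( true );
$favicon = file_get_contents( 'https://upload.wikimedia.org/favicon.ico' );
printf( "%d bytes in %.1f s\n", strlen( $favicon ), microtime( true ) - $start );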

Upstream made some changes, but it seems there are some post-merge concerns that came up in https://github.com/envoyproxy/envoy/pull/19863

We've migrated the cp servers using Envoy to HAProxy, so this shouldn't be an issue anymore.