Page MenuHomePhabricator

Download cut off (envoy response timeout at 65s) for Commons file over CDN size threshold (1GB)
Closed, ResolvedPublicBUG REPORT

Description

Steps to replicate the issue (include links if applicable):

$ wget "https://upload.wikimedia.org/wikipedia/commons/3/3d/How_to_de-package_and_expose_a_GPU_flip_chip_die.webm"

What happens?:

--2023-11-23 11:28:21--  https://upload.wikimedia.org/wikipedia/commons/3/3d/How_to_de-package_and_expose_a_GPU_flip_chip_die.webm
Resolving upload.wikimedia.org (upload.wikimedia.org)... 185.15.59.240, 2a02:ec80:300:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|185.15.59.240|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1144169738 (1.1G) [video/webm]
Saving to: ‘How_to_de-package_and_expose_a_GPU_flip_chip_die.webm’
How_to_de-package_and_expose_a  68%[===================================>                 ] 750.00M  11.2MB/s    in 68s     

2023-11-23 11:29:30 (11.0 MB/s) - Connection closed at byte 786432000. Retrying.

--2023-11-23 11:29:31--  (try: 2)  https://upload.wikimedia.org/wikipedia/commons/3/3d/How_to_de-package_and_expose_a_GPU_flip_chip_die.webm
Connecting to upload.wikimedia.org (upload.wikimedia.org)|185.15.59.240|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 1144169738 (1.1G), 357737738 (341M) remaining [video/webm]
Saving to: ‘How_to_de-package_and_expose_a_GPU_flip_chip_die.webm’

How_to_de-package_and_expose_a 100%[++++++++++++++++++++++++++++++++++++================>]   1.07G  11.2MB/s    in 31s     

2023-11-23 11:30:02 (10.9 MB/s) - ‘How_to_de-package_and_expose_a_GPU_flip_chip_die.webm’ saved [1144169738/1144169738]

What should have happened instead?:
Don't close the connection.

Other information (browser name/version, screenshots, etc.):
With wget at least I got the file. Downloading with Firefox I'm just left with an incomplete file. One is 785383424 bytes (exactly 749MiB), another is 774897664 bytes (exactly 739MiB). Wget was cut off at 786432000, exactly 750MiB.

Event Timeline

Please add relevant project tags - I'm afraid that it will be hard for the Commons community to fix this entirely themselves :)

T210890 comes to my mind.

Confirming from Central Europe:

[acko@fedora ~]$ wget "https://upload.wikimedia.org/wikipedia/commons/3/3d/How_to_de-package_and_expose_a_GPU_flip_chip_die.webm"
--2023-11-23 12:31:44--  https://upload.wikimedia.org/wikipedia/commons/3/3d/How_to_de-package_and_expose_a_GPU_flip_chip_die.webm
Resolving upload.wikimedia.org (upload.wikimedia.org)... 185.15.58.240, 2a02:ec80:600:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|185.15.58.240|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1144169738 (1.1G) [video/webm]
Saving to: ‘How_to_de-package_and_expose_a_GPU_flip_chip_die.webm’

How_to_de-package_and_expose_a_GPU_flip_chip_die.web  37%[===========================================>                                                                          ] 411.00M  5.95MB/s    in 72s     

2023-11-23 12:32:56 (5.73 MB/s) - Connection closed at byte 430964736. Retrying.

--2023-11-23 12:32:57--  (try: 2)  https://upload.wikimedia.org/wikipedia/commons/3/3d/How_to_de-package_and_expose_a_GPU_flip_chip_die.webm
Connecting to upload.wikimedia.org (upload.wikimedia.org)|185.15.58.240|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 1144169738 (1.1G), 713205002 (680M) remaining [video/webm]
Saving to: ‘How_to_de-package_and_expose_a_GPU_flip_chip_die.webm’

How_to_de-package_and_expose_a_GPU_flip_chip_die.web  75%[++++++++++++++++++++++++++++++++++++++++++++============================================>                             ] 828.00M  5.15MB/s    in 73s     

2023-11-23 12:34:10 (5.72 MB/s) - Connection closed at byte 868220928. Retrying.

--2023-11-23 12:34:12--  (try: 3)  https://upload.wikimedia.org/wikipedia/commons/3/3d/How_to_de-package_and_expose_a_GPU_flip_chip_die.webm
Connecting to upload.wikimedia.org (upload.wikimedia.org)|185.15.58.240|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 1144169738 (1.1G), 275948810 (263M) remaining [video/webm]
Saving to: ‘How_to_de-package_and_expose_a_GPU_flip_chip_die.webm’

How_to_de-package_and_expose_a_GPU_flip_chip_die.web 100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++============================>]   1.07G  5.96MB/s    in 46s     

2023-11-23 12:34:58 (5.75 MB/s) - ‘How_to_de-package_and_expose_a_GPU_flip_chip_die.webm’ saved [1144169738/1144169738]

Please add relevant project tags - I'm afraid that it will be hard for the Commons community to fix this entirely themselves :)

I wasn't sure what was causing this, in such cases I rely on others (like you!) to fix the tags. Even if I had some idea of what caused it, I wouldn't have been sure that Traffic would be the right tag for this.

T210890 comes to my mind.

Confirming from Central Europe:

Right, I should have mentioned I connect to esams. (though you could tell from the IP I connected to)

Interesting: it seems to be time based. My internet is faster and I get ~750MB before it cuts me off. You get cut off after 411MB. In both cases the retry happens ~70s after the download started.

Tried another file, same result:

$ wget "https://upload.wikimedia.org/wikipedia/commons/6/64/Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm"
--2023-11-23 12:42:01--  https://upload.wikimedia.org/wikipedia/commons/6/64/Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm
Resolving upload.wikimedia.org (upload.wikimedia.org)... 185.15.59.240, 2a02:ec80:300:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|185.15.59.240|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4213093616 (3.9G) [video/webm]
Saving to: ‘Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm’

Gameplay_0_A.D._Alpha_26_Gefec  18%[========>                                            ] 755.00M  11.3MB/s    in 69s     

2023-11-23 12:43:10 (11.0 MB/s) - Connection closed at byte 791674880. Retrying.

--2023-11-23 12:43:11--  (try: 2)  https://upload.wikimedia.org/wikipedia/commons/6/64/Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm
Connecting to upload.wikimedia.org (upload.wikimedia.org)|185.15.59.240|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4213093616 (3.9G), 3421418736 (3.2G) remaining [video/webm]
Saving to: ‘Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm’

    Gameplay_0_A.D._Alpha_26_G  19%[+++++++++>                                           ] 776.19M  9.40MB/s               ^C
AlexisJazz renamed this task from After ~750MB file download is cut off to After ~1 minute file download is cut off.Nov 23 2023, 12:00 PM

I can add another datapoint to it being time-based, on my connection the download finished within less than a minute without any interruption.

$ time wget "https://upload.wikimedia.org/wikipedia/commons/3/3d/How_to_de-package_and_expose_a_GPU_flip_chip_die.webm"
--2023-11-23 13:17:20--  https://upload.wikimedia.org/wikipedia/commons/3/3d/How_to_de-package_and_expose_a_GPU_flip_chip_die.webm
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving upload.wikimedia.org (upload.wikimedia.org)... 2a02:ec80:300:ed1a::2:b, 185.15.59.240
Connecting to upload.wikimedia.org (upload.wikimedia.org)|2a02:ec80:300:ed1a::2:b|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1144169738 (1,1G) [video/webm]
Saving to: ‘How_to_de-package_and_expose_a_GPU_flip_chip_die.webm’

How_to_de-package_and_expose_a_GPU_flip_chip_die.webm                                           100%[=====================================================================================================================================================================================================================================================>]   1,07G  28,5MB/s    in 39s     

2023-11-23 13:18:00 (27,7 MB/s) - ‘How_to_de-package_and_expose_a_GPU_flip_chip_die.webm’ saved [1144169738/1144169738]


real    0m39,767s
user    0m2,074s
sys     0m6,073s

(Also esams. Same result if I use wget -4, i.e. IPv4.)

I can add another datapoint to it being time-based, on my connection the download finished within less than a minute without any interruption.

But you probably still couldn't download https://upload.wikimedia.org/wikipedia/commons/6/64/Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm without interruption. In theory that should get interrupted after ~1900MB for you.

Indeed. Lower bound for the cutoff time is 66 s:

$ time wget https://upload.wikimedia.org/wikipedia/commons/6/64/Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm
--2023-11-23 14:55:31--  https://upload.wikimedia.org/wikipedia/commons/6/64/Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving upload.wikimedia.org (upload.wikimedia.org)... 2a02:ec80:300:ed1a::2:b, 185.15.59.240
Connecting to upload.wikimedia.org (upload.wikimedia.org)|2a02:ec80:300:ed1a::2:b|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4213093616 (3,9G) [video/webm]
Saving to: ‘Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm’

Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm                           45%[===============================================================================================================>                                                                                                                                      ]   1,80G  28,4MB/s    in 66s     

2023-11-23 14:56:38 (27,8 MB/s) - Connection closed at byte 1929379840. Retrying.

--2023-11-23 14:56:39--  (try: 2)  https://upload.wikimedia.org/wikipedia/commons/6/64/Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm
Connecting to upload.wikimedia.org (upload.wikimedia.org)|2a02:ec80:300:ed1a::2:b|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4213093616 (3,9G), 2283713776 (2,1G) remaining [video/webm]
Saving to: ‘Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm’

Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm                           91%[++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++================================================================================================================>                     ]   3,60G  28,3MB/s    in 66s     

2023-11-23 14:57:45 (27,9 MB/s) - Connection closed at byte 3860856832. Retrying.

--2023-11-23 14:57:47--  (try: 3)  https://upload.wikimedia.org/wikipedia/commons/6/64/Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm
Connecting to upload.wikimedia.org (upload.wikimedia.org)|2a02:ec80:300:ed1a::2:b|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4213093616 (3,9G), 352236784 (336M) remaining [video/webm]
Saving to: ‘Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm’

Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm                          100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++====================>]   3,92G  28,4MB/s    in 12s     

2023-11-23 14:58:00 (27,4 MB/s) - ‘Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm’ saved [4213093616/4213093616]


real    2m28,595s
user    0m6,865s
sys     0m21,759s

Same result from cURL btw:

$ time curl https://upload.wikimedia.org/wikipedia/commons/6/64/Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 46 4017M   46 1860M    0     0  27.9M      0  0:02:23  0:01:06  0:01:17 28.4M
curl: (92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)

real    1m6,523s
user    0m4,566s
sys     0m6,106s

There are a handful of 65 s timeouts in Puppet, though I have no idea which (if any) of them are relevant.

dumps.wikimedia.org is not affected btw:

$ timeout -v 5m curl https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.gz > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  1  120G    1 1271M    0     0  4347k      0  8:05:09  0:04:59  8:00:10 4402k
timeout: sending signal TERM to command ‘curl’

With limit-rate it doesn't interrupt:

$ time wget --limit-rate 10K "https://upload.wikimedia.org/wikipedia/commons/6/64/Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm"
--2023-11-23 16:47:09--  https://upload.wikimedia.org/wikipedia/commons/6/64/Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm
Resolving upload.wikimedia.org (upload.wikimedia.org)... 185.15.59.240, 2a02:ec80:300:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|185.15.59.240|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4213093616 (3.9G) [video/webm]
Saving to: ‘Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01_von_10.webm.4’

bm.4                             0%[                                                     ]   1.44M  10.0KB/s    eta 4d 18h ^C

real	2m28.465s
user	0m0.029s
sys	0m0.028s

--no-http-keep-alive makes no difference though.

The file is over the CDN size threshold (1G) so it will hit swift every time that it needs to be fetched. Could it be related by the work done by @MatthewVernon on T317616

envoy sets an upstream response timeout by default at 65s (https://github.com/wikimedia/operations-puppet/blob/397c454bbad404c9667c6f63f86e993b1970af8a/modules/envoyproxy/manifests/tls_terminator.pp#L147) it needs to be adjusted properly to allow the transfer of big files

I'm trying to see what the previous nginx-based timeout was, but it's code I'm unfamiliar with

I think, per modules/tlsproxy/manifests/localssl.pp it was 180s when we were using nginx, so I'll adjust it accordingly.

nginx didn't enforce a timeout for the whole request but just a timeout (180s) between reads from the server so that won't be enough. To mimick the behavior you need to set the response timeout to 0 and stream_idle_timeout to 180s (dunno if the latter is supported by our puppetization)

Aklapper renamed this task from After ~1 minute file download is cut off to Download cut off (envoy response timeout at 65s) for Commons file over CDN size threshold (1GB).Nov 24 2023, 11:35 AM

nginx didn't enforce a timeout for the whole request but just a timeout (180s) between reads from the server so that won't be enough. To mimick the behavior you need to set the response timeout to 0 and stream_idle_timeout to 180s (dunno if the latter is supported by our puppetization)

Thanks for the pointer! Yes, it looks like envoyproxy::tls_terminator has a $stream_idle_timeout

...but profile::tlsproxy::envoy doesn't have that configuation available as far as I can see...

Let's add it as an optional parameter, and try and pass it through.

Change 977178 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] tlsproxy::envoy allow setting stream_idle_timeout, 180s for swift

https://gerrit.wikimedia.org/r/977178

Change 977178 merged by MVernon:

[operations/puppet@production] tlsproxy::envoy allow setting stream_idle_timeout, 180s for swift

https://gerrit.wikimedia.org/r/977178

I think, per modules/tlsproxy/manifests/localssl.pp it was 180s when we were using nginx, so I'll adjust it accordingly.

180s wouldn't be enough, that would result in my downloads getting cut off after ~2GB. And that's with relatively fast internet, someone with 4Mbps ADSL or crap wifi would see their download getting cut off after ~90MB.

@AlexisJazz it's a time for how long the connect has no data going over it "Each time an encode/decode event for headers or data is processed for the stream, the timer will be reset."

MatthewVernon claimed this task.

I can confirm that I can now download this file, even though it takes nearly 9 minutes on my rubbish internet:

matthew@aragorn:/tmp$ wget "https://upload.wikimedia.org/wikipedia/commons/3/3d/How_to_de-package_and_expose_a_GPU_flip_chip_die.webm"
--2023-11-24 13:16:32--  https://upload.wikimedia.org/wikipedia/commons/3/3d/How_to_de-package_and_expose_a_GPU_flip_chip_die.webm
Resolving upload.wikimedia.org (upload.wikimedia.org)... 2a02:ec80:300:ed1a::2:b, 185.15.59.240
Connecting to upload.wikimedia.org (upload.wikimedia.org)|2a02:ec80:300:ed1a::2:b|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1144169738 (1.1G) [video/webm]
Saving to: ‘How_to_de-package_and_expose_a_GPU_flip_chip_die.webm’

How_to_de-package_a 100%[===================>]   1.07G  1.94MB/s    in 8m 44s  

2023-11-24 13:25:17 (2.08 MB/s) - ‘How_to_de-package_and_expose_a_GPU_flip_chip_die.webm’ saved [1144169738/1144169738]

...which confirms the fix is sound (and, e.g. demonstrates that you don't get cut off after 180s).