Page MenuHomePhabricator

Cannot download STL files due to network error
Closed, ResolvedPublic

Description

On Chrome 76 (current stable), connecting from Europe. This makes 3D files impossible to view in Media Viewer:

Capture d'écran 2019-08-28 11.21.20.png (80×3 px, 38 KB)

https://upload.wikimedia.org/wikipedia/commons/d/de/65-msr-sandaled-foot-5.stl
https://upload.wikimedia.org/wikipedia/commons/0/09/Scan_the_World_-_Venus_de_Milo.stl

Opening these links directly also causes Chrome to choke, stating "network error".

Using wget on the command line, the download stops on an error and resumes automatically:

~/Documents/Mediawiki/vagrant (master) $ wget https://upload.wikimedia.org/wikipedia/commons/d/de/65-msr-sandaled-foot-5.stl
--2019-08-28 11:19:14--  https://upload.wikimedia.org/wikipedia/commons/d/de/65-msr-sandaled-foot-5.stl
Résolution de upload.wikimedia.org (upload.wikimedia.org)… 2620:0:862:ed1a::2:b, 91.198.174.208
Connexion à upload.wikimedia.org (upload.wikimedia.org)|2620:0:862:ed1a::2:b|:443… connecté.
requête HTTP transmise, en attente de la réponse… 200 OK
Taille : non indiqué [application/sla]
Sauvegarde en : « 65-msr-sandaled-foot-5.stl »

65-msr-sandaled-foot-5.stl                  [     <=>                                                                        ]   1,25M  1,42MB/s    ds 0,9s    

2019-08-28 11:19:16 (1,42 MB/s) — Erreur de lecture à l’octet 1310720 (Success).Nouvel essai.

--2019-08-28 11:19:17--  (essai :  2)  https://upload.wikimedia.org/wikipedia/commons/d/de/65-msr-sandaled-foot-5.stl
Connexion à upload.wikimedia.org (upload.wikimedia.org)|2620:0:862:ed1a::2:b|:443… connecté.
requête HTTP transmise, en attente de la réponse… 206 Partial Content
Taille : 34999984 (33M), 33689264 (32M) restant [application/sla]
Sauvegarde en : « 65-msr-sandaled-foot-5.stl »

65-msr-sandaled-foot-5.stl              100%[++=============================================================================>]  33,38M  2,15MB/s    ds 16s     

2019-08-28 11:19:33 (2,07 MB/s) — « 65-msr-sandaled-foot-5.stl » sauvegardé [34999984/34999984]

Firefox 69 also chokes on that link, with an even less helpful error message and no entry in the developer tools:

Capture d'écran 2019-08-28 11.30.34.png (198×720 px, 129 KB)

Finally, Safari is actually the only browser of the 3 showing the request:

Capture d'écran 2019-08-28 11.36.07.png (1×2 px, 623 KB)

It's served by ATS/8.0.3, with x-cache header value: cp3034 miss, cp3047 miss

Whatever is happening seems to require a minimum amount of data to be transferred over before kicking in. And by reloading the page in Safari multiple times, it seems like the amount of data it's able to download before failing changes every time. I've seen it go as low as 400ish KB and as high as 1.8MB.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Gilles renamed this task from STL file downloads result in ERR_SPDY_PROTOCOL_ERROR to Cannot download STL files due to network error.Aug 28 2019, 9:40 AM
Gilles updated the task description. (Show Details)

I can reproduce the issue, using HTTP/2 and HTTP/1.1 on eqsin:

willikins:~ vgutierrez$ curl https://upload.wikimedia.org/wikipedia/commons/d/de/65-msr-sandaled-foot-5.stl -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1568k    0 1568k    0     0   867k      0 --:--:--  0:00:01 --:--:--  867k
curl: (92) HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)
willikins:~ vgutierrez$ curl --http1.1 https://upload.wikimedia.org/wikipedia/commons/d/de/65-msr-sandaled-foot-5.stl -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2656k    0 2656k    0     0  1072k      0 --:--:--  0:00:02 --:--:-- 1072k
curl: (18) transfer closed with outstanding read data remaining

I cannot reproduce against ATS backend instance:

vgutierrez@cp5004:~$ curl -H 'Host: upload.wikimedia.org'  http://127.0.0.1:3128/wikipedia/commons/d/de/65-msr-sandaled-foot-5.stl -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 33.3M  100 33.3M    0     0   360M      0 --:--:-- --:--:-- --:--:--  362M

But it's reproducible against the varnish-fe instance:

vgutierrez@cp5002:~$ curl -H 'Host: upload.wikimedia.org' -H 'X-Forwarded-Proto: https' http://127.0.0.1:3120/wikipedia/commons/d/de/65-msr-sandaled-foot-5.stl -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1472k    0 1472k    0     0   539k      0 --:--:--  0:00:02 --:--:--  539k
curl: (18) transfer closed with outstanding read data remaining
ema triaged this task as Medium priority.
ema moved this task from Backlog to Caching on the Traffic board.

The issue happens due to varnish-frontend giving up the fetch from ATS because of lack of free space:

--  ObjStatus      200
--  ObjReason      OK
--  ObjHeader      Date: Tue, 03 Sep 2019 10:23:22 GMT
--  ObjHeader      Content-Type: application/sla
--  ObjHeader      X-Object-Meta-Sha1Base36: sbj4k8lyhuctnej0eohl2abqxflwsdq
--  ObjHeader      Last-Modified: Thu, 01 Nov 2018 20:53:09 GMT
--  ObjHeader      Etag: 5f4d2b97f51853aff124ac6466efee32-df
--  ObjHeader      X-Timestamp: 1541105588.13371
--  ObjHeader      Access-Control-Allow-Origin: *
--  ObjHeader      Cache-Control: public, max-age=86400
--  ObjHeader      X-Trans-Id: tx3d1469b030ce4ac98f513-005d6e3f1a
--  ObjHeader      Content-Encoding: gzip
--  ObjHeader      Vary: Accept-Encoding
--  ObjHeader      Server: ATS/8.0.3
--  ObjHeader      X-Cache-Int: cp3034 hit
--  Fetch_Body     2 chunked stream
--  ExpKill        LRU_Cand p=0x7fa55b765520 f=0x0 r=1
--  ExpKill        LRU x=715662765
[...]
--  ExpKill        LRU_Exhausted
--  FetchError     Could not get storage
--  BackendClose   445 vcl-c4cbdf67-6123-45dd-b85d-29e508706c40.be_cp3034_esams_wmnet
--  BereqAcct      421 0 421 559 1458176 1458735
--  End

ATS should set Content-Length, and it doesn't. Because of that, our frontend VCL cannot make the informed decision of passing, as it should given that the object in question is 34M.

Change 534156 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] VCL: only cache responses with explicit CL on upload

https://gerrit.wikimedia.org/r/534156

Change 534156 merged by Ema:
[operations/puppet@production] VCL: only cache responses with explicit CL on upload

https://gerrit.wikimedia.org/r/534156

@Gilles: the issue should now be fixed, can you confirm?