Page MenuHomePhabricator

multiple files do not upload (chunk upload) for a year
Open, Needs TriagePublicBUG REPORT

Description

two of 47 files that do not upload:

response codes 500, 503 and 504 followed by file-not-found in stash or stasherror of stathsfail in upload stage

01,13:36:02fyi: Uploading chunk [17 of 19] of '75622881.webm.2.webm' filekey 190usz3az8ss.4sbqt1.6080484.webm Free_the_Nipple_UK,_Fringe_2017_video.webm 2
01,13:37:16inf: Status CONTINUE 89,4737% uploaded; last chunk 97.210.767 bytes uploaded, served by "mw1356"}
01,13:37:16fyi: Uploading chunk [18 of 19] of '75622881.webm.2.webm' filekey 190usz3az8ss.4sbqt1.6080484.webm Free_the_Nipple_UK,_Fringe_2017_video.webm 2
01,13:38:30inf: Status CONTINUE 94,7368% uploaded; last chunk 97.210.767 bytes uploaded, served by "mw1356"}
01,13:38:30fyi: Uploading chunk [19 of 19] of '75622881.webm.2.webm' filekey 190usz3az8ss.4sbqt1.6080484.webm Free_the_Nipple_UK,_Fringe_2017_video.webm 2
01,13:40:38inf: Status CONTINUE 94,7368% uploaded; last chunk 0 bytes uploaded, served by "mw1356"}
01,13:40:38err: Bad response from server: 503 0 null
01,13:41:52inf: Status STASHFAILED 94,7368% uploaded; last chunk 0 bytes uploaded, served by "mw1422"}
01,13:41:52err: upload failed - stashfailed - Could not read file "mwstore://local-swift-eqiad/local-temp/4/42/190usz3az8ss.4sbqt1.6080484.webm.0". - served by "mw1422"}

upload Free_the_Nipple_UK,_Fringe_2017_video.webm 75622881.webm.2.webm has failed

01,13:42:37inf: Uploading 76320869.webm.2.webm
01,13:42:38fyi: Uploading chunk [1 of 43] of '76320869.webm.2.webm' filekey null Politparade.webm 3
01,13:44:34inf: Uploaded Chunk - filekey 190uvfvoqv8o.fdv2w1.6080484.webm 190uvfvoqv8o.fdv2w1.6080484.webm 0 Response{protocol=h2, code=200, message=, url=https://commons.wikimedia.org/w/api.php?action=upload} {"upload":{"warnings":{"exists":"Politparade.webm"},"result":"Continue","offset":99882934,"filekey":"190uvfvoqv8o.fdv2w1.6080484.webm"}}
01,13:44:36fyi: Uploading chunk [2 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,13:46:08inf: Status CONTINUE 4,6512% uploaded; last chunk 199.765.895 bytes uploaded, served by "mw1422"}
01,13:46:08fyi: Uploading chunk [3 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
...
01,14:13:44fyi: Uploading chunk [18 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:15:01inf: Status CONTINUE 41,8605% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:15:02fyi: Uploading chunk [19 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:16:22inf: Status CONTINUE 44,1860% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:16:23fyi: Uploading chunk [20 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:18:42inf: Status CONTINUE 46,5116% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
März 01, 2022 2:18:42 NACHM. okhttp3.internal.platform.Platform log
WARNUNG: A connection to https://commons.wikimedia.org/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
01,14:18:48fyi: Uploading chunk [21 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:20:06inf: Status CONTINUE 48,8372% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:20:08fyi: Uploading chunk [22 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:21:27inf: Status CONTINUE 51,1628% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:21:28fyi: Uploading chunk [23 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:23:30inf: Status CONTINUE 53,4884% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:23:31fyi: Uploading chunk [24 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:25:35inf: Status CONTINUE 55,8139% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
März 01, 2022 2:25:35 NACHM. okhttp3.internal.platform.Platform log
WARNUNG: A connection to https://commons.wikimedia.org/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
01,14:25:37fyi: Uploading chunk [25 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:26:55inf: Status CONTINUE 58,1395% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:26:57fyi: Uploading chunk [26 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:29:27inf: Status CONTINUE 60,4651% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
...
01,14:42:49fyi: Uploading chunk [33 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:44:34inf: Status CONTINUE 76,7442% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:44:35fyi: Uploading chunk [34 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:46:21inf: Status CONTINUE 79,0698% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:46:22fyi: Uploading chunk [35 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:48:52inf: Status CONTINUE 81,3953% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:48:53fyi: Uploading chunk [36 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:51:11inf: Status CONTINUE 83,7209% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
März 01, 2022 2:51:11 NACHM. okhttp3.internal.platform.Platform log
WARNUNG: A connection to https://commons.wikimedia.org/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
01,14:51:13fyi: Uploading chunk [37 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:52:32inf: Status CONTINUE 86,0465% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:52:34fyi: Uploading chunk [38 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:53:54inf: Status CONTINUE 88,3721% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:53:55fyi: Uploading chunk [39 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:55:57inf: Status CONTINUE 90,6977% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:55:59fyi: Uploading chunk [40 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:57:17inf: Status CONTINUE 93,0233% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
März 01, 2022 2:57:17 NACHM. okhttp3.internal.platform.Platform log
WARNUNG: A connection to https://commons.wikimedia.org/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
01,14:57:19fyi: Uploading chunk [41 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:58:35inf: Status CONTINUE 95,3488% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:58:36fyi: Uploading chunk [42 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,14:59:53inf: Status CONTINUE 97,6744% uploaded; last chunk 99.882.961 bytes uploaded, served by "mw1422"}
01,14:59:54fyi: Uploading chunk [43 of 43] of '76320869.webm.2.webm' filekey 190uvfvoqv8o.fdv2w1.6080484.webm Politparade.webm 3
01,15:02:02inf: Status CONTINUE 97,6744% uploaded; last chunk 0 bytes uploaded, served by "mw1422"}
01,15:02:02err: Bad response from server: 503 0 null
01,15:03:20inf: Status STASHFAILED 97,6744% uploaded; last chunk 0 bytes uploaded, served by "mw1400"}
01,15:03:20err: upload failed - stashfailed - Could not read file "mwstore://local-swift-eqiad/local-temp/0/0a/190uvfvoqv8o.fdv2w1.6080484.webm.0". - served by "mw1400"}

upload Politparade.webm 76320869.webm.2.webm has failed

steps to recreate

  1. get a copy of one of the files from @Urbanecm
  2. upload
  3. fail

Event Timeline

Umherirrender added a subscriber: Umherirrender.

The error text is from message backend-fail-read which gets raised from FileBackendStore::doConcatenate. It seems the new chunk was uploaded to the server, but the server does not find the old chunks to concate both together.

The log messages from the task description shows that the responding server for the api requests changed (from mw1356 to mw1422 and for the other request from mw1422 to mw1400).
Is there some primary/replica handling in the swift backend that the temp file is not found when another api server handles the response? Some local caches etc. Not sure if that is a possible way to find a cause here.

Not sure if retry works from the failing point (offset). Maybe that works if the file shows up on Special:UploadStash onwiki.

Umherirrender renamed this task from multiple files do not upload (revison upload) for a year to multiple files do not upload (chunk upload) for a year.Mar 1 2022, 5:01 PM

Not sure if retry works from the failing point (offset)

It does not. I have tried that.

Also: The server keeps track of uploaded chunks and gives an error message if the same chunk is uploaded twice.

Krinkle added a subscriber: Krinkle.

I searched on the mediawiki dashboard in Logstash for message:*190usz3az8ss* OR message:"Could not read file". I found not hits for the latter, and found only a very indirect message about the former, which may be of use. It's a diagnostic message in the redis channel from RedisBagOStuff, which we use for the MainStash service.

  • Mar 1, 2022 @ 13:12: warning - redis - mw1390 - Status value for 'commonswiki:uploadstatus:6080484:190usz3az8ss.4sbqt1.6080484.webm' at 'status'; serialization is suspect.
  • Mar 1, 2022 @ 13:13: warning - redis - mw1423 - Status value for 'commonswiki:uploadstatus:6080484:190usz3az8ss.4sbqt1.6080484.webm' at 'status'; serialization is suspect.
  • Mar 1, 2022 @ 13:15: warning - redis - mw1312 - Status value for 'commonswiki:uploadstatus:6080484:190usz3az8ss.4sbqt1.6080484.webm' at 'status'; serialization is suspect.
  • 16 more from later minutes on different servers. All are post requests to /w/api.php?action=upload.

The first one has reqId 6dddde0b-3c2c-4190-aa18-53f5fdb1aac8, and that logged only one other diagnostic message:

WARNING redis Status value for 'commonswiki:uploadstatus:6080484:190usz3az8ss.4sbqt1.6080484.webm' at 'status'; serialization is suspect.

DEBUG DBPerformance Transaction spent 3ms in writes, under the 3,000ms limit

I do assume, that not only my files are affected, but also attempted uploads by other users. Are there "serialization is suspect" messages for other users for "action=upload" in logs?

It seems upload always fails with a "5" http-response. The maximum on error info that I can provide is this log of response headers and response body. (edited IP and CSS)

16,09:49:47fyi: Uploading chunk [9 of 11] of 'Piss-In%2C_Pee_for_Free_2022_Berlin_03.webm.uiirjbbuiir.116091891.uploadrevision' filekey 1924ab86tri0.usb40x.6080484.webm Piss-In,_Pee_for_Free_2022_Berlin_03.webm 1
mw1402.eqiad.wmnet
{"upload":{"result":"Continue","stage":"uploading","offset":869930997}}
16,09:51:46inf: Status CONTINUE 81,8182% uploaded; last chunk 96.659.000 bytes uploaded, served by
16,09:51:46fyi: Uploading chunk [10 of 11] of 'Piss-In%2C_Pee_for_Free_2022_Berlin_03.webm.uiirjbbuiir.116091891.uploadrevision' filekey 1924ab86tri0.usb40x.6080484.webm Piss-In,_Pee_for_Free_2022_Berlin_03.webm 1
mw1408.eqiad.wmnet
{"upload":{"result":"Continue","stage":"uploading","offset":966589997}}
16,09:53:43inf: Status CONTINUE 90,9091% uploaded; last chunk 96.659.000 bytes uploaded, served by
16,09:53:43fyi: Uploading chunk [11 of 11] of 'Piss-In%2C_Pee_for_Free_2022_Berlin_03.webm.uiirjbbuiir.116091891.uploadrevision' filekey 1924ab86tri0.usb40x.6080484.webm Piss-In,_Pee_for_Free_2022_Berlin_03.webm 1
5er Response: 503
date Wed, 16 Mar 2022 09:55:55 GMT
server Varnish
content-type text/html; charset=utf-8
age 0
x-cache cp3050 int
x-cache-status int-front
server-timing cache;desc="int-front", host;desc="cp3050"
strict-transport-security max-age=106384710; includeSubDomains; preload
report-to { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
nel { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
permissions-policy interest-cohort=()
x-client-ip XX.XXX.25.146
content-length 1793

<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<title>Wikimedia Error</title>
<style>
\* { margin: 0; padding: 0; }
body { background: #fff; font: 15px/1.6 sans-serif; color: #333; }
.content { margin: 7% auto 0; padding: 2em 1em 1em; max-width: 640px; }
.footer { clear: both; margin-top: 14%; border-top: 1px solid #e5e5e5; background: #f9f9f9; padding: 2em 0; font-size: 0.8em; text-align: center; }
img { float: left; margin: 0 2em 2em 0; }
a img { border: 0; }
h1 { margin-top: 1em; font-size: 1.2em; }
.content-text { overflow: hidden; overflow-wrap: break-word; word-wrap: break-word; -webkit-hyphens: auto; -moz-hyphens: auto; -ms-hyphens: auto; hyphens: auto; }
p { margin: 0.7em 0 1em 0; }
a { color: #0645ad; text-decoration: none; }
a:hover { text-decoration: underline; }
code { font-family: sans-serif; }
.text-muted { color: #777; }
</style>
<div class="content" role="main">
<a href="https://www.wikimedia.org"><img src="https://www.wikimedia.org/static/images/wmf-logo.png" srcset="https://www.wikimedia.org/static/images/wmf-logo-2x.png 2x" alt="Wikimedia" width="135" height="101">
</a>
<h1>Error</h1>
<div class="content-text">
<p>Our servers are currently under maintenance or experiencing a technical problem.

Please <a href="" title="Reload this page" onclick="window.location.reload(false); return false">try again</a> in a few&nbsp;minutes.</p>

<p>See the error message at the bottom of this page for more&nbsp;information.</p>
</div>
</div>
<div class="footer"><p>If you report this error to the Wikimedia System Administrators, please include the details below.</p><p class="text-muted"><code>Request from XX.XXX.25.146 via cp3050 cp3050, Varnish XID 499676136<br>Error: 503, Backend fetch failed at Wed, 16 Mar 2022 09:55:55 GMT</code></p>
</div>
</html>

{"upload":{"result":"Continue","stage":"uploading","offset":966589997}}
16,09:55:56inf: Status CONTINUE 90,9091% uploaded; last chunk 0 bytes uploaded, served by

I succeded uploading File:Piss-In, Pee for Free 2022 Berlin 07.webm. The upload failed 25 times, then succeded (If you look at the upload comment it contains the number "25002". That is a slightly broken counter but it tells that there were 25 failed uploads before).

File:Piss-In, Pee for Free 2022 Berlin 04.webm succeded with 31st attempt. But only the first chunk (92MB of 1.26GB) was published.

Upload restarted for File:Piss-In, Pee for Free 2022 Berlin 03.webm and File:Piss-In, Pee for Free 2022 Berlin 04.webm