
Commons: UploadChunkFileException: Error storing file: backend-fail-internal; local-swift-codfw
Open, Needs Triage · Public · PRODUCTION ERROR

Description

I get

00024: FAILED: internal_api_error_UploadChunkFileException: [9e95f0c8-9cd8-4daf-93bf-996b77705f13] Caught exception of type UploadChunkFileException

while uploading a new version of https://commons.wikimedia.org/wiki/File:The_R%C3%A1m%C3%A1yana_of_Tulsi_D%C3%A1s.djvu
using https://commons.wikimedia.org/wiki/User:Rillke/bigChunkedUpload.js

from /srv/mediawiki/php-1.40.0-wmf.21/includes/upload/UploadFromChunks.php(359)
#0 /srv/mediawiki/php-1.40.0-wmf.21/includes/upload/UploadFromChunks.php(248): UploadFromChunks->outputChunk(string)
#1 /srv/mediawiki/php-1.40.0-wmf.21/includes/api/ApiUpload.php(278): UploadFromChunks->addChunk(string, integer, integer)
#2 /srv/mediawiki/php-1.40.0-wmf.21/includes/api/ApiUpload.php(157): ApiUpload->getChunkResult(array)
#3 /srv/mediawiki/php-1.40.0-wmf.21/includes/api/ApiUpload.php(128): ApiUpload->getContextResult()
#4 /srv/mediawiki/php-1.40.0-wmf.21/includes/api/ApiMain.php(1901): ApiUpload->execute()
#5 /srv/mediawiki/php-1.40.0-wmf.21/includes/api/ApiMain.php(878): ApiMain->executeAction()
#6 /srv/mediawiki/php-1.40.0-wmf.21/includes/api/ApiMain.php(849): ApiMain->executeActionWithErrorHandling()
#7 /srv/mediawiki/php-1.40.0-wmf.21/api.php(90): ApiMain->execute()
#8 /srv/mediawiki/php-1.40.0-wmf.21/api.php(45): wfApiMain()
#9 /srv/mediawiki/w/api.php(3): require(string)
#10 {main}

Details

MediaWiki Version: 1.40.0-wmf.21
Request URL: https://commons.wikimedia.org/wiki/File:The_R%C3%A1m%C3%A1yana_of_Tulsi_D%C3%A1s.djvu

Event Timeline


Can you explain the impact of this bug more clearly? Is this a somewhat random error that just requires repeating the upload? Is chunked upload deterministically broken for a given file? Is there some kind of permanent damage (such as that title left in a broken state)? What fraction of uploads is affected? Does this cause a significant time loss or bandwidth cost?

It appears to be somewhat random; however, it throws an exception that breaks my script, and I have to keep restarting it. So yes, it is a significant time loss.

Can you explain the impact of this bug more clearly? Is this a somewhat random error that just requires repeating the upload? Is chunked upload deterministically broken for a given file? Is there some kind of permanent damage (such as that title left in a broken state)? What fraction of uploads is affected? Does this cause a significant time loss or bandwidth cost?

Hi, thank you all for looking into this. It is quite random; statistically it appears after 10 to 40 chunks, so any file bigger than 4 × 40 = 160 MB is sure to fail. The upload has to be started again, so yes, it is a significant time and bandwidth loss. Surprisingly, https://commons.wikimedia.org/wiki/File:Index_librorum_prohibitorum_Leonis_XIII_sum_pont_auctoritate_recognitus_SSmi,_1922.pdf came through after several attempts.

I managed to upload one file by increasing the chunk size to 9 MB, which decreases the number of chunks. I also managed to recover 2 files from Special:UploadStash.
This bug would not be so bad if it were possible to start again from the chunk that failed.
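
For illustration, here is a minimal sketch of what resuming could look like against MediaWiki's chunked-upload API (action=upload with stash=1, filesize, offset and filekey). The authenticated requests session and CSRF token are placeholders and the helper itself is hypothetical, not part of bigChunkedUpload.js; the point is that a failed chunk can be retried from its own offset instead of restarting the whole file:

```python
import os
import time
import requests

API = "https://commons.wikimedia.org/w/api.php"
CHUNK_SIZE = 8 * 1024 * 1024  # larger chunks = fewer requests that can fail

def upload_chunked(session, csrf_token, path, filename, max_retries=5):
    """Upload `path` in chunks, retrying each chunk from its own offset.

    `session` is an already-authenticated requests.Session and `csrf_token`
    a valid CSRF token (both assumed here); only the offset/filekey handling
    is the point of this sketch.
    """
    filesize = os.path.getsize(path)
    filekey, offset = None, 0
    with open(path, "rb") as f:
        while offset < filesize:
            f.seek(offset)
            chunk = f.read(CHUNK_SIZE)
            params = {
                "action": "upload", "format": "json", "stash": 1,
                "filename": filename, "filesize": filesize,
                "offset": offset, "token": csrf_token,
            }
            if filekey:
                params["filekey"] = filekey
            for attempt in range(max_retries):
                r = session.post(API, data=params,
                                 files={"chunk": (filename, chunk)}).json()
                if "error" not in r:
                    break
                # Server-side failures (stashfailed, UploadChunkFileException)
                # are retried from the same offset after a short backoff.
                time.sleep(2 ** attempt)
            else:
                raise RuntimeError(f"chunk at offset {offset} failed: {r}")
            upload = r["upload"]
            filekey = upload["filekey"]
            offset = upload.get("offset", offset + len(chunk))
    # A final action=upload call with the filekey (and no stash) publishes the file.
    return filekey
```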

I tried again with https://commons.wikimedia.org/wiki/File:Gandhi_-_Young_India,_v._4,_1922.pdf (1150 MB), increasing the chunk size to 8 MB (which worked for medium-sized files). It failed at chunk 41:

01124: 41/141> upload in progress Upload: 100%
01218: FAILED: internal_api_error_UploadChunkFileException: [ebf1c1e8-0188-42f7-8c94-68b816a9c2e2] Caught exception of type UploadChunkFileException

It appears to be somewhat random; however, it throws an exception that breaks my script, and I have to keep restarting it. So yes, it is a significant time loss.

Is that a pywikipedia script? Seems like it might be easier to fix the script, in the short term.

This bug would not be so bad if it were possible to start again from the chunk that failed.

The error handling logic for the script is here and the configuration here. It does retry failed chunks, up to 250 times. I have a hard time believing the average upload would error out more than 250 times, so maybe there is a bug in the retry logic?

No idea how to test it though. Uploading multi-gigabyte test files to Commons doesn't seem like a great idea.

Don't know if it helps, but I just got this error with CropTool on my first attempt at creating https://commons.wikimedia.org/wiki/File:Aurora_Mayor_and_council_members_visit_Buckley_AFB_(4734201)_(cropped).jpeg

I tried again a few seconds later and it worked.

I am a bit concerned about the small rise in 50x codes reported on the ATS - backend dashboard since about 13:00:00 on 2023-03-09, but that doesn't match your time at all, and I can't find any relevant change that happened then; it is seen in both codfw and eqiad, too. Grepping proxy-access.log suggests that nearly all of the 50x codes from swift (at least today's) relate to thumbnails, which makes me wonder if thumbor is unhappy again.

For reference, these are all GET or HEAD, so are irrelevant to this issue.

This bug would not be so bad if it were possible to start again from the chunk that failed.

The error handling logic for the script is here and the configuration here. It does retry failed chunks, up to 250 times. I have a hard time believing the average upload would error out more than 250 times, so maybe there is a bug in the retry logic?

No idea how to test it though. Uploading multi-gigabyte test files to Commons doesn't seem like a great idea.

Well, the code works fine if there is a connection issue on my side. In that case, it retries until the connection comes back.
But here the issue is different: the error comes from the server side, and the process hangs with the error message.

Not sure if it's related, but I keep getting this error when uploading new versions of files.

image.png (465×1 px, 67 KB)
It might be a generic issue, not a chunked-upload one? (Also see my earlier comment about this happening when uploading images through Lightroom; I'm not sure that uses chunked upload either.)

Logs show that the issue started at 2023-03-07 14:23:35 and it fixed itself shortly after 2023-03-13 17:08:35.

UploadChunkFileException frequency.png (615×1 px, 110 KB)

SAL entry associated with the end of the incident:

17:16 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)

There are more details about the errors in the FileOperation channel, for example reqId 6e7c6a99-03f7-4f7b-9bb2-c3e3da0a76bc

HTTP 502 (Bad Gateway) in 'SwiftFileBackend::doStoreInternal' (given '{"async":false,"src":"/tmp/phpJAdHEm","dst":"mwstore://local-swift-codfw/local-temp/d/dc/19wxlydr879g.mkokap.6179471.webm.4","overwrite":true,"headers":[]}')

98% of the FileOperation log entries with a "code" field were 502. Most 502 errors did not lead to an UploadChunkFileException. The number of FileOperation 502 errors per day:

2023-03-06: 1
2023-03-07: 2255
2023-03-08: 5665
2023-03-09: 22678
2023-03-10: 38292
2023-03-11: 46887
2023-03-12: 44481
2023-03-13: 32649
2023-03-14: 582

On the Swift dashboard you can see these errors grouped under "proxy-server errors" (swift_proxy_server_errors_total), although there is a background rate of errors in that metric which do not correspond to MediaWiki FileOperation log entries.

proxy-server errors.png (233×1 px, 60 KB)

Breaking down swift_proxy_server_errors_total by instance, you can see that ms-fe2009 was at fault. There is no obvious explanation in its server.log, and its proxy-access.log doesn't contain enough 502 errors to account for them.
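
For reference, a per-instance breakdown like this can be reproduced against the Prometheus HTTP API; a small sketch, in which the Prometheus URL and time window are placeholders rather than the production endpoint:

```python
import requests

# Placeholder endpoint; the real Prometheus host for the swift metrics
# is not named in this task.
PROMETHEUS = "http://prometheus.example.org:9090"

def errors_by_instance(window="1h"):
    """Return swift proxy-server error rates grouped by instance."""
    query = f"sum by (instance) (rate(swift_proxy_server_errors_total[{window}]))"
    resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": query})
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    return {r["metric"].get("instance", "?"): float(r["value"][1]) for r in results}

if __name__ == "__main__":
    for instance, rate in sorted(errors_by_instance().items(), key=lambda kv: -kv[1]):
        print(f"{instance}\t{rate:.3f} errors/s")
```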

Change 899885 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[operations/puppet@production] Unprovision the "swift" dashboard

https://gerrit.wikimedia.org/r/899885

I think this issue is resolved now? Swift error rates have remained low since we restarted swift and thumbor.

@MatthewVernon I think so. Although I guess it might be worth evaluating whether it is possible to make incidents like this more visible and detectable?

Hi, This seems to work OK for me now. Thanks for fixing it.

MatthewVernon claimed this task.

Change 899885 merged by Tim Starling:

[operations/puppet@production] Unprovision the "swift" dashboard

https://gerrit.wikimedia.org/r/899885

ms-fe1009 and ms-fe2009 account for almost all errors, according to the swift_proxy_server_errors_total metric. They are also the only servers running Debian stretch and Swift 2.10; all the others have bullseye and Swift 2.26. I don't know whether they are actually broken or if the newer Swift just doesn't report its errors. I see there is T279637 tracking the upgrade.

Change 902064 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] Provision the revised Swift dashboard

https://gerrit.wikimedia.org/r/902064

Change 902064 merged by MVernon:

[operations/puppet@production] Provision the revised Swift dashboard

https://gerrit.wikimedia.org/r/902064

Yann reopened this task as Open. (Edited Apr 4 2023, 10:11 PM)

Hi, Sorry, but it happened again while uploading https://commons.wikimedia.org/wiki/File:Dostoevsky_-_The_Gambler_and_Other_Stories,_Collected_Edition,_1917.pdf with chunked uploads.

01217: 27/38> in progress Upload: 100%
01271: FAILED: internal_api_error_UploadChunkFileException: [359de628-b095-48ac-af48-2db85dcebc18] Caught exception of type UploadChunkFileException

Chrome console:

index.php?action=raw&ctype=text/javascript&title=User:Rillke/MwJSBot.js&_=2:1136 mwbot> stashfailed: An unknown error occurred in storage backend "local-swift-eqiad". {error: {…}, servedby: 'mw2324'} success {readyState: 4, getResponseHeader: ƒ, getAllResponseHeaders: ƒ, setRequestHeader: ƒ, overrideMimeType: ƒ, …}

The root cause was never isolated or resolved, so we can expect the issue to recur periodically.

I would love to see the HTTP error response body. FileOperation logs show 502 errors, but the Swift proxy access logs do not. I tried to capture 502 error responses coming out of swift-proxy on ms-fe2*, but I didn't find any. I found some 500s, forwarded from Thumbor, but not 502. So maybe the 502s are generated by nginx. Access logging is disabled in nginx.

Mentioned in SAL (#wikimedia-operations) [2023-04-05T04:17:22Z] <TimStarling> restarted swift-proxy on ms-fe* T328872

Restarting the proxy servers temporarily fixed it again. The restart caused a doubling of the request rate on ms-fe1010, and CPU usage increased, indicating that that server was at fault. It reminds me of a similar problem we had years ago.

No sign of errors on the dashboard, except the 5xx rate reported by ATS.

Suggested action items:

  • Install a Prometheus collector for nginx
  • @aaron In SwiftFileBackend, could we log the response headers of an HTTP error response? Maybe some sort of truncated body if there is one?
  • Alert when the ATS->Swift 5xx rate increases above a certain value.
  • Daily restarts of swift-proxy?

It's likely that the issue will sporadically continue until it is fully isolated inside swift-proxy and properly fixed.

Headers and body could be logged for 5XXs easily enough in the Swift backend.

I also get many 5xx and stashfailed errors on my current uploads, using the Python requests library.

{'error': {'code': 'stashfailed', 'info': 'An unknown error occurred in storage backend "local-swift-codfw".', '*': 'See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes.'}, 'servedby': 'mw1388'}

This also seems to be visible on the Grafana dashboard: https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?orgId=1&viewPanel=37&from=now-2d&to=now-1m

A different message on a later upload, after many 5xx responses and some successful uploads:

{'error': {'code': 'uploadstash-exception', 'info': 'Could not store upload in the stash (UploadStashFileException): "Im Speicher-Backend „local-swift-codfw“ ist ein unbekannter Fehler aufgetreten.".', '*': 'See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes.'}, 'servedby': 'mw1406'}

(The German text is the localized form of "An unknown error occurred in storage backend "local-swift-codfw".")
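
Since several people here drive the API from Python with requests, here is a hedged sketch of how a client could treat these responses as retryable. The helper name and retry limits are made up for illustration; the error codes are the ones reported in this task:

```python
import time

# Error codes seen in this task that have proven to be transient backend
# failures rather than client mistakes.
TRANSIENT_CODES = {
    "stashfailed",
    "uploadstash-exception",
    "internal_api_error_UploadChunkFileException",
    "backend-fail-internal",
}

def post_with_retries(session, url, max_attempts=8, **kwargs):
    """POST an API request, retrying responses that look transient.

    Retries on HTTP 5xx and on the error codes above, with capped
    exponential backoff; anything else is returned to the caller as-is.
    """
    data = None
    for attempt in range(max_attempts):
        resp = session.post(url, **kwargs)
        if resp.status_code < 500:
            data = resp.json()
            code = data.get("error", {}).get("code")
            if code not in TRANSIENT_CODES:
                return data  # success, or an error worth surfacing immediately
        # Transient failure (HTTP 5xx or a known-flaky error code): back off.
        time.sleep(min(2 ** attempt, 60))
    return data  # last transient response (or None if every attempt was a 5xx)
```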

I restarted the swift frontends this morning, and the 5xx rate has returned to normal.

Shouldn't we have an automatic alert on this many 5xxs?

We do monitor and graph the 5xx rate, but it was only ever a tiny fraction of the requests swift was serving (~0.05%).

Still happening. "The MediaWiki error backend-fail-internal occured: An unknown error occurred in storage backend "local-swift-eqiad"."

Change 936368 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] filebackend: Include truncated http body for 502 on SwiftFileBackend

https://gerrit.wikimedia.org/r/936368

Change 936368 merged by jenkins-bot:

[mediawiki/core@master] filebackend: Include truncated http body for 502 on SwiftFileBackend

https://gerrit.wikimedia.org/r/936368

There is a (small) spike in Grafana for 502 errors (and for 504 errors); see https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?orgId=1&viewPanel=37&from=1693494000000&to=1693501200000

Maybe new logs with the extra information added by that change have already been collected.

I think this is not linked to the error rate, but it makes the need for a fix more urgent: the Wiki Loves Monuments contests started today.

Yesterday, while uploading a small SVG file using bigChunkedUpload.js, the finalize stage took an unusually long time for such a small file and then logged 00019: FAILED: backend-fail-internal: An unknown error occurred in storage backend "local-swift-codfw". The file did seem to finish uploading as expected despite this, though.

Today, reuploading (overwriting) a small PNG, the finalize stage took a similarly unexpectedly long time, but no error message was logged. Afterwards, however, the thumbnails for the file semi-randomly fail to update: the main thumb on the file description page doesn't update; the mini-thumb in the upload history list does update; requesting the image thumb at any size using image markup from a client wiki (enWS) returns the old thumb, but squeezed into the new, narrower width. Various voodoo poking (action=purge etc.) has failed to unwedge it.

These are new after a relatively long period with no real problems uploading. Timing-wise they could roughly correspond with the spiking error rates @Umherirrender notes above, but as these were my only two uploads in the time period, that association is very weak at best.

I restarted the swift frontends, and that has returned the error rate to the usual low levels.

This is happening again, repeatedly, with not-so-big files (around 100 MB):

Chrome console:

mwbot> internal_api_error_UploadChunkFileException: [fab58ddb-22c9-4677-adfc-786416bf388a] Caught exception of type UploadChunkFileException  {"error":{"code":"internal_api_error_UploadChunkFileException","info":"[fab58ddb-22c9-4677-adfc-786416bf388a] Caught exception of type UploadChunkFileException","errorclass":"UploadChunkFileException"},"servedby":"mw2437"}

while uploading https://commons.wikimedia.org/wiki/File:Swiss_national_map,_39_Fl%C3%BCelapass.jpg

I've spent some time checking, and nothing that looks like that filename appears in the swift proxy logs today (and the error rates look as low as normal in general).

Now I get

00411: FAILED: stashfailed: An unknown error occurred in storage backend "local-swift-eqiad"

[I'm afraid my previous comment still applies - nothing relating to a file named like this appears in any of the swift frontend logs]

It seems some of these files are available in Special:UploadStash, but I get T348733 while trying to publish them.

Hi, just wanted to let you guys know I'm getting this exact same error trying to upload a new revision to https://commons.wikimedia.org/wiki/File:The_Sea_Beast_(1926).webm, file size ~385 MB.

Specific error: 00172: 44/84> FAILED: internal_api_error_UploadChunkFileException: [c7e1fd5a-e9e6-4d21-8a84-e0cb0d31d9f7] Caught exception of type UploadChunkFileException

On https://commons.wikimedia.org/wiki/File:Swiss_national_map,_42_Oberwallis.jpg I still get for every try

00293: FAILED: internal_api_error_UploadChunkFileException: [c995f30e-54b1-490d-b829-ec4c4924acc0] Caught exception of type UploadChunkFileException

On https://commons.wikimedia.org/wiki/File:Swiss_national_map,_42_Oberwallis.jpg I still get for every try

00293: FAILED: internal_api_error_UploadChunkFileException: [c995f30e-54b1-490d-b829-ec4c4924acc0] Caught exception of type UploadChunkFileException

Oct 13, 2023 @ 11:19:46.595 ERROR exception mw2295 commonswiki [c995f30e-54b1-490d-b829-ec4c4924acc0] /w/api.php UploadChunkFileException: Error storing file in '/tmp/phpUeTKDd': backend-fail-internal; local-swift-eqiad
Oct 13, 2023 @ 11:19:46.594 ERROR FileOperation mw2295 commonswiki HTTP 401 (Unauthorized) in 'SwiftFileBackend::doStoreInternal' (given '{"async":false,"src":"/tmp/phpUeTKDd","dst":"mwstore://local-swift-eqiad/local-temp/9/90/1af4kf8yow0c.in02i2.1.jpg.13","overwrite":true,"headers":[]}')
Oct 13, 2023 @ 11:19:46.594 ERROR FileOperation mw2295 commonswiki StoreFileOp failed: {"src":"/tmp/phpUeTKDd","dst":"mwstore://local-swift-eqiad/local-temp/9/90/1af4kf8yow0c.in02i2.1.jpg.13","overwrite":true,"headers":[],"failedAction":"attempt"}

So in this case a 401. That's pretty interesting.

On https://commons.wikimedia.org/wiki/File:Swiss_national_map,_42_Oberwallis.jpg I still get for every try

00293: FAILED: internal_api_error_UploadChunkFileException: [c995f30e-54b1-490d-b829-ec4c4924acc0] Caught exception of type UploadChunkFileException

Oct 13, 2023 @ 11:19:46.595 ERROR exception mw2295 commonswiki [c995f30e-54b1-490d-b829-ec4c4924acc0] /w/api.php UploadChunkFileException: Error storing file in '/tmp/phpUeTKDd': backend-fail-internal; local-swift-eqiad
Oct 13, 2023 @ 11:19:46.594 ERROR FileOperation mw2295 commonswiki HTTP 401 (Unauthorized) in 'SwiftFileBackend::doStoreInternal' (given '{"async":false,"src":"/tmp/phpUeTKDd","dst":"mwstore://local-swift-eqiad/local-temp/9/90/1af4kf8yow0c.in02i2.1.jpg.13","overwrite":true,"headers":[]}')
Oct 13, 2023 @ 11:19:46.594 ERROR FileOperation mw2295 commonswiki StoreFileOp failed: {"src":"/tmp/phpUeTKDd","dst":"mwstore://local-swift-eqiad/local-temp/9/90/1af4kf8yow0c.in02i2.1.jpg.13","overwrite":true,"headers":[],"failedAction":"attempt"}

So in this case a 401. That's pretty interesting.

As SwiftFileBackend::doStoreInternal is mentioned in the error, this case also seems to be covered by T206252: Spike of HTTP errors from SwiftFileBackend::doStoreInternal

For the 401 there is information under T228292#7490101 (that task's description shows the same stack trace as this task's description).

I'm doing a bulk upload from Lightroom this evening and am running into "The MediaWiki error backend-fail-internal occured: An unknown error occurred in storage backend "local-swift-eqiad"." again.

While I managed to upload the files mentioned on Wed, Oct 11, 7:28 PM after many attempts, I still get this error, now with the following messages, respectively:

00156: FAILED: internal_api_error_UploadChunkFileException: [0fa38f19-d4b2-49d4-b858-134dc09a519a] Caught exception of type UploadChunkFileException

00213: FAILED: internal_api_error_UploadChunkFileException: [500587b3-7b15-4621-b8a1-8416b1110235] Caught exception of type UploadChunkFileException

00318: FAILED: stashfailed: An unknown error occurred in storage backend "local-swift-eqiad".

Since the cause is not the same old failure of the proxy servers, maybe it would improve clarity if we discussed this incident on a separate task.

What I'm seeing is a total cessation of successful delete operations and a flood of MW log messages showing 401 status codes for various actions.

Since the cause is not the same old failure of the proxy servers, maybe it would improve clarity if we discussed this incident on a separate task.

What I'm seeing is a total cessation of successful delete operations and a flood of MW log messages showing 401 status codes for various actions.

I don't understand what you mean. I can't say if this is related to T349127, but I got this error and T341007 in quick succession while uploading files. Now I managed to upload the files mentioned above after ~10 failed attempts. I can't say if this issue is fixed, or if I just got lucky.

I wonder if the auth token just expired while the combined file was being uploaded (but after the PUT started), so that by the time the chunk deletions were issued the token had expired. AFAIK the tokens should last for a day, and MediaWiki app servers only reuse them for ~15 minutes (which used to be higher, but they seemed to expire too soon). It would be interesting to see the memcached stats for the token storage on the swift servers (maybe there is eviction pressure for some reason, like some service never reusing tokens and thus flooding memcached).

You can see metrics about swift's memcached usage in Grafana; I've previously looked to see if incidents (i.e. raised 5xx levels) correlated with events in memcached, but haven't found anything. There is one very hot memcached key (I think it's account_info, cf. headers_to_account_info in the swift source code); this is probably not ideal, but I'm not sure if it's a swift bug or just the way the system is expected to work (and I've no evidence to suggest that it is causing a problem).
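
One quick way to check the eviction-pressure theory would be to read the raw memcached counters on the swift frontends; a minimal sketch using pymemcache, where the host list is a placeholder and the choice of interesting stats is an assumption:

```python
from pymemcache.client.base import Client

# Placeholder hosts; the actual swift memcached instances are not
# enumerated in this task.
HOSTS = ["ms-fe-memcached.example.org"]
INTERESTING = ("curr_items", "evictions", "get_hits", "get_misses", "expired_unfetched")

for host in HOSTS:
    # The STATS command returns byte-string keys, so decode them for lookup.
    stats = Client((host, 11211), connect_timeout=2, timeout=2).stats()
    decoded = {k.decode(): v for k, v in stats.items()}
    print(host)
    for key in INTERESTING:
        print(f"  {key}: {decoded.get(key)}")
```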

index.php?action=raw&ctype=text/javascript&title=User:Rillke/MwJSBot.js&_=2:1136 mwbot> stashfailed: An unknown error occurred in storage backend "local-swift-codfw".

while uploading https://commons.wikimedia.org/wiki/File:Kaibab_trail_at_Grand_Canfon_National_Park.jpg