
API uploads fatal with UploadChunkFileException: Error storing file in '/tmp' backend-fail-internal
Open, Medium, Public, PRODUCTION ERROR

Description

Error

Request URL:
Request ID: INSERT_ID

message
UploadChunkFileException: Error storing file in '/tmp/phpYHAPWZ': backend-fail-internal; local-swift-codfw
trace
#0 /srv/mediawiki/php-1.34.0-wmf.14/includes/upload/UploadFromChunks.php(275): UploadFromChunks->outputChunk(string)
#1 /srv/mediawiki/php-1.34.0-wmf.14/includes/api/ApiUpload.php(226): UploadFromChunks->addChunk(string, integer, integer)
#2 /srv/mediawiki/php-1.34.0-wmf.14/includes/api/ApiUpload.php(132): ApiUpload->getChunkResult(array)
#3 /srv/mediawiki/php-1.34.0-wmf.14/includes/api/ApiUpload.php(104): ApiUpload->getContextResult()
#4 /srv/mediawiki/php-1.34.0-wmf.14/includes/api/ApiMain.php(1583): ApiUpload->execute()
#5 /srv/mediawiki/php-1.34.0-wmf.14/includes/api/ApiMain.php(531): ApiMain->executeAction()
#6 /srv/mediawiki/php-1.34.0-wmf.14/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#7 /srv/mediawiki/php-1.34.0-wmf.14/api.php(86): ApiMain->execute()
#8 /srv/mediawiki/w/api.php(3): require(string)
#9 {main}

Impact

Unknown. Special:NewFiles still shows new files being uploaded, so at least it’s not preventing all uploads.

Notes

From logstash:

  • New in 1.34-wmf.14.
  • Affects commons.wikimedia.org (naturally).
  • Seen several dozen times already in the short time it's been out.

Event Timeline

Krinkle created this task. Jul 17 2019, 3:44 PM
Restricted Application added a subscriber: Aklapper. Jul 17 2019, 3:44 PM
LarsWirzenius triaged this task as Unbreak Now! priority. Jul 17 2019, 3:55 PM
Restricted Application added a subscriber: Liuxinyu970226. Jul 17 2019, 3:55 PM
Cparle added a subscriber: Gilles. Jul 17 2019, 4:11 PM
Cparle added a subscriber: fgiunchedi.
Cparle added a subscriber: Cparle.

@fgiunchedi I tagged you cos @Gilles is away and I dunno who else to ask about swift ...

@MarkTraceur can you take a look here, please?

greg added a comment. Jul 17 2019, 4:35 PM

(oops, thanks @Cparle)

:D afaik we've been working almost exclusively on js/ui stuff lately, so I don't think it's us

Adding Operations per SRE-swift-storage / @fgiunchedi

(There's no tag for the Infrastructure Foundations subteam of SRE, is there?)

fgiunchedi lowered the priority of this task from Unbreak Now! to Medium. Jul 18 2019, 8:29 AM

The errors from UploadChunkFileException: https://logstash.wikimedia.org/goto/ce40b31903aa613bce0ec93c9934e5f4

Searching for local-swift-codfw on the same time period: https://logstash.wikimedia.org/goto/fe3047f68c4b368079d6845b0dfccbe9

There are a bunch of errors of this form over a four-minute window:

2019-07-17T14:55:39	mw1230	ERROR	HTTP 401 (Unauthorized) in 'SwiftFileBackend::doStoreInternal' (given '{"async":false,"op":"store","src":"/tmp/phpYHAPWZ","dst":"mwstore://local-swift-codfw/local-temp/d/d3/16rd13foxoo4.7etxt1.2927633.jpg.1","headers":[],"overwrite":true}')
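To quantify a burst like this one, a minimal Python sketch that parses lines shaped like the one quoted above and counts errors with a given HTTP status per minute. The tab-separated layout and the message pattern are assumptions inferred from this single pasted line, not a documented logstash export format; `parse_line` and `per_minute_counts` are hypothetical helper names.

```python
import json
import re
from collections import Counter
from datetime import datetime

# Assumed message shape, inferred from the one line quoted in this task:
#   HTTP <code> (<reason>) in '<Class::method>' (given '<json params>')
MESSAGE_RE = re.compile(
    r"HTTP (?P<status>\d{3}) \((?P<reason>[^)]*)\) "
    r"in '(?P<method>[^']+)' \(given '(?P<params>.*)'\)$"
)


def parse_line(line):
    """Parse '<timestamp>\\t<host>\\t<level>\\t<message>' into a dict.

    Returns None when the line does not match the assumed layout.
    """
    parts = line.rstrip("\n").split("\t", 3)
    if len(parts) != 4:
        return None
    timestamp, host, level, message = parts
    m = MESSAGE_RE.match(message)
    if m is None:
        return None
    return {
        "time": datetime.fromisoformat(timestamp),
        "host": host,
        "level": level,
        "status": int(m.group("status")),
        "method": m.group("method"),
        # The operation descriptor is logged as JSON (op, src, dst, ...).
        "params": json.loads(m.group("params")),
    }


def per_minute_counts(lines, status=401):
    """Count parsed lines with the given HTTP status, bucketed by minute."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None and rec["status"] == status:
            counts[rec["time"].strftime("%Y-%m-%dT%H:%M")] += 1
    return counts
```

Bucketing by minute makes it easy to check whether the 401s cluster in a short window (as a token-expiry explanation would predict) or are spread out over time.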

I believe these are due to MW's authentication token for Swift expiring, though I'm not sure whether there's logic to refresh the auth and retry in cases like this. I doubt it's a newly introduced bug, so I'm boldly setting the priority to normal; not a train blocker IMHO.
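If the 401s do come from an expired token, the usual remedy is to drop the cached token, re-authenticate, and retry the request once. The Python sketch below illustrates that generic pattern only; it is not SwiftFileBackend's code, and whether MediaWiki already does something like this is exactly the open question above. `TokenRefreshingClient`, `authenticate` and `send` are hypothetical stand-ins.

```python
from typing import Callable, Optional, Tuple

# (HTTP status, body) as returned by the hypothetical transport.
Response = Tuple[int, str]


class TokenRefreshingClient:
    """Caches an auth token and refreshes it once when a request gets HTTP 401.

    `authenticate` returns a fresh token; `send(token, path)` performs the
    request. Both are hypothetical callables supplied by the caller.
    """

    def __init__(self, authenticate: Callable[[], str],
                 send: Callable[[str, str], Response]) -> None:
        self._authenticate = authenticate
        self._send = send
        self._token: Optional[str] = None

    def store(self, path: str) -> Response:
        if self._token is None:
            self._token = self._authenticate()
        status, body = self._send(self._token, path)
        if status == 401:
            # The cached token was rejected (e.g. expired): fetch a new one
            # and retry exactly once, so a genuine auth failure still surfaces.
            self._token = self._authenticate()
            status, body = self._send(self._token, path)
        return status, body
```

Retrying only once keeps a permanently bad credential from looping; a second 401 is returned to the caller unchanged, where it would still be logged as an error.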

@fgiunchedi If this is not blocking the train, please remove the train task from the parent tasks.

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:06 PM
Restricted Application added a project: Structured-Data-Backlog. Sep 8 2020, 7:29 PM

This production error has been open for over a year and is still waiting to be investigated. There is some reason to suspect it might be infrastructure-related, but before SRE can help, we first need to better understand and quantify what goes wrong in Swift (if that is indeed where it goes wrong).