
API uploads fatal with UploadChunkFileException: Error storing file in '/tmp' backend-fail-internal
Open, Medium | Public | Production Error

Description

Error

Request URL:
Request ID: INSERT_ID

message
UploadChunkFileException: Error storing file in '/tmp/phpYHAPWZ': backend-fail-internal; local-swift-codfw
trace
#0 /srv/mediawiki/php-1.34.0-wmf.14/includes/upload/UploadFromChunks.php(275): UploadFromChunks->outputChunk(string)
#1 /srv/mediawiki/php-1.34.0-wmf.14/includes/api/ApiUpload.php(226): UploadFromChunks->addChunk(string, integer, integer)
#2 /srv/mediawiki/php-1.34.0-wmf.14/includes/api/ApiUpload.php(132): ApiUpload->getChunkResult(array)
#3 /srv/mediawiki/php-1.34.0-wmf.14/includes/api/ApiUpload.php(104): ApiUpload->getContextResult()
#4 /srv/mediawiki/php-1.34.0-wmf.14/includes/api/ApiMain.php(1583): ApiUpload->execute()
#5 /srv/mediawiki/php-1.34.0-wmf.14/includes/api/ApiMain.php(531): ApiMain->executeAction()
#6 /srv/mediawiki/php-1.34.0-wmf.14/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#7 /srv/mediawiki/php-1.34.0-wmf.14/api.php(86): ApiMain->execute()
#8 /srv/mediawiki/w/api.php(3): require(string)
#9 {main}

Impact

Unknown. Special:NewFiles still shows new files being uploaded, so at least it’s not preventing all uploads.

Notes

From logstash:

  • New in 1.34.0-wmf.14.
  • Affects commons.wikimedia.org (naturally).
  • Seen several dozen times already in the short time it's been out.

Event Timeline

LarsWirzenius triaged this task as Unbreak Now! priority. (Jul 17 2019, 3:55 PM)
Cparle added a subscriber: fgiunchedi.
Cparle added a subscriber: Cparle.

@fgiunchedi I tagged you cos @Gilles is away and I dunno who else to ask about swift ...

:D afaik we've been working almost exclusively on js/ui stuff lately, so I don't think it's us

Adding SRE per SRE-swift-storage / @fgiunchedi

(There's no tag for the Infrastructure Foundations subteam of SRE, is there?)

fgiunchedi lowered the priority of this task from Unbreak Now! to Medium. (Jul 18 2019, 8:29 AM)

The errors from UploadChunkFileException: https://logstash.wikimedia.org/goto/ce40b31903aa613bce0ec93c9934e5f4

Searching for local-swift-codfw on the same time period: https://logstash.wikimedia.org/goto/fe3047f68c4b368079d6845b0dfccbe9

There's a bunch of errors of this form over a four-minute span:

2019-07-17T14:55:39	mw1230	ERROR	HTTP 401 (Unauthorized) in 'SwiftFileBackend::doStoreInternal' (given '{"async":false,"op":"store","src":"/tmp/phpYHAPWZ","dst":"mwstore://local-swift-codfw/local-temp/d/d3/16rd13foxoo4.7etxt1.2927633.jpg.1","headers":[],"overwrite":true}')

I believe these are due to MW's authentication token for Swift expiring. I'm not sure whether there is logic to retry and refresh the auth in cases like this, though. I doubt it is a newly introduced bug, so I'm boldly setting the priority to normal; not a train blocker, IMHO.
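For illustration only, here is a minimal sketch of what "retry and refresh the auth" could look like on the client side. It is not MediaWiki's SwiftFileBackend code and makes no claim about what that code currently does; `TokenClient`, `fetch_token`, and `request` are hypothetical names, and the request callable is assumed to return the HTTP status code.

```python
from typing import Callable, Optional


class TokenClient:
    """Hypothetical wrapper that re-authenticates when the backend answers 401."""

    def __init__(self, fetch_token: Callable[[], str], max_retries: int = 1):
        self._fetch_token = fetch_token   # assumed: returns a fresh auth token
        self._max_retries = max_retries   # refresh attempts allowed per call
        self._token: Optional[str] = None

    def call(self, request: Callable[[str], int]) -> int:
        """Run request(token); on HTTP 401, refresh the token and try again."""
        if self._token is None:
            self._token = self._fetch_token()
        status = request(self._token)
        retries = 0
        while status == 401 and retries < self._max_retries:
            # The cached token is presumed expired: fetch a new one and retry.
            self._token = self._fetch_token()
            status = request(self._token)
            retries += 1
        return status
```

With a bounded retry count, a genuinely bad credential still surfaces as a 401 instead of looping forever, while a token that merely expired mid-upload is replaced transparently.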

@fgiunchedi If this is not blocking the train, please remove the train task from parent tasks.

mmodell changed the subtype of this task from "Task" to "Production Error". (Aug 28 2019, 11:06 PM)

This production error has been open for more than a year and is still waiting to be investigated. There is some reason to suspect it is infrastructure-related, but before SRE can help, what goes wrong in Swift (if indeed that is the case) first needs to be better understood and quantified.
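As a possible starting point for quantifying the failures, the sketch below counts backend errors per minute, HTTP status, and function. It assumes exported log lines shaped like the one quoted earlier in this task (timestamp, host, level, message, separated by whitespace); `summarize` and `LINE_RE` are hypothetical names, and real logstash exports may differ in layout.

```python
import re
from collections import Counter

# Assumed line shape, modeled on the log line quoted in this task:
# <timestamp> <host> <level> HTTP <status> (<reason>) in '<function>' (given '...')
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+(?P<level>\S+)\s+"
    r"HTTP (?P<status>\d{3}) \([^)]*\) in '(?P<func>[^']+)'"
)


def summarize(lines):
    """Count matching errors keyed by (minute, HTTP status, function)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not look like backend HTTP errors
        minute = match.group("ts")[:16]  # truncate ISO timestamp to the minute
        counts[(minute, int(match.group("status")), match.group("func"))] += 1
    return counts
```

Grouping by minute makes short bursts (such as the four-minute window noted above) stand out from a steady background rate.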