Page MenuHomePhabricator

Lock wait timeout exceeded in UploadFromChunks::updateChunkStatus
Open, Needs TriagePublic

Description

After T200820#4826332 I tried again without "stash and async" (as recommended on https://commons.wikimedia.org/wiki/User_talk:Rillke/bigChunkedUpload.js#Troubleshooting ), and I got the following

03054: 287/287> Server error 503 after uploading chunk: 
Response: <!DOCTYPE html>
<html lang="en" dir="ltr">
<meta charset="utf-8">
<title>Wikimedia Error</title>
<style>
* { margin: 0; padding: 0; }
body { background: #fff; font: 15px/1.6 sans-serif; color: #333; }
.content { margin: 7% auto 0; padding: 2em 1em 1em; max-width: 640px; }
img { float: left; margin: 0 2em 2em 0; }
a img { border: 0; }
h1 { margin-top: 1em; font-size: 1.2em; }
p { margin: 0.7em 0 1em 0; }
a { color: #0645AD; text-decoration: none; }
a:hover { text-decoration: underline; }
</style>
03054: 287/287> upload in progress Upload: 100%
03078: FAILED: internal_api_error_DBQueryError: [XBZngQpAAEwAACGSPNYAAACO] Caught exception of type Wikimedia\Rdbms\DBQueryError

Is this the same issue?
File in stash is 1698l1a6s3kw.c4ox7h.1.pdf

Event Timeline

[XBZngQpAAEwAACGSPNYAAACO] /w/api.php   Wikimedia\Rdbms\DBQueryError from line 1506 of /srv/mediawiki/php-1.33.0-wmf.8/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: UPDATE  `uploadstash` SET us_status = 'chunks',us_chunk_inx = '286',us_size = '1201531325' WHERE us_key = '1698l1a6s3kw.c4ox7h.1.pdf'
Function: UploadFromChunks::updateChunkStatus
Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.48.23)
#0 /srv/mediawiki/php-1.33.0-wmf.8/includes/libs/rdbms/database/Database.php(1476): Wikimedia\Rdbms\Database->makeQueryException(string, integer, string, string)
#1 /srv/mediawiki/php-1.33.0-wmf.8/includes/libs/rdbms/database/Database.php(1236): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#2 /srv/mediawiki/php-1.33.0-wmf.8/includes/libs/rdbms/database/Database.php(2114): Wikimedia\Rdbms\Database->query(string, string)
#3 /srv/mediawiki/php-1.33.0-wmf.8/includes/upload/UploadFromChunks.php(300): Wikimedia\Rdbms\Database->update(string, array, array, string)
#4 /srv/mediawiki/php-1.33.0-wmf.8/includes/upload/UploadFromChunks.php(273): UploadFromChunks->updateChunkStatus()
#5 /srv/mediawiki/php-1.33.0-wmf.8/includes/api/ApiUpload.php(226): UploadFromChunks->addChunk(string, integer, integer)
#6 /srv/mediawiki/php-1.33.0-wmf.8/includes/api/ApiUpload.php(132): ApiUpload->getChunkResult(array)
#7 /srv/mediawiki/php-1.33.0-wmf.8/includes/api/ApiUpload.php(104): ApiUpload->getContextResult()
#8 /srv/mediawiki/php-1.33.0-wmf.8/includes/api/ApiMain.php(1595): ApiUpload->execute()
#9 /srv/mediawiki/php-1.33.0-wmf.8/includes/api/ApiMain.php(531): ApiMain->executeAction()
#10 /srv/mediawiki/php-1.33.0-wmf.8/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#11 /srv/mediawiki/php-1.33.0-wmf.8/api.php(87): ApiMain->execute()
#12 /srv/mediawiki/w/api.php(3): include(string)
#13 {main}

So there are two separate processes trying to update the same upload status, and one of the locks the row and the other times out.
Not sure how that can happen... maybe the tool does parallel upload of multiple chunks at the same time, but even so, the status update should only happen when the upload of that chunk has finished, so the row should not be locked for long, and a lock timeout takes multiple seconds to happen.

And this only happens on the last chunk, when trying to build the whole file, which is the same as the other 2 bugs related. I don't think that a coincidence.

Tgr renamed this task from internal_api_error_DBQueryError to Lock wait timeout exceeded in UploadFromChunks::updateChunkStatus.Dec 17 2018, 10:50 AM

I could upload the 3 files which caused this bug.

This happened again after uploading a 1.9 GB file. :((

04283: 458/458> in progress Upload: 100%
04411: 458/458> Server error 503 after uploading chunk: 
Response: <!DOCTYPE html>
<html lang="en" dir="ltr">
<meta charset="utf-8">
<title>Wikimedia Error</title>
<style>
* { margin: 0; padding: 0; }
body { background: #fff; font: 15px/1.6 sans-serif; color: #333; }
.content { margin: 7% auto 0; padding: 2em 1em 1em; max-width: 640px; }
img { float: left; margin: 0 2em 2em 0; }
a img { border: 0; }
h1 { margin-top: 1em; font-size: 1.2em; }
p { margin: 0.7em 0 1em 0; }
a { color: #0645AD; text-decoration: none; }
a:hover { text-decoration: underline; }
</style>
04411: 458/458> upload in progress Upload: 99%
04431: FAILED: internal_api_error_DBQueryError: [XD4dAwpAIDYAAB0dQIIAAACY] Caught exception of type Wikimedia\Rdbms\DBQueryError

https://commons.wikimedia.org/wiki/File:L%27Illustration,_v.20,_Jul-Dec_1852.pdf
See stash
16bsomjwzu4s.u4y885.1.pdf
16bss6f5rz7g.617yma.1.pdf

Same error, lock wait timeout in UploadFromChunks::updateChunkStatus (at chunk 457). Maybe we should add some debug logging for when exactly each chunk starts / ends uploading, to see if the client does anything weird there.

This comment was removed by Wilfredor.

Same problem, I cant upload my images because this problem, also with Uploadwizard with images TIFF with more 600Mb of size (TIFF with internal zip compression).

Failed to load resource: the server responded with a status of 404 () https://commons.wikimedia.org/wiki/Special:UploadStash/thumb/16btxukj8dc8.3faxuw.32521.tif/lossy-page1-120px-16btxukj8dc8.3faxuw.32521.tif.jpg?0.6459894966662545

error.png (1×3 px, 804 KB)

The sample file that Im trying to upload is:
https://archive.org/download/SacreCoeurDeJesusByLouisJobin1878/Sacre%20Coeur%20de%20Jesus%20by%20Louis%20Jobin%201878.tif

Also the follow error while uploading via URL:

identify command failed: '/usr/bin/identify' -format "[BEGIN]page=%p\nalpha=%A\nalpha2=%r\nheight=%h\nwidth=%w\ndepth=%z[END]" '/tmp/URLb7638a2c66b1.urlupload_' 2>&1

executing the above command locally I get:

$ '/usr/bin/identify' -format "[BEGIN]page=%p\nalpha=%A\nalpha2=%r\nheight=%h\nwidth=%w\ndepth=%z[END]" Sacre_Coeur_de_Jesus_by_Louis_Jobin_1878.tif
identify: Incompatible type for "RichTIFFIPTC"; tag ignored. `TIFFFetchNormalTag' @ warning/tiff.c/TIFFWarnings/912.
identify: Unknown field with tag 34864 (0x8830) encountered. `TIFFReadCustomDirectory' @ warning/tiff.c/TIFFWarnings/912.
identify: Incompatible type for "FileSource"; tag ignored. `TIFFFetchNormalTag' @ warning/tiff.c/TIFFWarnings/912.
identify: Incompatible type for "SceneType"; tag ignored. `TIFFFetchNormalTag' @ warning/tiff.c/TIFFWarnings/912.
identify: Wrong data type 3 for "GainControl"; tag ignored. `TIFFReadCustomDirectory' @ warning/tiff.c/TIFFWarnings/912.
identify: Unknown field with tag 42033 (0xa431) encountered. `TIFFReadCustomDirectory' @ warning/tiff.c/TIFFWarnings/912.
identify: Unknown field with tag 42034 (0xa432) encountered. `TIFFReadCustomDirectory' @ warning/tiff.c/TIFFWarnings/912.
identify: Unknown field with tag 42036 (0xa434) encountered. `TIFFReadCustomDirectory' @ warning/tiff.c/TIFFWarnings/912.
identify: Incompatible type for "RichTIFFIPTC"; tag ignored. `TIFFFetchNormalTag' @ warning/tiff.c/TIFFWarnings/912.
identify: DistributedPixelCache '127.0.0.1' @ error/distribute-cache.c/ConnectPixelCacheServer/244.
identify: cache resources exhausted `Sacre_Coeur_de_Jesus_by_Louis_Jobin_1878.tif' @ error/cache.c/OpenPixelCache/3945.
identify: memory allocation failed `Sacre_Coeur_de_Jesus_by_Louis_Jobin_1878.tif' @ error/psd.c/ReadPSDLayers/1737.
[BEGIN]page=0
alpha=False
alpha2=DirectClass sRGB
height=8912
width=6160
$ identify --version
Version: ImageMagick 6.9.7-4 Q16 x86_64 20170114 http://www.imagemagick.org
Copyright: © 1999-2017 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC Modules OpenMP
Delegates (built-in): bzlib djvu fftw fontconfig freetype jbig jng jp2 jpeg lcms lqr ltdl lzma openexr pangocairo png tiff wmf x xml zlib

Out of memory?
Any hints what else can we do here?

The above was on a machine with 8GB RAM + 8GB swap. On a machine with 100GB RAM I get:

$ '/usr/bin/identify' -format "[BEGIN]page=%p\nalpha=%A\nalpha2=%r\nheight=%h\nwidth=%w\ndepth=%z[END]" Sacre_Coeur_de_Jesus_by_Louis_Jobin_1878.tif
identify: Sacre_Coeur_de_Jesus_by_Louis_Jobin_1878.tif: wrong data type 7 for "RichTIFFIPTC"; tag ignored. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/768.
identify: Sacre_Coeur_de_Jesus_by_Louis_Jobin_1878.tif: unknown field with tag 34864 (0x8830) encountered. `TIFFReadCustomDirectory' @ warning/tiff.c/TIFFWarnings/768.
identify: Sacre_Coeur_de_Jesus_by_Louis_Jobin_1878.tif: wrong data type 3 for "GainControl"; tag ignored. `TIFFReadCustomDirectory' @ warning/tiff.c/TIFFWarnings/768.
identify: Sacre_Coeur_de_Jesus_by_Louis_Jobin_1878.tif: unknown field with tag 42033 (0xa431) encountered. `TIFFReadCustomDirectory' @ warning/tiff.c/TIFFWarnings/768.
identify: Sacre_Coeur_de_Jesus_by_Louis_Jobin_1878.tif: unknown field with tag 42034 (0xa432) encountered. `TIFFReadCustomDirectory' @ warning/tiff.c/TIFFWarnings/768.
identify: Sacre_Coeur_de_Jesus_by_Louis_Jobin_1878.tif: unknown field with tag 42036 (0xa434) encountered. `TIFFReadCustomDirectory' @ warning/tiff.c/TIFFWarnings/768.
identify: Sacre_Coeur_de_Jesus_by_Louis_Jobin_1878.tif: wrong data type 7 for "RichTIFFIPTC"; tag ignored. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/768.
[BEGIN]page=0
alpha=False
alpha2=DirectClass sRGB
height=8912
width=6160
depth=16[END]
$ echo $?
0
$ identify --version
Version: ImageMagick 6.7.7-10 2014-03-08 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2012 ImageMagick Studio LLC
Features: OpenMP

So this is likely OOM problem.

Let's keep this task to be about the lock wait timeout error, please.

thcipriani subscribed.

Still seen as of 1.37.0-wmf.1, happening at the rate of a few 10s of times per hour (max of 30ish)

normalized_message
[{reqId}] {exception_url}   Wikimedia\Rdbms\DBQueryError: Error 1205: Lock wait timeout exceeded; try restarting transaction (10.64.48.124)
Function: UploadFromChunks::updateChunkStatus
Query: UPDATE  `uploadstash` SET us_status = 'chunks',us_chunk_inx =
exception.trace
from /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/rdbms/database/Database.php(1738)
#0 /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/rdbms/database/Database.php(1722): Wikimedia\Rdbms\Database->getQueryException(string, integer, string, string)
#1 /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/rdbms/database/Database.php(1697): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
#2 /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/rdbms/database/Database.php(1260): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#3 /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/rdbms/database/Database.php(2498): Wikimedia\Rdbms\Database->query(string, string, integer)
#4 /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->update(string, array, string, string)
#5 /srv/mediawiki/php-1.37.0-wmf.1/includes/libs/rdbms/database/DBConnRef.php(375): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#6 /srv/mediawiki/php-1.37.0-wmf.1/includes/upload/UploadFromChunks.php(289): Wikimedia\Rdbms\DBConnRef->update(string, array, array, string)
#7 /srv/mediawiki/php-1.37.0-wmf.1/includes/upload/UploadFromChunks.php(263): UploadFromChunks->updateChunkStatus()
#8 /srv/mediawiki/php-1.37.0-wmf.1/includes/api/ApiUpload.php(251): UploadFromChunks->addChunk(string, integer, integer)
#9 /srv/mediawiki/php-1.37.0-wmf.1/includes/api/ApiUpload.php(130): ApiUpload->getChunkResult(array)
#10 /srv/mediawiki/php-1.37.0-wmf.1/includes/api/ApiUpload.php(101): ApiUpload->getContextResult()
#11 /srv/mediawiki/php-1.37.0-wmf.1/includes/api/ApiMain.php(1652): ApiUpload->execute()
#12 /srv/mediawiki/php-1.37.0-wmf.1/includes/api/ApiMain.php(622): ApiMain->executeAction()
#13 /srv/mediawiki/php-1.37.0-wmf.1/includes/api/ApiMain.php(593): ApiMain->executeActionWithErrorHandling()
#14 /srv/mediawiki/php-1.37.0-wmf.1/api.php(90): ApiMain->execute()
#15 /srv/mediawiki/php-1.37.0-wmf.1/api.php(45): wfApiMain()
#16 /srv/mediawiki/w/api.php(3): require(string)
#17 {main}

There is logging in the code (UploadFromChunks::updateChunkStatus), not sure if that can be found for this request:

		wfDebug( __METHOD__ . " update chunk status for {$this->mFileKey} offset:" .
			$this->getOffset() . ' inx:' . $this->getChunkIndex() );

The update on the table is done using uploadstash.us_key, which is not the primary key, but a unique index, which should be fast as the primary key and not resulting in deadlocks.

I have removed a log filter from Kibana MediaWiki New Errors dashboard which pointed back to this task. There is no hit over the last eight weeks.

The filter was T212101 Lock wait timeout exceeded in UploadFromChunks::updateChunkStatus

{
  "query": {
    "match_phrase": {
      "error.message": "Function: UploadFromChunks::updateChunkStatus"
    }
  }
}