Page MenuHomePhabricator

Transcoding job is reported as failed before it's completed
Closed, InvalidPublic

Assigned To
None
Authored By
HebronNor
Dec 10 2019, 7:41 PM
Referenced Files
F31467780: image.png
Dec 10 2019, 7:41 PM
F31467772: image.png
Dec 10 2019, 7:41 PM
F31467776: image.png
Dec 10 2019, 7:41 PM
F31467774: image.png
Dec 10 2019, 7:41 PM

Description

Most transcoding jobs are reported as failed while ffmpeg is still running, this changes to "Completed" when the process is done.

image.png (406×769 px, 73 KB)
image.png (408×765 px, 75 KB)
image.png (453×869 px, 81 KB)

But the transcoding job is still listed as failed in Special:TimedMediaHandler.

image.png (341×536 px, 40 KB)

The failed, then completed, jobs have both time_success and time_error in the database.

MariaDB [wikidb]> select * from transcode;
+--------------+-------------------------------------+----------------+-----------------+-----------------------+--------------------------+------------------------+----------------------+-------------------------+
| transcode_id | transcode_image_name                | transcode_key  | transcode_error | transcode_time_addjob | transcode_time_startwork | transcode_time_success | transcode_time_error | transcode_final_bitrate |
+--------------+-------------------------------------+----------------+-----------------+-----------------------+--------------------------+------------------------+----------------------+-------------------------+
|           79 | VID_20170304_111417.mp4             | 120p.vp9.webm  |                 | 20191210193223        | 20191210193232           | 20191210193348         | 20191210193233       |                  216800 |
|           80 | VID_20170304_111417.mp4             | 180p.vp9.webm  |                 | 20191210193223        | 20191210193348           | 20191210193538         | NULL                 |                  296488 |
|           81 | VID_20170304_111417.mp4             | 240p.vp9.webm  |                 | 20191210193223        | 20191210193538           | 20191210193813         | 20191210193603       |                  416104 |
|           82 | VID_20170304_111417.mp4             | 360p.vp9.webm  |                 | 20191210193223        | 20191210193813           | NULL                   | NULL                 |                       0 |
|           83 | VID_20170304_111417.mp4             | 480p.vp9.webm  |                 | 20191210193223        | NULL                     | NULL                   | NULL                 |                       0 |
|           84 | VID_20170304_111417.mp4             | 720p.vp9.webm  |                 | 20191210193223        | NULL                     | NULL                   | NULL                 |                       0 |
|           85 | VID_20170304_111417.mp4             | 1080p.vp9.webm |                 | 20191210193223        | NULL                     | NULL                   | NULL                 |                       0 |
+--------------+-------------------------------------+----------------+-----------------+-----------------------+--------------------------+------------------------+----------------------+-------------------------+

Versions
MediaWiki 1.33.1
PHP 7.2.24-0ubuntu0.18.04.1 (fpm-fcgi)
TimedMediaHandler 0.6.0

Settings

$wgMaxShellMemory = 614400; // default x 2
$wgMaxShellFileSize = 204800; // default x 2
$wgMaxShellTime = 300;

wfLoadExtension( 'TimedMediaHandler' );
$wgTmhEnableMp4Uploads = true;
$wgTmhWebPlayer = 'videojs';
$wgFFmpegLocation = '/usr/bin/ffmpeg';
$wgTranscodeBackgroundTimeLimit = 0;
$wgFFmpegVP9RowMT = true;
$wgFFmpegThreads = 2;
$wgEnableNiceBackgroundTranscodeJobs = true;

Event Timeline

Looking... it looks like on success transcode_error is cleared but transcode_time_error is not. However it shouldn't be getting set in the first place unless something's awry...

Can you try it with $wgEnableNiceBackgroundTranscodeJobs off? I don't think that mode is well tested currently; shouldn't make a difference but I'm a bit uncertain what happens with it on...

You might need to add a debug log line in WebVideoTranscodeJob::setTranscodeError (or watch the full debug logs for the SQL updates to the transcode table) to check exactly what error it thinks it's putting in there in the first place.

I've tried with the $wgEnableNiceBackgroundTranscodeJobs setting off as well, same result. I enabled it to see if it would resolve the issue, but nothing changed.

Here are the SQL logs for the transcode table, and the video in question:

[DBQuery] SELECT  transcode_key  FROM `transcode`    WHERE transcode_image_name = 'VID_20170304_111417.mp4'
[DBQuery] DELETE FROM `transcode` WHERE transcode_image_name = 'VID_20170304_111417.mp4'
[DBQuery] SELECT  *  FROM `transcode`    WHERE transcode_image_name = 'VID_20170304_111417.mp4'  LIMIT 100

[DBQuery] INSERT IGNORE INTO `transcode` (transcode_image_name,transcode_key,transcode_time_addjob,transcode_error,transcode_final_bitrate) VALUES ('VID_20170304_111417.mp4','120p.vp9.webm','20191210193223','','0')
[DBQuery] SELECT  *  FROM `transcode`    WHERE transcode_image_name = 'VID_20170304_111417.mp4'  LIMIT 100
[DBQuery] INSERT IGNORE INTO `transcode` (transcode_image_name,transcode_key,transcode_time_addjob,transcode_error,transcode_final_bitrate) VALUES ('VID_20170304_111417.mp4','180p.vp9.webm','20191210193223','','0')
[DBQuery] SELECT  *  FROM `transcode`    WHERE transcode_image_name = 'VID_20170304_111417.mp4'  LIMIT 100
[DBQuery] INSERT IGNORE INTO `transcode` (transcode_image_name,transcode_key,transcode_time_addjob,transcode_error,transcode_final_bitrate) VALUES ('VID_20170304_111417.mp4','240p.vp9.webm','20191210193223','','0')
[DBQuery] SELECT  *  FROM `transcode`    WHERE transcode_image_name = 'VID_20170304_111417.mp4'  LIMIT 100
[DBQuery] INSERT IGNORE INTO `transcode` (transcode_image_name,transcode_key,transcode_time_addjob,transcode_error,transcode_final_bitrate) VALUES ('VID_20170304_111417.mp4','360p.vp9.webm','20191210193223','','0')
[DBQuery] SELECT  *  FROM `transcode`    WHERE transcode_image_name = 'VID_20170304_111417.mp4'  LIMIT 100
[DBQuery] INSERT IGNORE INTO `transcode` (transcode_image_name,transcode_key,transcode_time_addjob,transcode_error,transcode_final_bitrate) VALUES ('VID_20170304_111417.mp4','480p.vp9.webm','20191210193223','','0')
[DBQuery] SELECT  *  FROM `transcode`    WHERE transcode_image_name = 'VID_20170304_111417.mp4'  LIMIT 100
[DBQuery] INSERT IGNORE INTO `transcode` (transcode_image_name,transcode_key,transcode_time_addjob,transcode_error,transcode_final_bitrate) VALUES ('VID_20170304_111417.mp4','720p.vp9.webm','20191210193223','','0')
[DBQuery] SELECT  *  FROM `transcode`    WHERE transcode_image_name = 'VID_20170304_111417.mp4'  LIMIT 100
[DBQuery] INSERT IGNORE INTO `transcode` (transcode_image_name,transcode_key,transcode_time_addjob,transcode_error,transcode_final_bitrate) VALUES ('VID_20170304_111417.mp4','1080p.vp9.webm','20191210193223','','0')
[DBQuery] SELECT  *  FROM `transcode`    WHERE transcode_image_name = 'VID_20170304_111417.mp4'  LIMIT 100

[DBQuery] SELECT  transcode_time_startwork  FROM `transcode`    WHERE transcode_image_name = 'VID_20170304_111417.mp4' AND transcode_key = '120p.vp9.webm'  LIMIT 1
[DBQuery] UPDATE  `transcode` SET transcode_time_startwork = '20191210193232' WHERE transcode_image_name = 'VID_20170304_111417.mp4' AND transcode_key = '120p.vp9.webm'
[DBQuery] SELECT  *  FROM `transcode`    WHERE transcode_image_name = 'VID_20170304_111417.mp4'  LIMIT 100

And then something interesting happens:

[DBQuery] SELECT  *  FROM `transcode`    WHERE transcode_image_name = 'VID_20170304_111417.mp4'  LIMIT 100  
[DBPerformance] Expectation (writes <= 0) by MediaWiki::main not met (actual: 1):
query-m: UPDATE `transcode` SET transcode_time_error = 'X' [TRX#666e0f]
#0 /var/www/mediawiki/includes/libs/rdbms/TransactionProfiler.php(245): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated('writes', 'query-m: ↪UPDATE...', 1)
#1 /var/www/mediawiki/includes/libs/rdbms/database/Database.php(1343): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion('query-m: UPDATE...',    ↪1576006353.8242, true, 1)
#2 /var/www/mediawiki/includes/libs/rdbms/database/Database.php(1224): Wikimedia\Rdbms\Database->attemptQuery('UPDATE  `transc...', 'UPDATE /* WebVi...',  ↪true, 'WebVideoTransco...')
#3 /var/www/mediawiki/includes/libs/rdbms/database/Database.php(2195): Wikimedia\Rdbms\Database->query('UPDATE  `transc...', 'WebVideoTransco...')
#4 /var/www/mediawiki/extensions/TimedMediaHandler/includes/WebVideoTranscode/WebVideoTranscode.php(769): Wikimedia\Rdbms\Database->update('`transcode`',  ↪Array, Array, 'WebVideoTransco...', Array)
#5 /var/www/mediawiki/extensions/TimedMediaHandler/includes/WebVideoTranscode/WebVideoTranscode.php(694): WebVideoTranscode::                              ↪getTranscodeState(Object(LocalFile))
#6 /var/www/mediawiki/extensions/TimedMediaHandler/includes/WebVideoTranscode/WebVideoTranscode.php(875): WebVideoTranscode::                              ↪isTranscodeReady(Object(LocalFile), '120p.vp9.webm')
#7 /var/www/mediawiki/extensions/TimedMediaHandler/includes/WebVideoTranscode/WebVideoTranscode.php(678): WebVideoTranscode::                              ↪addSourceIfReady(Object(LocalFile), Array, '120p.vp9.webm', Array) 
#8 /var/www/mediawiki/extensions/TimedMediaHandler/includes/WebVideoTranscode/WebVideoTranscode.php(568): WebVideoTranscode::                              ↪getLocalSources(Object(LocalFile), Array) 
#9 /var/www/mediawiki/extensions/TimedMediaHandler/includes/TimedMediaTransformOutput.php(560): WebVideoTranscode::getSources(Object(LocalFile))
#10 /var/www/mediawiki/extensions/TimedMediaHandler/includes/TimedMediaTransformOutput.php(344): TimedMediaTransformOutput->getMediaSources()
#11 /var/www/mediawiki/extensions/TimedMediaHandler/includes/TimedMediaTransformOutput.php(181): TimedMediaTransformOutput->getHtmlMediaTagOutput()
#12 /var/www/mediawiki/includes/page/ImagePage.php(437): TimedMediaTransformOutput->toHtml(Array)
#13 /var/www/mediawiki/includes/page/ImagePage.php(136): ImagePage->openShowImage()
#14 /var/www/mediawiki/includes/actions/ViewAction.php(68): ImagePage->view()
#15 /var/www/mediawiki/includes/MediaWiki.php(499): ViewAction->show()
#16 /var/www/mediawiki/includes/MediaWiki.php(294): MediaWiki->performAction(Object(ImagePage), Object(Title))
#17 /var/www/mediawiki/includes/MediaWiki.php(865): MediaWiki->performRequest()
#18 /var/www/mediawiki/includes/MediaWiki.php(515): MediaWiki->main()
#19 /var/www/mediawiki/index.php(42): MediaWiki->run()
#20 {main} 
[DBQuery] UPDATE  `transcode` SET transcode_time_error = '20191210193233',transcode_error = 'timeout' WHERE transcode_image_name = 'VID_20170304_111417.   ↪mp4' AND transcode_key = '120p.vp9.webm'

Hm, interesting. I've been running tail -f | grep trancode_error on the debug file while uploading videos. I uploaded three videos without any errors, but as soon as I navigated to the video page File:xxx.mp4 during a running trancode I got the following error:

4378 [DBQuery] SELECT  *  FROM `transcode`    WHERE transcode_image_name = '2001.-.Max.raning.mp4'  LIMIT 100
4379 [DBPerformance] Expectation (writes <= 0) by MediaWiki::main not met (actual: 1):
4380 query-m: UPDATE `transcode` SET transcode_time_error = 'X' [TRX#cd135b]
4381 #0 /var/www/mediawiki/includes/libs/rdbms/TransactionProfiler.php(245): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated('writes',           ↪'query-m: UPDATE...', 1)
4382 #1 /var/www/mediawiki/includes/libs/rdbms/database/Database.php(1343): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion('query-m: UPDATE...      ↪', 1576011427.3801, true, 1)
4383 #2 /var/www/mediawiki/includes/libs/rdbms/database/Database.php(1224): Wikimedia\Rdbms\Database->attemptQuery('UPDATE  `transc...', 'UPDATE /*             ↪WebVi...', true, 'WebVideoTransco...')
4384 #3 /var/www/mediawiki/includes/libs/rdbms/database/Database.php(2195): Wikimedia\Rdbms\Database->query('UPDATE  `transc...', 'WebVideoTransco...')
4385 #4 /var/www/mediawiki/extensions/TimedMediaHandler/includes/WebVideoTranscode/WebVideoTranscode.php(769): Wikimedia\Rdbms\Database-                        ↪>update('`transcode`', Array, Array, 'WebVideoTransco...', Array)
4386 #5 /var/www/mediawiki/extensions/TimedMediaHandler/includes/WebVideoTranscode/WebVideoTranscode.php(694): WebVideoTranscode::                              ↪getTranscodeState(Object(LocalFile))
4387 #6 /var/www/mediawiki/extensions/TimedMediaHandler/includes/WebVideoTranscode/WebVideoTranscode.php(875): WebVideoTranscode::                              ↪isTranscodeReady(Object(LocalFile), '120p.vp9.webm')
4388 #7 /var/www/mediawiki/extensions/TimedMediaHandler/includes/WebVideoTranscode/WebVideoTranscode.php(678): WebVideoTranscode::                              ↪addSourceIfReady(Object(LocalFile), Array, '120p.vp9.webm', Array)
4389 #8 /var/www/mediawiki/extensions/TimedMediaHandler/includes/WebVideoTranscode/WebVideoTranscode.php(568): WebVideoTranscode::                              ↪getLocalSources(Object(LocalFile), Array)
4390 #9 /var/www/mediawiki/extensions/TimedMediaHandler/includes/TimedMediaTransformOutput.php(560): WebVideoTranscode::getSources(Object(LocalFile))
4391 #10 /var/www/mediawiki/extensions/TimedMediaHandler/includes/TimedMediaTransformOutput.php(344): TimedMediaTransformOutput->getMediaSources()
4392 #11 /var/www/mediawiki/extensions/TimedMediaHandler/includes/TimedMediaTransformOutput.php(181): TimedMediaTransformOutput->getHtmlMediaTagOutput()
4393 #12 /var/www/mediawiki/includes/page/ImagePage.php(437): TimedMediaTransformOutput->toHtml(Array)
4394 #13 /var/www/mediawiki/includes/page/ImagePage.php(136): ImagePage->openShowImage()
4395 #14 /var/www/mediawiki/includes/actions/ViewAction.php(68): ImagePage->view()
4396 #15 /var/www/mediawiki/includes/MediaWiki.php(499): ViewAction->show()
4397 #16 /var/www/mediawiki/includes/MediaWiki.php(294): MediaWiki->performAction(Object(ImagePage), Object(Title))
4398 #17 /var/www/mediawiki/includes/MediaWiki.php(865): MediaWiki->performRequest()
4399 #18 /var/www/mediawiki/includes/MediaWiki.php(515): MediaWiki->main()
4400 #19 /var/www/mediawiki/index.php(42): MediaWiki->run()
4401 #20 {main}
4402 [DBQuery] UPDATE  `transcode` SET transcode_time_error = '20191210205707',transcode_error = 'timeout' WHERE transcode_image_name = '2001.-.Max.raning.     ↪mp4' AND transcode_key = '120p.vp9.webm'

@brion After running a bunch of transcoding jobs, without browsing the video page (File:xxx.mp4) during the ongoing transcode; I have no errors. So it seems pretty conclusive that accessing the video page during the ongoing transcode is causing the error above.

As a start; can we at least clear transcode_time_error when, or if, the transcode succeeds? Like we do with transcode_error?

I'd like to submit a pull request for this, dipping my toes into contributing here :)

Looks like all that is needed is to add a line here; https://github.com/wikimedia/mediawiki-extensions-TimedMediaHandler/blob/ffd88ea8982c4e27735d70ce6412adddb2700714/includes/WebVideoTranscode/WebVideoTranscodeJob.php#L305

'transcode_time_error' => '',

Any objections?

Thanks for looking at the code. :) Note that the pull request would have to go into Gerrit and not to Github - see the Gerrit Tutorial. Thanks!

Thanks for looking at the code. :) Note that the pull request would have to go into Gerrit and not to Github - see the Gerrit Tutorial. Thanks!

Yes I got that; just made myself a developer account. Thanks for the tutorial. :)

Change 560118 had a related patch set uploaded (by Thomas Jensen; owner: Thomas Jensen):
[mediawiki/extensions/TimedMediaHandler@master] WebVideoTranscodeJob: Clear transcode error timestamp on success

https://gerrit.wikimedia.org/r/560118

Change 560118 merged by jenkins-bot:
[mediawiki/extensions/TimedMediaHandler@master] WebVideoTranscodeJob: Clear transcode error timestamp on success

https://gerrit.wikimedia.org/r/560118

@HebronNor https://github.com/wikimedia/mediawiki-extensions-TimedMediaHandler/blob/19bdb6794c0789a028fab1623e87b16f25990b57/includes/WebVideoTranscode/WebVideoTranscode.php#L745

The stack trace shows that it is triggering an update inside of getTranscodeState() This update should only occur if the the maximum allowed transcode time has passed.
$over = $db->timestamp( time() - ( 2 * $wgTranscodeBackgroundTimeLimit ) );

And interestingly enough, u have $wgTranscodeBackgroundTimeLimit = 0; configured.

I suspect this is a problem with DB transaction management... + $wgTranscodeBackgroundTimeLimit = 0

@TheDJ Thanks for the analysis :) I set that parameter to 0 because I was experiencing the symptoms described in T206957. I might add that I am running my MediaWiki jobs as described here; https://www.mediawiki.org/wiki/Manual:Job_queue if that matters.

I'll dig into this when I find some time. I'll make sure to post any findings here. Happy holidays :)

Closing as no further reports