TimedMediaHandler's ffmpeg processes get stuck when using resource limits on Docker image
Closed, Resolved (Public)

Description

On the Docker image, TimedMediaHandler's ffmpeg background processes get stuck when run with resource limits applied, and never complete.

Steps to reproduce:

  • Follow the steps and workaround for T246935 (install TMH, upload a WebM video, then run maintenance/runJobs.php).

Actual results:

  • The job runner gets stuck on the first transcode; neither the ffmpeg process nor the bash shell running it ever exits.

Expected results:

  • All jobs should run to completion and clean up.

Workaround:

Disabling all the resource limits seems to get it working:

$wgMaxShellMemory = 0;
$wgMaxShellFileSize = 0;
$wgMaxShellTime = 0;
$wgMaxShellWallClockTime = 0;

$wgTranscodeBackgroundTimeLimit = 0;
$wgTranscodeBackgroundMemoryLimit = 0;
$wgTranscodeBackgroundSizeLimit = 0;

There may be another task in the system with a related problem; I remember coming across one recently but can't find it at the moment. In other words, this is probably not Docker-specific and may not be ffmpeg-specific, but I did not see it under MW-Vagrant.
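If disabling the limits everywhere feels too broad, one option is to scope the workaround to command-line runs such as maintenance/runJobs.php. The fragment below is a minimal sketch for a development LocalSettings.php, not a tested fix. It assumes transcodes are only run from the CLI job runner. If jobs also run during web requests, the default limits would still apply there.

```php
// LocalSettings.php (development only).
// Assumption: transcode jobs are run via `php maintenance/runJobs.php`,
// so PHP_SAPI is 'cli' whenever ffmpeg is launched.
if ( PHP_SAPI === 'cli' ) {
	// Core shell limits; 0 disables each limit, as in the workaround above.
	$wgMaxShellMemory = 0;
	$wgMaxShellFileSize = 0;
	$wgMaxShellTime = 0;
	$wgMaxShellWallClockTime = 0;

	// TimedMediaHandler background transcode limits.
	$wgTranscodeBackgroundTimeLimit = 0;
	$wgTranscodeBackgroundMemoryLimit = 0;
	$wgTranscodeBackgroundSizeLimit = 0;
}
```

Web requests keep the default limits with this approach, so shell-outs triggered from page views are still constrained.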

Event Timeline

$wgMaxShellMemory = 0;
$wgMaxShellFileSize = 0;
$wgMaxShellTime = 0;
$wgMaxShellWallClockTime = 0;

$wgTranscodeBackgroundTimeLimit = 0;
$wgTranscodeBackgroundMemoryLimit = 0;
$wgTranscodeBackgroundSizeLimit = 0;

Do we want to set these in the container by default? An alternative is to add a section with notes on TimedMediaHandler to https://www.mediawiki.org/wiki/MediaWiki-Docker, or a subpage like the one for WikibaseMediaInfo (https://www.mediawiki.org/wiki/MediaWiki-Docker/Docker_setup_for_WBMI).

Change 584225 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/core@master] DevelopmentSettings: Disable resource limits

https://gerrit.wikimedia.org/r/584225

Change 584225 merged by jenkins-bot:
[mediawiki/core@master] DevelopmentSettings: Disable resource limits

https://gerrit.wikimedia.org/r/584225

Personally, I'd prefer to set stricter limits in development to catch any performance/limit issues before they hit production. Production usually has bigger datasets and performance issues there can cause outages.

Can't TMH override these limits for the relevant blocks of code?
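For reference, a per-command override could look roughly like the sketch below, using core's MediaWiki\Shell\Shell wrapper. This is not TimedMediaHandler's actual code: the file paths and ffmpeg arguments are hypothetical. It also assumes that a limit value of 0 means "no limit" for that command, as it does for the globals in the workaround. It has to run inside a bootstrapped MediaWiki (for example from a maintenance script).

```php
<?php
// Hypothetical sketch: override shell limits for a single command
// instead of changing the global $wgMaxShell* settings.
use MediaWiki\Shell\Shell;

// Hypothetical file names, for illustration only.
$source = '/tmp/input.webm';
$target = '/tmp/output.360p.webm';

$result = Shell::command( 'ffmpeg', '-y', '-i', $source, $target )
	// Assumption: 0 disables the corresponding limit for this command only.
	->limits( [
		'time' => 0,
		'walltime' => 0,
		'memory' => 0,
		'filesize' => 0,
	] )
	// Merge stderr into stdout so ffmpeg's messages are captured.
	->includeStderr()
	->execute();

if ( $result->getExitCode() !== 0 ) {
	wfDebugLog( 'shell', 'ffmpeg failed: ' . $result->getStdout() );
}
```

Scoping the override this way would leave the stricter global limits in place for every other shell-out, which is the point raised in the comment above.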

Change 597826 had a related patch set uploaded (by Brennen Bearnes; owner: Brennen Bearnes):
[releng/dev-images@master] add stretch-php72-jobrunner for TimedMediaHandler

https://gerrit.wikimedia.org/r/597826

Change 597844 had a related patch set uploaded (by Brennen Bearnes; owner: Brennen Bearnes):
[mediawiki/core@master] DNM: docker: add mediawiki-jobrunner

https://gerrit.wikimedia.org/r/597844

Change 597826 merged by Jforrester:
[releng/dev-images@master] add jobrunner & tweak settings for TimedMediaHandler

https://gerrit.wikimedia.org/r/597826

Change 597844 merged by jenkins-bot:
[mediawiki/core@master] mediawiki-docker: Add a jobrunner container

https://gerrit.wikimedia.org/r/597844