Page MenuHomePhabricator

/usr/bin/timeout causing ffmpeg to hang indefinitely when running video transcoding jobs
Closed, ResolvedPublic

Description

Running webVideoTranscode jobs manually to transcode a video, with the following command: php maintenance/runJobs.php --type webVideoTranscode, I notice it hangs forever.

This is the output of the ps command on another shell:

jesus@wikidex22:~$ ps aux |grep ffmpeg
php-fpm+ 31120  0.0  0.0   4288   808 pts/2    SN+  20:00   0:00 sh -c /bin/bash '/home/www/www.wikidex.net/mediawiki-1.30.1/includes/shell/limit.sh' ''\''/usr/bin/ffmpeg'\'' -y -i '\''/data2/mwuploads/public/wikidex/9/95/Pikapicado_LGPE
.webm'\'' -threads 1 -skip_threshold 0 -bufsize 6000k -rc_init_occupancy 4000 -qmin 1 -qmax 51 -vb '\''2048000'\'' -vcodec libvpx -slices '\''4'\'' -f webm -s 1280x720 -aq '\''3'\'' -acodec libvorbis  /var/tmp/php-fpm-wikidex/transcode_7
20p.webm965400c454a6.webm 2>&1' 'MW_INCLUDE_STDERR=;MW_CPU_LIMIT=28800; MW_CGROUP='\'''\''; MW_MEM_LIMIT=4194304; MW_FILE_SIZE_LIMIT=3145728; MW_WALL_CLOCK_LIMIT=28800; MW_USE_LOG_PIPE=yes'
php-fpm+ 31121  0.0  0.0  12944  2772 pts/2    SN+  20:00   0:00 /bin/bash /home/www/www.wikidex.net/mediawiki-1.30.1/includes/shell/limit.sh '/usr/bin/ffmpeg' -y -i '/data2/mwuploads/public/wikidex/9/95/Pikapicado_LGPE.webm' -threads 1
-skip_threshold 0 -bufsize 6000k -rc_init_occupancy 4000 -qmin 1 -qmax 51 -vb '2048000' -vcodec libvpx -slices '4' -f webm -s 1280x720 -aq '3' -acodec libvorbis  /var/tmp/php-fpm-wikidex/transcode_720p.webm965400c454a6.webm 2>&1 MW_INCLU
DE_STDERR=;MW_CPU_LIMIT=28800; MW_CGROUP=''; MW_MEM_LIMIT=4194304; MW_FILE_SIZE_LIMIT=3145728; MW_WALL_CLOCK_LIMIT=28800; MW_USE_LOG_PIPE=yes
php-fpm+ 31122  0.0  0.0  11840   796 pts/2    SN   20:00   0:00 /usr/bin/timeout 28800 /bin/bash -c '/usr/bin/ffmpeg' -y -i '/data2/mwuploads/public/wikidex/9/95/Pikapicado_LGPE.webm' -threads 1 -skip_threshold 0 -bufsize 6000k -rc_init
_occupancy 4000 -qmin 1 -qmax 51 -vb '2048000' -vcodec libvpx -slices '4' -f webm -s 1280x720 -aq '3' -acodec libvorbis  /var/tmp/php-fpm-wikidex/transcode_720p.webm965400c454a6.webm 2>&1
php-fpm+ 31123  0.0  0.0  12920  2768 pts/2    TN   20:00   0:00 /bin/bash -c '/usr/bin/ffmpeg' -y -i '/data2/mwuploads/public/wikidex/9/95/Pikapicado_LGPE.webm' -threads 1 -skip_threshold 0 -bufsize 6000k -rc_init_occupancy 4000 -qmin 1
 -qmax 51 -vb '2048000' -vcodec libvpx -slices '4' -f webm -s 1280x720 -aq '3' -acodec libvorbis  /var/tmp/php-fpm-wikidex/transcode_720p.webm965400c454a6.webm 2>&1
php-fpm+ 31124  0.9  0.2 503724 32596 pts/2    TN   20:00   0:00 /usr/bin/ffmpeg -y -i /data2/mwuploads/public/wikidex/9/95/Pikapicado_LGPE.webm -threads 1 -skip_threshold 0 -bufsize 6000k -rc_init_occupancy 4000 -qmin 1 -qmax 51 -vb 204
8000 -vcodec libvpx -slices 4 -f webm -s 1280x720 -aq 3 -acodec libvorbis /var/tmp/php-fpm-wikidex/transcode_720p.webm965400c454a6.webm
jesus    31207  0.0  0.0  12784  1000 pts/1    S+   20:01   0:00 grep ffmpeg

I can reproduce the problem executing the command directly from the command line. The only output I get is the following:

ffmpeg version 3.2.9-1~deb9u1 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 6.3.0 (Debian 6.3.0-18) 20170516
  configuration: --prefix=/usr --extra-version='1~deb9u1' --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnut
ls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libebur128 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgs
m --enable-libmp3lame --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-
libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-
libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
  libavutil      55. 34.101 / 55. 34.101
  libavcodec     57. 64.101 / 57. 64.101
  libavformat    57. 56.101 / 57. 56.101
  libavdevice    57.  1.100 / 57.  1.100
  libavfilter     6. 65.100 /  6. 65.100
  libavresample   3.  1.  0 /  3.  1.  0
  libswscale      4.  2.100 /  4.  2.100                                                                                                                                                                                                       libswresample   2.  3.100 /  2.  3.100
  libpostproc    54.  1.100 / 54.  1.100

Then the shell freezes forever. CTRL+C doesn't work. I have to kill -9 (a simple kill does nothing) the 2 last processes.

After fiddling around with it, the problem is basically the MW_WALL_CLOCK_LIMIT=28800 parameter. If I set this to 0, then /usr/bin/timeout don't get invoked, and the encoding works flawlessly.

My server is Debian Stretch 4.9.0-4-amd64 #1 SMP Debian 4.9.51-1 (2017-09-28) x86_64 GNU/Linux

I've "solved" the problem by setting $wgTranscodeBackgroundTimeLimit = 0; in LocalSettings.php which effectively sets MW_WALL_CLOCK_LIMIT to 0 on the job run.

Event Timeline

There's some irony in setting a timeout causes it to hang forever, and removing the timeout causes it to finish. I assume this occurred on multiple files. Did you try running the same shell command manually with and without timeout to see what happened?

It happens with all webm files. If I edit the includes/shell/limit.sh file and change this line:

/usr/bin/timeout $MW_WALL_CLOCK_LIMIT /bin/bash -c "$1" 3>&-

with this one:

/bin/bash -c "$1" 3>&-

the transcode job finishes without problems.

Manually executing it, when it hangs, I only get the output described on the first message.

I was able to reproduce this. It stops with SIGTTOU when it calls tcsetattr() on stdin. gdb identifies the caller:

if (!run_as_daemon && stdin_interaction) {
    struct termios tty;
    if (tcgetattr (0, &tty) == 0) {
        oldtty = tty;
        restore_tty = 1;

        tty.c_iflag &= ~(IGNBRK|BRKINT|PARMRK|ISTRIP
                         |INLCR|IGNCR|ICRNL|IXON);
        tty.c_oflag |= OPOST;
        tty.c_lflag &= ~(ECHO|ECHONL|ICANON|IEXTEN);
        tty.c_cflag &= ~(CSIZE|PARENB);
        tty.c_cflag |= CS8;
        tty.c_cc[VMIN] = 1;
        tty.c_cc[VTIME] = 0;

        tcsetattr (0, TCSANOW, &tty);

Shell::command() passes through stdin by default, which is not compatible with timeout.

FFmpeg is writing "Press [q] to stop, [?] for help". Setting the timeout to zero will stop it from hanging, but still allows the user to stop FFmpeg by pressing "q", which is presumably unexpected. I was able to truncate a 20s video at 3s by pressing "q" while running the job queue.

Stopping with SIGTTOU is unusual, but the process will also stop with SIGTTIN if anything tries to read from stdin:

$ php maintenance/eval.php
> wfShellExec('cat')
^D^D^D
strace: Process 86676 attached
--- stopped by SIGTTIN ---

cat is eventually killed by timeout with SIGTERM, I guess because it doesn't install a SIGTERM handler. But it's certainly impossible to read from stdin when wfShellExec() is used with a timeout.

Change 621466 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] Don't pass stdin to commands by default

https://gerrit.wikimedia.org/r/621466

ffmpeg has an option -nostdin .. maybe we should add that to our command line spec as well ?

Change 621580 had a related patch set uploaded (by TheDJ; owner: TheDJ):
[mediawiki/extensions/TimedMediaHandler@master] Disable interactive mode of ffmpeg with -nostdin

https://gerrit.wikimedia.org/r/621580

Change 621580 merged by jenkins-bot:
[mediawiki/extensions/TimedMediaHandler@master] Disable interactive mode of ffmpeg with -nostdin

https://gerrit.wikimedia.org/r/621580

Change 622206 had a related patch set uploaded (by TheDJ; owner: TheDJ):
[mediawiki/extensions/TimedMediaHandler@REL1_35] Disable interactive mode of ffmpeg with -nostdin

https://gerrit.wikimedia.org/r/622206

Change 622206 merged by jenkins-bot:
[mediawiki/extensions/TimedMediaHandler@REL1_35] Disable interactive mode of ffmpeg with -nostdin

https://gerrit.wikimedia.org/r/622206

Change 621466 merged by jenkins-bot:
[mediawiki/core@master] Don't pass stdin to commands by default

https://gerrit.wikimedia.org/r/621466

TheDJ claimed this task.

Considering this resolved, with the 2020 patches