
Increase transcode background time limit
Closed, Resolved · Public


Most big transcodes (720p, 1080p, and long videos) now fail because the transcode background time limit in TMH ($wgTranscodeBackgroundTimeLimit) is set to 8h.
I think they would pass with a longer timeout (16h or 24h).

Examples of failed transcodes from :
2 h 5 min 41 s, 1,280 × 720 (2.43 GB) -> WebM 360P, 480P, and 720p failed
46 min 11 s, 1,920 × 1,080 (524.17 MB) -> WebM 1080P failed
2 h 43 min 39 s, 1,920 × 1,080 (3.11 GB) -> WebM 480P, 720p and 1080p failed
2 h 21 min 37 s, 1,200 × 720 (1.14 GB) -> WebM 480P, and 720p failed

Event Timeline

Yann created this task. Jan 19 2017, 5:35 PM
Restricted Application added a subscriber: Aklapper. Jan 19 2017, 5:35 PM
Yann renamed this task from Increase background time limit to Increase transcode background time limit. Jan 19 2017, 5:36 PM
Yann updated the task description. Jan 19 2017, 5:38 PM
brion added a subscriber: brion. Jan 19 2017, 6:04 PM

I was a bit baffled why the timeouts seem to be happening significantly before the 8-hour limit is hit, but it turns out ulimit is based on *CPU time* not *wall-clock time*. Since there is some parallelization between decode, scaling, and re-encoding, the CPU usage is around 175% on these ffmpeg processes, not a 'mere' 100%, so we'll hit an 8 hour limit in 4-6 hours.
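
The arithmetic behind that estimate can be sketched as follows. This is only an illustration, not TMH code: the function name `wall_clock_hours_until_cpu_limit` is hypothetical, and 1.75 is simply the approximate 175% CPU usage quoted above.

```python
def wall_clock_hours_until_cpu_limit(cpu_limit_hours: float,
                                     cpu_utilization: float) -> float:
    """Wall-clock hours that pass before a process uses up a CPU-time limit.

    cpu_utilization is a multiple of one core: 1.75 means 175% CPU.
    (Hypothetical helper, for illustration only.)
    """
    if cpu_utilization <= 0:
        raise ValueError("cpu_utilization must be positive")
    return cpu_limit_hours / cpu_utilization


# 8-hour CPU-time limit with ffmpeg at roughly 175% CPU
hours = wall_clock_hours_until_cpu_limit(8, 1.75)
print(f"{hours:.2f} h")  # 4.57 h
```

At 200% CPU the same 8-hour limit is reached after 4 hours of wall-clock time, and at about 133% after 6 hours, which matches the 4-6 hour range observed.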

Just cutting these processes off is wasteful, as we lose all the encoding time already spent, so I recommend bumping the limit up to match the actual wall-clock time.

Change 333035 had a related patch set uploaded (by Brion VIBBER):
Double $wgTranscodeBackgroundTimeLimit to compensate for threading

brion claimed this task. Jan 19 2017, 6:13 PM

(patch in the works to double the timeout based on our threading setting)

Change 333035 merged by jenkins-bot:
Double $wgTranscodeBackgroundTimeLimit to compensate for threading

brion closed this task as Resolved. Jan 19 2017, 7:23 PM

Ok, this is merged and live as of today's SWAT updates. Already-running jobs will still have the lower limit and may still time out, but those that start from now on should have a doubled time limit, which will be more in line with wall-clock time and should avoid timing out on most of the 1-2 hour 720p/1080p videos.

Yann added a comment. Edited Jan 22 2017, 10:52 AM

Now there are a few transcodes with time over 8h, but some are still failing:,_1936.webm
Exitcode: 137
startwork = 20170121151803, error = 20170121235548
8 hours
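
To read the log excerpt above, it can help to compute the elapsed wall-clock time and decode the exit code. The sketch below rests on two assumptions: that the timestamps use the 14-digit YYYYMMDDHHMMSS layout, and that the exit code follows the common shell convention of 128 plus the signal number. The helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional
import signal

# Assumed timestamp layout: YYYYMMDDHHMMSS
TS_FORMAT = "%Y%m%d%H%M%S"


def elapsed(start: str, end: str) -> timedelta:
    """Wall-clock time between two 14-digit timestamps."""
    return datetime.strptime(end, TS_FORMAT) - datetime.strptime(start, TS_FORMAT)


def signal_from_exit_code(code: int) -> Optional[str]:
    """Name of the signal implied by a shell-style exit code (128 + N), if any."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None


print(elapsed("20170121151803", "20170121235548"))  # 8:37:45
print(signal_from_exit_code(137))                   # SIGKILL
```

Under those assumptions, this job ran for about 8 h 38 min of wall-clock time before being killed. The snippet cannot tell whether the kill came from the CPU-time limit or from something else.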