Frequent job timeouts on HHVM video scalers
Since updating two of the video scalers to HHVM (T104747) and leaving the remaining old one out of rotation, we've seen that transcodes sometimes succeed but frequently time out without properly recording the failure.

The jobrunner.log is listing 503 errors from HHVM, which may indicate we're hitting HHVM's generic timeout, likely much too short for the long-running transcode processes. It looks like this kills the job process immediately, without a chance to write an update to the transcode table, so Special:TimedMediaHandler and the transcode tables on File: pages still claim the transcodes are running.
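To make the symptom concrete, here is a minimal sketch of how stuck entries could be spotted: rows that have a start time but no finish or failure time, and that started longer ago than any transcode should plausibly take. The row shape (`key`, `started`, `finished`, `failed`) and the 6-hour threshold are assumptions for illustration only; the real transcode table schema is not shown in this task.

```python
from datetime import datetime, timedelta, timezone


def find_stale_transcodes(rows, now, max_age=timedelta(hours=6)):
    """Return rows that still claim to be running but started more than max_age ago.

    Each row is a dict with hypothetical keys: 'key', 'started', 'finished', 'failed'.
    """
    stale = []
    for row in rows:
        started = row.get("started")
        still_running = (
            started is not None
            and row.get("finished") is None
            and row.get("failed") is None
        )
        # A killed job never writes 'finished' or 'failed', so it looks
        # like it is running forever.
        if still_running and now - started > max_age:
            stale.append(row)
    return stale


# Illustrative data only, not taken from production.
example_now = datetime(2015, 9, 25, 20, 0, tzinfo=timezone.utc)
example_rows = [
    {"key": "480p.webm", "started": example_now - timedelta(hours=12),
     "finished": None, "failed": None},
    {"key": "360p.webm", "started": example_now - timedelta(minutes=10),
     "finished": None, "failed": None},
]
stale_keys = [r["key"] for r in find_stale_transcodes(example_rows, example_now)]
```

A check like this only identifies candidates for a reset; it does not say why the job died.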

Need to investigate with Joe exactly what's going on and whether we can adjust the timeout to handle the longer-running processes better.

This should be a bit better now given I raised a few HHVM timeouts.

@Joe I still see a lot of failures, but now they come with a giant WMF error page:

2015-09-25T19:50:40+0000: Runner loop 0 process in slot 3 gave status '0':
curl -XPOST -s -a ''
	Encoding to codec: vp8
Running cmd: 

'/usr/bin/ffmpeg' -y -i '/tmp/localcopy_4118d5722782-1.webm' -threads 2 -skip_threshold 0 -bufsize 6000k -rc_init_occupancy 4000 -qmin 1 -qmax 51 -vb '1024000' -vcodec libvpx -g '128' -keyint_min '128' -f webm -s 854x480 -an -pass '1' -passlogfile '/tmp/transcode_480p.webm88f646e0f3e8-1.webm.log' /dev/null

Hmm, maxtime=60? Do we really want that in the URL? :)

@Paladox yes, those are linked on the duplicate bug report.

OK, but I mean I have re-run them and they are still taking a while.

@Paladox please stop re-running transcodes; having other people reset things unexpectedly interferes with our ability to track what's going on and fix the problem.

Oh sorry, I didn't know I shouldn't have done that. Sorry.

@Paladox @brion I think Ori might have found the problem and fixed it: we were setting an override on max_execution_time in mediawiki-config if not running on CLI.

It should go much better from now on, please let me know.
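For anyone following along, a rough sketch of why an override like that bites here: the job runner reaches HHVM over HTTP (the curl -XPOST in the log above), so the request does not count as CLI and picks up the overridden limit. This is a Python illustration of the logic only, not the actual mediawiki-config code; the function names and the 180-second value are hypothetical. The one borrowed convention is PHP's, where a max_execution_time of 0 means no limit.

```python
def effective_time_limit(is_cli, override_seconds, cli_default_seconds=0):
    """Return the time limit a request would run under.

    Mimics a config that overrides the limit only when NOT running on CLI.
    A limit of 0 means "no limit" (PHP's convention for max_execution_time).
    """
    if is_cli:
        return cli_default_seconds
    return override_seconds


def would_be_killed(job_seconds, limit_seconds):
    """True if a job of the given duration exceeds a non-zero limit."""
    return limit_seconds > 0 and job_seconds > limit_seconds


# Hypothetical numbers: a 20-minute transcode under a 180-second web override.
web_limit = effective_time_limit(is_cli=False, override_seconds=180)
cli_limit = effective_time_limit(is_cli=True, override_seconds=180)
killed_via_web = would_be_killed(20 * 60, web_limit)
killed_via_cli = would_be_killed(20 * 60, cli_limit)
```

Under these assumptions the same transcode survives when run from the command line but is cut off when dispatched as a web request, which matches the pattern of silent failures described above.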

Might be better for some cases, but the Lila Tetrikov file from T113532 still appears to have trouble. Possibly that's another issue and the ticket should be unduped?

Thank you @Joe and @ori for fixing the problem.

\o/ Resolving this, and reimaging the remaining videoscaler!

@TheDJ I'll look into that specific bug today.

