Since updating two of the video scalers to HHVM (T104747) and leaving the remaining old one out of rotation, we've seen that transcodes sometimes complete but frequently time out without properly recording the failure.
The jobrunner.log is listing 503 errors from HHVM, which may indicate we're hitting HHVM's generic timeout, likely much too short for the long-running transcode processes. It looks like this kills the job process immediately, without a chance to write an update to the transcode table, so Special:TimedMediaHandler and transcode tables on File: pages still claim they're running.
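To make the symptom concrete, here is a minimal Python sketch of how "stuck" transcodes could be identified: rows that recorded a start time but never a success or error time, and that have been "running" longer than any plausible transcode. The field names (`time_startwork`, `time_success`, `time_error`), the two-hour cutoff, and the sample rows are assumptions made up for illustration; this is not the actual TimedMediaHandler schema or code.

```python
from datetime import datetime, timedelta


def find_stuck_transcodes(rows, now, max_runtime):
    """Return rows that started work, never recorded success or error,
    and have been in that state for longer than max_runtime."""
    stuck = []
    for row in rows:
        started = row.get("time_startwork")
        finished = row.get("time_success") or row.get("time_error")
        if started is not None and finished is None and now - started > max_runtime:
            stuck.append(row)
    return stuck


# Made-up sample rows, purely illustrative.
now = datetime(2000, 1, 1, 12, 0, 0)
rows = [
    # Started long ago, no outcome recorded: the job was probably killed.
    {"key": "360p.webm", "time_startwork": now - timedelta(hours=6),
     "time_success": None, "time_error": None},
    # Started recently: may still be running legitimately.
    {"key": "480p.webm", "time_startwork": now - timedelta(minutes=5),
     "time_success": None, "time_error": None},
    # Finished normally.
    {"key": "720p.webm", "time_startwork": now - timedelta(hours=6),
     "time_success": now - timedelta(hours=5), "time_error": None},
]

stuck = find_stuck_transcodes(rows, now, timedelta(hours=2))
print([row["key"] for row in stuck])
```

A check like this only finds the stale entries after the fact; it does not address the underlying timeout that kills the job before it can record its status.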
Need to investigate with Joe exactly what's going on and whether we can adjust the timeout to handle the longer-running processes better.