
Refactor long-running transcode jobs to (PHP-based) microservice
Closed, Duplicate
Public

Description

The MediaWiki job queue is a poor fit for long-running transcode processes and is notoriously fragile, failing to update the status table in many failure cases. Experiment with refactoring the transcode jobs so they can run from a (PHP-based) 'microservice' whose status can be checked more reliably.

Event Timeline

brion created this task. Sep 22 2015, 4:42 AM
brion claimed this task.
brion raised the priority of this task from to Needs Triage.
brion updated the task description.
brion added a project: TimedMediaHandler.
brion added a subscriber: brion.
Restricted Application added a subscriber: Aklapper. Sep 22 2015, 4:42 AM
greg added a subscriber: greg. Sep 22 2015, 4:47 AM
ori set Security to None. Sep 23 2015, 8:09 PM
ori added a subscriber: Gilles.
aaron added a subscriber: aaron. Sep 23 2015, 8:15 PM

Is this just a problem with timeouts being too low in HHVM, or do you want some kind of RPC thing where you can know whether the process that claimed a job is still running?

Are any of the failures internal problems in the Job subclass (low timeouts, moving large files around for no reason, jobs that should be split up, etc.)? If that's the case, it might just be easier to rewrite those.

brion added a comment. Oct 1 2015, 8:52 PM

It'd be nice to have a more general RPC-status-check; the Job subclasses maintain that information in the 'transcode' database table -- e.g. if ffmpeg crashes, execution returns to the PHP Job subclass and it updates the table with error info -- but when the timeout hits at the HHVM level, the cleanup code never has a chance to run. :(
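One way a status check could notice a worker that was killed before its cleanup ran is a heartbeat with a staleness cutoff. The sketch below is only an illustration of that idea, not the TimedMediaHandler implementation: it is written in Python rather than PHP, it uses an in-memory dict as a stand-in for the 'transcode' table, and every name in it (`TranscodeRecord`, `TranscodeTracker`, `stale_after`, `reap_stale`) is hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TranscodeRecord:
    """Hypothetical stand-in for one row of the 'transcode' table."""
    key: str
    status: str = "queued"  # queued | running | done | failed
    last_heartbeat: Optional[float] = None
    error: Optional[str] = None


class TranscodeTracker:
    """Tracks running transcodes and flags those that stop reporting."""

    def __init__(self, stale_after: float = 60.0) -> None:
        # Seconds without a heartbeat before a running job is presumed dead.
        self.stale_after = stale_after
        self.records: Dict[str, TranscodeRecord] = {}

    def start(self, key: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.records[key] = TranscodeRecord(key, "running", now)

    def heartbeat(self, key: str, now: Optional[float] = None) -> None:
        # Called periodically by the worker while the encoder is running.
        now = time.time() if now is None else now
        self.records[key].last_heartbeat = now

    def finish(self, key: str, error: Optional[str] = None) -> None:
        # The normal cleanup path: runs only if the worker survives.
        rec = self.records[key]
        rec.status = "failed" if error else "done"
        rec.error = error

    def reap_stale(self, now: Optional[float] = None) -> List[str]:
        # Run by a separate supervisor, so it still works when the
        # worker was killed and finish() never had a chance to run.
        now = time.time() if now is None else now
        reaped = []
        for rec in self.records.values():
            if rec.status != "running" or rec.last_heartbeat is None:
                continue
            if now - rec.last_heartbeat > self.stale_after:
                rec.status = "failed"
                rec.error = "worker stopped reporting (possibly killed by a timeout)"
                reaped.append(rec.key)
        return reaped
```

The point of the design is that failure detection does not depend on the worker's own cleanup code: a process outside the worker calls `reap_stale()` on a schedule, so a hard kill leaves a 'failed' status instead of a job stuck in 'running'.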

For now it looks like we've got the timeouts sorted out, so I'm putting this one on the back burner.

TheDJ moved this task from To sort to Transcoding on the TimedMediaHandler board. Oct 21 2015, 6:59 PM