We are in the process of moving all of MediaWiki to Kubernetes, and that includes all jobs, including `webVideoTranscodeJob`, which presents a series of challenges for us.
Specifically:
* Right now, these jobs are submitted to the jobrunners via HTTP by changeprop, thus **requiring HTTP timeouts to be raised to 1 day**, as some transcodes can be extremely long-running. This means that we can't restart the php-fpm daemons for every release we do, or video transcodes will never finish.
* The job shells out to `ffmpeg` and a couple of other programs to transcode videos. By default, ffmpeg uses as many threads as it deems useful given the number of CPUs on the host. That makes **the maximum amount of CPU resources a pod would use very unpredictable** if we keep not defining `limits`. Defining limits, on the other hand, will most likely result in heavy throttling.
* The job traditionally limits the memory usage of the shellout by modifying cgroups, which won't be possible on Kubernetes like it was on bare metal. While Kubernetes obviously has facilities to limit the amount of memory a pod can use, that won't be as good: there is a chance the OOM killer kills the wrong process, taking down the whole pod instead of just the shellout.
Each of these problems makes videoscaling incompatible with our current setup on Kubernetes. We have some ways out of all of the above problems, but none is particularly comfortable.
### CPU usage limits
Let's start with the easiest problem to solve: ffmpeg supports, at least in modern versions, the `-threads` switch, which limits the number of threads ffmpeg uses. That gives us a way to put an upper bound on the amount of CPU it consumes, with a slight modification of TimedMediaHandler's `WebVideoTranscodeJob::ffmpegEncode` and a new configuration variable.
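As a rough illustration, a minimal sketch of the change, assuming a hypothetical `$wgFFmpegThreads` configuration variable (the name is made up here, and the real patch would go through TimedMediaHandler's own option handling):

```php
// Sketch of the addition to WebVideoTranscodeJob::ffmpegEncode(), at the
// point where the ffmpeg command line is assembled into $cmd.
// $wgFFmpegThreads is a hypothetical new configuration variable; 0 would
// keep ffmpeg's "one thread per CPU" default behaviour.
global $wgFFmpegThreads;

if ( $wgFFmpegThreads > 0 ) {
	// Cap the number of worker threads so the pod's CPU usage has a known
	// upper bound that we can align with the pod's resource requests.
	$cmd .= ' -threads ' . intval( $wgFFmpegThreads );
}
```

With the thread count pinned, we can set the pod's CPU request to the same value and get predictable scheduling without aggressive throttling.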
### Timeouts
Needing to leave an HTTP request running for up to one day causes all sorts of problems, including the fact that we can't deploy to k8s without killing the running pods. We could decide to run a cronjob releasing code to the videoscalers once a day, but that would leave them running on stale code for potentially a long time, with all the consequences that has, for instance for security fixes, which is undesirable. A potentially better alternative is to write a piece of software (or modify changeprop to do it) that can take the jobs from kafka, then run them as kubernetes `Job` instances via the command line. This would require us to write both this software and a special maintenance script for MediaWiki that can take a JSON job definition as input, which is quite easy to do.
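To make the second option concrete, here is a minimal sketch of such a maintenance script; the class name, the `--job-json` option and the way the job is rehydrated are assumptions for illustration, not an existing interface:

```php
<?php
// runVideoTranscodeJob.php — hypothetical maintenance script that takes a
// JSON job definition (as read from the kafka topic) and runs it in-process.
require_once getenv( 'MW_INSTALL_PATH' ) . '/maintenance/Maintenance.php';

class RunVideoTranscodeJob extends Maintenance {
	public function __construct() {
		parent::__construct();
		$this->addDescription( 'Run a single webVideoTranscode job from a JSON definition' );
		$this->addOption( 'job-json', 'JSON-encoded job specification', true, true );
	}

	public function execute() {
		$spec = json_decode( $this->getOption( 'job-json' ), true );
		// Rebuild the job from its serialized form and run it synchronously;
		// the exit status tells the kubernetes Job whether the transcode failed.
		$job = Job::factory( $spec['type'], Title::newFromText( $spec['title'] ), $spec['params'] );
		if ( !$job->run() ) {
			$this->fatalError( 'Job failed: ' . $job->getLastError() );
		}
	}
}

$maintClass = RunVideoTranscodeJob::class;
require_once RUN_MAINTENANCE_IF_MAIN;
```

The dispatcher side then only needs to consume from kafka and create one kubernetes `Job` per message, capped at a preset concurrency.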
### Memory limits
This is actually more or less impossible to solve properly if the transcode is running locally. We clearly need to either convert TimedMediaHandler to use Shellbox for the execution of ffmpeg, or find an alternative off-the-shelf system for transcoding videos that is designed for Kubernetes and that we can call from the job itself. Alternatively, we can try to tune the OOM killer so that the probability of it killing the process using the most memory in the cgroup is high enough that the number of "dirty kills" stays acceptably small.
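For the OOM-tuning route, the standard knob is `/proc/<pid>/oom_score_adj`: raising it to the maximum on the ffmpeg child makes the kernel strongly prefer it as a victim when the pod hits its memory limit. A sketch, assuming we can obtain the child PID from the shellout (the helper below is hypothetical):

```php
/**
 * Mark the transcoding child process as the preferred OOM-kill victim, so
 * that hitting the pod's memory limit kills ffmpeg rather than php-fpm.
 * $pid is the PID of the spawned ffmpeg process.
 */
function preferChildForOomKill( int $pid ): void {
	// oom_score_adj ranges from -1000 to 1000; 1000 means "kill me first".
	$path = "/proc/$pid/oom_score_adj";
	if ( is_writable( $path ) ) {
		file_put_contents( $path, '1000' );
	}
}
```

This only shifts the odds in our favour: the kernel's choice is still heuristic, so a small number of "dirty kills" of the whole pod would remain possible.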
# Potential solutions
Let's review the possible solutions we came up with during our internal kick-off meeting.
## Solution 1: minimum effort
* Create a mw-videoscaler deployment of MediaWiki with a 1-day HTTP timeout
* Run a cron every day to update the code
* Add code to run ffmpeg with a limited number of threads (as sketched in the CPU usage limits section above)
## Solution 2: proper management
* Create a mw-videoscaler namespace
* Write a smallish piece of software that can read jobs from a kafka topic, then call the Kubernetes API to spawn a `Job` that runs a MediaWiki maintenance script taking a JSON job definition as input, with a preset concurrency (a sketch of such a `Job` follows after this list)
* Use Shellbox or an off-the-shelf, k8s-native software for video transcoding to actually perform the transcode
** This means adapting TimedMediaHandler's code in a much deeper way
** It also means we either need a new Shellbox instance or a completely different piece of software
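For illustration, the kind of `Job` the dispatcher would create could look roughly like the following manifest; every name, image, and number in it is a placeholder for the sketch, not a decision:

```yaml
# Hypothetical Job created by the dispatcher for a single transcode.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: video-transcode-
  namespace: mw-videoscaler
spec:
  backoffLimit: 0               # retries belong to the job queue, not to k8s
  activeDeadlineSeconds: 86400  # hard cap matching today's 1-day timeout
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: transcode
          image: example-registry/mediawiki:placeholder
          command:
            - php
            - maintenance/runVideoTranscodeJob.php  # the hypothetical script sketched above
            - '--job-json={"type":"webVideoTranscode","params":"..."}'
          resources:
            requests:
              cpu: "4"          # aligned with the -threads cap given to ffmpeg
              memory: 2Gi
            limits:
              cpu: "4"
              memory: 2Gi
```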
I would probably personally go with the latter, as it would noticeably improve how we run videoscaling, instead of actively making it slightly worse than it is now. Depending on how easy it is to use Shellbox in TMH, including the PHP code that needs to do the cleanup in case of a transcoding failure, I would actually go with that.