When Kubernetes for Toolforge was announced, I was quick to move my tools over from the grid, even though that sometimes meant massive reorganisation and, in a few cases, rewrites. Some tools worked well with the fire-and-forget grid engine approach but conflict with the Kubernetes approach on Toolforge.
I rewrote some massive codebases in Rust, mostly in order to run continuous jobs efficiently, using async and multi-threading. But even these jobs suffer from the limitations imposed by Toolforge.
My latest example is my Mix'n'match tool. I rewrote large parts of the PHP code in Rust, in order to run both on-demand and scheduled tasks on the MnM database. These jobs can take anywhere from a few seconds to a few days. Ideally, I would start a Kubernetes job with several CPUs, to run threads efficiently, and a few GB of RAM. Sadly, there are massive restrictions on both CPUs and RAM.
My workaround (and when your users have to create workarounds to be able to use your infrastructure, you know you have an issue) is to start two copies of the same job under slightly different names. This is obviously less efficient than running a single larger job. Furthermore, these jobs are prone to stopping spontaneously (an issue I have noticed with other tools using Kubernetes) and are sometimes just left as "fails to start", even when marked as continuous. On top of that, if I start two jobs (1 CPU/1 GB RAM each), the webservice stops as well, and needs to be restarted manually. Again, this affects not just this tool but several.
I believe this will not be solved by changing the configuration for one tool or another; there needs to be a rethink of how to offer Kubernetes to us Toolforge users, in a way that does not make us want to ditch it in favour of screen.