According to my calculations, there are some 900 files for which, after more than a month, we have still not finished generating the 1080p transcode. That looks like a backlog we cannot catch up on without adding capacity. It suggests we are chronically underprovisioned, and this is AFTER we got rid of the 720p transcodes, which I suspect were also in that class. That in turn might indicate that 1080p transcodes are often simply not finishing and are crashing nodes.
One of the problems with not having many transcode nodes for these is that transcodes of large files often get stuck, and the node then only gets restarted after something like 3 hours of being stuck (I'm not sure what the maximum time is exactly, but as transcodes can take about a day, I assume it's fairly conservative about shooting down a node). In the meantime, that node is not handling any other transcodes in that category.
SELECT COUNT(transcode_id) FROM transcode WHERE transcode_key = '1080p.vp9.webm' AND transcode_time_startwork IS NULL AND transcode_time_addjob IS NOT NULL AND transcode_time_success IS NULL AND transcode_time_error IS NULL AND transcode_time_addjob > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 30 DAY), '%Y%m%d%H%i%S');
903 entries
Over the same period, 1165 1080p entries succeeded and 62 failed.
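To put those three numbers side by side, here is a rough back-of-the-envelope sketch. It assumes the 903 pending, 1165 succeeded and 62 failed entries all refer to the same 30-day window and are directly comparable, and it ignores jobs that have started but not yet finished. The function name `summarize` and the dictionary keys are made up for illustration.

```python
# Figures quoted above; the 30-day window comes from the query's INTERVAL 30 DAY.
WINDOW_DAYS = 30
PENDING = 903      # queued in the window, never started
SUCCEEDED = 1165   # 1080p entries that succeeded
FAILED = 62        # 1080p entries that failed


def summarize(pending, succeeded, failed, window_days):
    """Return a few ratios describing the state of the queue."""
    total = pending + succeeded + failed
    finished = succeeded + failed
    successes_per_day = succeeded / window_days
    return {
        # Share of all counted jobs that never started.
        "pending_share": pending / total,
        # Share of finished jobs that ended in an error.
        "failure_share_of_finished": failed / finished,
        # Average successful transcodes per day over the window.
        "successes_per_day": successes_per_day,
        # Days needed to clear the pending jobs at that rate,
        # assuming (unrealistically) that no new jobs arrive.
        "days_to_drain_if_no_new_jobs": pending / successes_per_day,
    }


stats = summarize(PENDING, SUCCEEDED, FAILED, WINDOW_DAYS)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Under those assumptions, a bit over 40% of the counted jobs never started, and even with zero new uploads the current success rate would need roughly three more weeks to clear them.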
Multiple people on Commons have noted that 1080p transcodes are simply "not being generated" any longer. We need to tackle this, because otherwise we are just burning CPU while practically not supporting 1080p at all.
What is strange is that we used to have 720p, 1080p, 1440p AND 2160p in this class. While 1440p and 2160p were a bit too much and were eventually removed (I think for good reasons), we were able to keep up with the rest pretty easily. However, that was probably when we were still on bare metal.