Right now, AssembleUploadChunksJob seems to be failing after about 4 minutes. I'm not 100% sure, but I think this is due to a timeout. The job then seems to restart even though it has retries disabled.
Anyway, for large files, the job is expected to take a while. Would we be able to increase its timeout? At the very least to 15 minutes, but preferably to 1 hour, just to be on the safe side.
PublishStashedFile is not normally expected to take as long, but it wouldn't hurt to give it the higher limit as well.
Looking around, it looks like this can be set by editing deployment-charts/helmfile.d/services/changeprop-jobqueue/values.yaml. I was contemplating just proposing a patch, but I don't know much about how the job queue is implemented these days, and I'm not sure whether there are additional considerations beyond editing that file, so I thought I'd ask instead.
P.S. If you have access to additional logs, it would be appreciated if you could grep for the request ID 9b4b9d1e-6a32-4591-8f72-89af061843b6, which is an example of such a job, and tell me whether a timeout is really what is happening here.