Here's a discussion (from a decade ago) arguing that the cloud-init systemd timeout should be unlimited:
https://github.com/sdake/heat-jeos/issues/1
That seems to have been implemented: on Bullseye the timeouts look like this:
root@util-abogott-bullseye:~# systemctl show cloud-init.service | grep Timeout
TimeoutStartUSec=infinity
TimeoutStopUSec=infinity
TimeoutAbortUSec=infinity
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=terminate
TimeoutCleanUSec=infinity
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
On recent Bookworm builds, however, we have:
root@k8s-dev-bastion:~# systemctl show cloud-init.service | grep Timeout
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
TimeoutAbortUSec=1min 30s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=terminate
TimeoutCleanUSec=infinity
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
That's kind of a disaster. On a good day our initial Puppet run finishes within 90 seconds (exactly the new 1min 30s limit), but as time passes and package versions drift, the run gets slower and exceeds that timeout.
We need to do some detective work to figure out why that timeout was added and see whether we can get it switched back. Worst case, we can override the timeout ourselves during the base image build, but that'll be messy.
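For the worst case, one possible override is a systemd drop-in that restores the unlimited start timeout. This is a sketch, not something tested against our image build. The file name `override.conf` is an arbitrary, hypothetical choice; systemd merges any `*.conf` file under `/etc/systemd/system/cloud-init.service.d/` into the unit. For the detective work, note that 1min 30s matches systemd's default `DefaultTimeoutStartSec=`, which suggests (but does not prove) that the Bookworm unit simply no longer sets an explicit timeout. `systemctl cat cloud-init.service` shows the packaged unit file together with any drop-ins, so it is a reasonable place to start comparing Bullseye and Bookworm.

```ini
# /etc/systemd/system/cloud-init.service.d/override.conf
# (hypothetical file name; any *.conf file in this directory is read)
[Service]
# "infinity" disables the start timeout, matching the Bullseye behaviour above
TimeoutStartSec=infinity
```

If the file is baked into the base image, it is picked up on first boot. On an already-running host, `systemctl daemon-reload` is needed before `systemctl show cloud-init.service` reflects the change.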