Steps to replicate the issue (include links if applicable):
Never get output:
$ toolforge jobs run --image bookworm --no-filelog --command "bash -c 'for x in {0..10}; do date; sleep 1; done'" test-get-logs; toolforge jobs show test-get-logs | grep Status; kubectl get pods | grep test-get-logs-; echo; toolforge jobs logs -f test-get-logs
| Status: | Running for 0s |
test-get-logs-wb99z 0/1 ContainerCreating 0 1s
Get output:
$ toolforge jobs run --image bookworm --no-filelog --command "bash -c 'for x in {0..10}; do date; sleep 1; done'" test-get-logs; **sleep 2**; toolforge jobs show test-get-logs | grep Status; kubectl get pods | grep test-get-logs-; echo; toolforge jobs logs -f test-get-logs
| Status: | Running for 3s |
test-get-logs-cs5fd 1/1 Running 0 3s
2025-08-03T18:29:10+00:00 [test-get-logs-cs5fd] [job] Sun Aug 3 06:29:10 PM UTC 2025
2025-08-03T18:29:11+00:00 [test-get-logs-cs5fd] [job] Sun Aug 3 06:29:11 PM UTC 2025
2025-08-03T18:29:12+00:00 [test-get-logs-cs5fd] [job] Sun Aug 3 06:29:12 PM UTC 2025
2025-08-03T18:29:13+00:00 [test-get-logs-cs5fd] [job] Sun Aug 3 06:29:13 PM UTC 2025
2025-08-03T18:29:14+00:00 [test-get-logs-cs5fd] [job] Sun Aug 3 06:29:14 PM UTC 2025
2025-08-03T18:29:15+00:00 [test-get-logs-cs5fd] [job] Sun Aug 3 06:29:15 PM UTC 2025
2025-08-03T18:29:16+00:00 [test-get-logs-cs5fd] [job] Sun Aug 3 06:29:16 PM UTC 2025
2025-08-03T18:29:17+00:00 [test-get-logs-cs5fd] [job] Sun Aug 3 06:29:17 PM UTC 2025
2025-08-03T18:29:18+00:00 [test-get-logs-cs5fd] [job] Sun Aug 3 06:29:18 PM UTC 2025
2025-08-03T18:29:19+00:00 [test-get-logs-cs5fd] [job] Sun Aug 3 06:29:19 PM UTC 2025
2025-08-03T18:29:20+00:00 [test-get-logs-cs5fd] [job] Sun Aug 3 06:29:20 PM UTC 2025
What happens?:
When logs are requested through KubernetesSource (the source used for --follow) before the container has started, no output is ever produced, even after the container starts running.
I assume that https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/blob/main/toolforge_weld/logs/kubernetes.py?ref_type=heads#L35 fails to connect to the container (because it is not running yet), so the thread never feeds any data to the query call that is returned from the API.
What should have happened instead?:
KubernetesSource should periodically check container status, starting a new thread for each new container and stopping the thread of any container that is no longer present.
This would also allow --follow to exit when the container exits.
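
A rough sketch of the polling approach described above, to make the proposal concrete. This is not the toolforge-weld code: `follow_logs`, `list_containers` and `stream_container` are hypothetical names, and the two callables stand in for whatever Kubernetes calls KubernetesSource would actually make (checking which containers are running, and streaming one container's logs).

```python
import queue
import threading
import time
from typing import Callable, Dict, Iterable, Iterator, Set, Tuple


def follow_logs(
    list_containers: Callable[[], Set[str]],
    stream_container: Callable[[str, threading.Event], Iterable[str]],
    poll_interval: float = 1.0,
) -> Iterator[str]:
    """Yield log lines while periodically re-checking which containers run.

    Hypothetical helpers (assumptions, not a real API):
      list_containers()            -> names of containers currently running
      stream_container(name, stop) -> yields log lines until the container
                                      exits or `stop` is set
    """
    lines: "queue.Queue[str]" = queue.Queue()
    workers: Dict[str, Tuple[threading.Thread, threading.Event]] = {}
    seen_any = False  # becomes True once at least one container has started

    def worker(name: str, stop: threading.Event) -> None:
        for line in stream_container(name, stop):
            lines.put(line)

    while True:
        running = set(list_containers())

        # Start a thread for every container that has appeared since last poll.
        for name in running - set(workers):
            stop = threading.Event()
            thread = threading.Thread(target=worker, args=(name, stop), daemon=True)
            workers[name] = (thread, stop)
            thread.start()
            seen_any = True

        # Stop and drop the threads of containers that are gone.
        for name in set(workers) - running:
            thread, stop = workers.pop(name)
            stop.set()
            thread.join()

        # Every container we followed has exited: flush what is left and return.
        if seen_any and not workers:
            while not lines.empty():
                yield lines.get_nowait()
            return

        # Forward lines until it is time for the next status check.
        deadline = time.monotonic() + poll_interval
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                yield lines.get(timeout=remaining)
            except queue.Empty:
                break
```

With something like this, a request made while the pod is still in ContainerCreating would simply wait through a few polls and begin streaming once the container appears, and the generator ends on its own after the last followed container disappears.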