
`toolforge jobs logs` returns nothing if started too early.
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

Logs requested immediately after starting the job (never any output):

$ toolforge jobs run --image bookworm --no-filelog --command "bash -c 'for x in {0..10}; do date; sleep 1; done'" test-get-logs; toolforge jobs show test-get-logs | grep Status; kubectl get pods | grep test-get-logs-; echo; toolforge jobs logs -f test-get-logs
| Status:       | Running for 0s                                                  |
test-get-logs-wb99z                  0/1     ContainerCreating   0          1s

Logs requested after a 2-second delay (output appears):

$ toolforge jobs run --image bookworm --no-filelog --command "bash -c 'for x in {0..10}; do date; sleep 1; done'" test-get-logs; sleep 2; toolforge jobs show test-get-logs | grep Status; kubectl get pods | grep test-get-logs-; echo; toolforge jobs logs -f test-get-logs
| Status:       | Running for 3s                                                  |
test-get-logs-cs5fd                  1/1     Running   0          3s

2025-08-03T18:29:10+00:00 [test-get-logs-cs5fd] [job] Sun Aug  3 06:29:10 PM UTC 2025
2025-08-03T18:29:11+00:00 [test-get-logs-cs5fd] [job] Sun Aug  3 06:29:11 PM UTC 2025
2025-08-03T18:29:12+00:00 [test-get-logs-cs5fd] [job] Sun Aug  3 06:29:12 PM UTC 2025
2025-08-03T18:29:13+00:00 [test-get-logs-cs5fd] [job] Sun Aug  3 06:29:13 PM UTC 2025
2025-08-03T18:29:14+00:00 [test-get-logs-cs5fd] [job] Sun Aug  3 06:29:14 PM UTC 2025
2025-08-03T18:29:15+00:00 [test-get-logs-cs5fd] [job] Sun Aug  3 06:29:15 PM UTC 2025
2025-08-03T18:29:16+00:00 [test-get-logs-cs5fd] [job] Sun Aug  3 06:29:16 PM UTC 2025
2025-08-03T18:29:17+00:00 [test-get-logs-cs5fd] [job] Sun Aug  3 06:29:17 PM UTC 2025
2025-08-03T18:29:18+00:00 [test-get-logs-cs5fd] [job] Sun Aug  3 06:29:18 PM UTC 2025
2025-08-03T18:29:19+00:00 [test-get-logs-cs5fd] [job] Sun Aug  3 06:29:19 PM UTC 2025
2025-08-03T18:29:20+00:00 [test-get-logs-cs5fd] [job] Sun Aug  3 06:29:20 PM UTC 2025

What happens?:

When logs are requested with --follow (the KubernetesSource type) before the container has started, no output is ever returned, even once the container is running.

I assume that https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/blob/main/toolforge_weld/logs/kubernetes.py?ref_type=heads#L35 fails to connect to the container (because it is not yet running), so the thread never feeds any data to the query call that is returned from the API.

What should have happened instead?:

KubernetesSource should periodically check the container status, starting new threads for any new containers and destroying threads for any missing containers.

This would also allow --follow to exit when the container exits.
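
A minimal sketch of the polling behaviour proposed above, to make the idea concrete. It is not the toolforge-weld implementation: `ContainerFollower`, `list_running` and `stream_logs` are hypothetical names, and the two callables stand in for whatever Kubernetes API calls KubernetesSource would actually make.

```python
import threading
import time
from typing import Callable, Dict, Iterable, Tuple


class ContainerFollower:
    """Hypothetical helper: keeps one log-streaming thread per running container."""

    def __init__(
        self,
        list_running: Callable[[], Iterable[str]],
        stream_logs: Callable[[str, threading.Event], None],
        poll_interval: float = 1.0,
    ) -> None:
        # list_running: assumed to return the names of containers running right now.
        # stream_logs: assumed to stream one container's logs until the event is set.
        self._list_running = list_running
        self._stream_logs = stream_logs
        self._poll_interval = poll_interval
        self._workers: Dict[str, Tuple[threading.Thread, threading.Event]] = {}

    def reconcile(self) -> None:
        """Start threads for new containers and stop threads for missing ones."""
        running = set(self._list_running())
        known = set(self._workers)

        for name in running - known:
            stop = threading.Event()
            thread = threading.Thread(
                target=self._stream_logs, args=(name, stop), daemon=True
            )
            self._workers[name] = (thread, stop)
            thread.start()

        for name in known - running:
            thread, stop = self._workers.pop(name)
            stop.set()
            # Give the worker a moment to notice the stop request.
            thread.join(timeout=1.0)

    def follow(self) -> None:
        """Poll until at least one container has been seen and none remain."""
        seen_any = False
        while True:
            self.reconcile()
            seen_any = seen_any or bool(self._workers)
            if seen_any and not self._workers:
                return
            time.sleep(self._poll_interval)
```

Because `follow()` keeps polling while no container has been seen yet, a request made during ContainerCreating would simply wait for the container to appear, and the call returns once every container it followed has gone away.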

Event Timeline

Just FYI: the logic behind the follow option will change relatively soon (from k8s to using Loki directly), so the implementation details might be quite different.

How critical is this feature for you? (If it's very critical, we might want to prioritize implementing it before the Loki support for streaming.)

I have a workaround in place for now (checking via the Kubernetes API that the container is running), so I can wait until the Loki change lands.
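
For anyone needing something similar, a workaround of this kind could look roughly like the sketch below. It is an illustration only, not the code actually used: `wait_until_running` and `is_running` are hypothetical names, and `is_running` is assumed to wrap a Kubernetes API query (for example, checking that the pod's `status.phase` is `Running`).

```python
import time
from typing import Callable


def wait_until_running(
    is_running: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
) -> bool:
    """Poll is_running() until it returns True or the timeout expires.

    Returns True if the container was seen running, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_running():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The logs would then be requested only after `wait_until_running(...)` returns True.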

T400913 would also be fine for my use case.

I'll leave this open as a record of the current behaviour, but I'm happy with it not being fixed.