Feature summary (what you would like to be able to do and where):
Currently, a job's logs can be queried:
- While it is running (after starting; see T401073)
$ toolforge jobs run --image bookworm --no-filelog --command "bash -c 'for _ in {0..5}; do date; sleep 1; done'" test-run; sleep 3; toolforge jobs logs test-run
2025-08-03T23:06:32+00:00 [test-run-sswbv] [job] Sun Aug 3 11:06:32 PM UTC 2025
2025-08-03T23:06:33+00:00 [test-run-sswbv] [job] Sun Aug 3 11:06:33 PM UTC 2025
- When it has failed
$ toolforge jobs run --image bookworm --no-filelog --command "bash -c 'for _ in {0..5}; do date; sleep 1; done; exit 1'" test-run; sleep 10; toolforge jobs logs test-run
2025-08-03T23:07:03+00:00 [test-run-6zzzz] [job] Sun Aug 3 11:07:03 PM UTC 2025
2025-08-03T23:07:04+00:00 [test-run-6zzzz] [job] Sun Aug 3 11:07:04 PM UTC 2025
2025-08-03T23:07:05+00:00 [test-run-6zzzz] [job] Sun Aug 3 11:07:05 PM UTC 2025
2025-08-03T23:07:06+00:00 [test-run-6zzzz] [job] Sun Aug 3 11:07:06 PM UTC 2025
2025-08-03T23:07:07+00:00 [test-run-6zzzz] [job] Sun Aug 3 11:07:07 PM UTC 2025
2025-08-03T23:07:08+00:00 [test-run-6zzzz] [job] Sun Aug 3 11:07:08 PM UTC 2025
However, it is not possible to query the logs of a completed job (once the Kubernetes Job object disappears):
$ toolforge jobs run --image bookworm --no-filelog --command "bash -c 'for _ in {0..5}; do date; sleep 1; done'" test-run; sleep 60; toolforge jobs logs test-run
ERROR: Job 'test-run' does not exist
Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
When executing with --mount=none or --no-filelog, which is the ideal operating mode (no NFS dependency), the only way to get logs is either kubectl logs or toolforge jobs logs. The former is complicated when using toolforge jobs run, as the pod name is not identical to the job name.
The latter does not work once the Job object goes away; however, the logs are still present in Loki (https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Logging).
Currently, the only way to query Loki is via the API gateway /tool/<name>/logs endpoint, which only returns logs for a <name> that still exists in Kubernetes.
For one-off jobs this is not very useful compared to --filelog, which allows historical logs to be viewed.
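For reference, the kubectl workaround mentioned above can be sketched as follows. This is only an illustration, and it assumes that the Kubernetes Job is named after the Toolforge job and that its pods carry the job-name label that the Kubernetes Job controller normally sets; job_logs is a hypothetical helper name. It has the same limitation as toolforge jobs logs: once the Job and its pods are gone, nothing is returned.

```shell
# Hypothetical helper: fetch pod logs by label selector instead of by pod name.
# Assumption: pods created for the job carry the label job-name=<job name>.
job_logs() {
    # --tail=-1 requests all lines (kubectl defaults to 10 when a selector is used).
    kubectl logs --selector "job-name=$1" --all-containers --timestamps --tail=-1
}
```

Calling job_logs test-run while the pod still exists should print its output without having to look up the generated pod name (for example test-run-sswbv).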
Benefits (why should this be implemented?):
Allowing historical logs to be viewed (for one-off jobs) removes the need to use file logging on NFS.
Not requiring NFS benefits both users (lower node overhead) and admins (reduced NFS load).