`toolforge jobs logs` misplaces my logs
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Have a toolforge account with access to the link-dispenser tool
  • become link-dispenser
  • toolforge jobs logs crawljob -f

What happens?:

2026-03-31T15:33:55.897423Z [nopod] [nocontainer] No logs received yet for job 'crawljob', maybe the tool is using filelog or the job name is not correct? Will continue waiting just in case
2026-03-31T15:34:10.900148Z [nopod] [nocontainer] No logs received yet for job 'crawljob', maybe the tool is using filelog or the job name is not correct? Will continue waiting just in case
2026-03-31T15:34:25.903334Z [nopod] [nocontainer] No logs received yet for job 'crawljob', maybe the tool is using filelog or the job name is not correct? Will continue waiting just in case
2026-03-31T15:34:40.904994Z [nopod] [nocontainer] No logs received yet for job 'crawljob', maybe the tool is using filelog or the job name is not correct? Will continue waiting just in case
2026-03-31T15:34:55.907052Z [nopod] [nocontainer] No logs received yet for job 'crawljob', maybe the tool is using filelog or the job name is not correct? Will continue waiting just in case

What should have happened instead?:
I should be able to see the logs of my job, especially since I can see that my job is running.

+-----------+------------+---------+
| Job name: | Job type:  | Status: |
+-----------+------------+---------+
| crawljob  | continuous | Running |
|   redis   | continuous | Running |
+-----------+------------+---------+

Event Timeline

If it helps, crawljob is a Celery container that health-checks URLs. I can invoke it from the web UI of the tool, and the job appears to work fine.

tools.link-dispenser@tools-bastion-14:~$ toolforge jobs show crawljob
+---------------+-----------------------------------------------------------------+
| Job name:     | crawljob                                                        |
+---------------+-----------------------------------------------------------------+
| Command:      | crawljob                                                        |
+---------------+-----------------------------------------------------------------+
| Job type:     | continuous                                                      |
+---------------+-----------------------------------------------------------------+
| Image:        | tool-link-dispenser/tool-link-dispenser:latest                  |
+---------------+-----------------------------------------------------------------+
| Port:         | none                                                            |
+---------------+-----------------------------------------------------------------+
| File log:     | no                                                              |
+---------------+-----------------------------------------------------------------+
| Output log:   |                                                                 |
+---------------+-----------------------------------------------------------------+
| Error log:    |                                                                 |
+---------------+-----------------------------------------------------------------+
| Emails:       | all                                                             |
+---------------+-----------------------------------------------------------------+
| Resources:    | mem: 3.0Gi, cpu: 2.0                                            |
+---------------+-----------------------------------------------------------------+
| Replicas:     | 1                                                               |
+---------------+-----------------------------------------------------------------+
| Mounts:       | none                                                            |
+---------------+-----------------------------------------------------------------+
| Retry:        | no                                                              |
+---------------+-----------------------------------------------------------------+
| Timeout:      | no                                                              |
+---------------+-----------------------------------------------------------------+
| Health check: | script: launcher ping                                           |
+---------------+-----------------------------------------------------------------+
| Status:       | Running                                                         |
+---------------+-----------------------------------------------------------------+
| Hints:        | Last run at 2026-03-20T05:32:11Z. Pod in 'Running' phase. State |
|               | 'running'. Started at '2026-03-20T05:32:20Z'.                   |
+---------------+-----------------------------------------------------------------+
tools.link-dispenser@tools-bastion-14:~$ toolforge jobs logs crawljob
ERROR: Job 'crawljob' does not have any logs available
tools.link-dispenser@tools-bastion-14:~$ kubectl get po
NAME                              READY   STATUS    RESTARTS   AGE
crawljob-6cd85dbdfb-kqwq8         1/1     Running   0          11d
link-dispenser-86cf4d9c96-fqj6s   1/1     Running   0          11d
redis-55c68578f-v5k72             1/1     Running   0          12d
tools.link-dispenser@tools-bastion-14:~$ kubectl logs crawljob-6cd85dbdfb-kqwq8
/layers/heroku_python/dependencies/lib/python3.12/site-packages/celery/platforms.py:799: SecurityWarning: An entry for the specified gid or egid was not found.
We're assuming this is a potential security issue.
...
[2026-03-31 13:52:04,233: INFO/ForkPoolWorker-12] Task jobs.crawl_page[e903c0dc-0557-415e-9e36-63e3705b63bb] succeeded in 18.486338403075933s: None

There are logs going to stdout/stderr in the container and being seen by Kubernetes, but it looks like they are not being collected by Loki.

Yes, either that or they're not matched by the Loki query that the logs command issues.

taavi triaged this task as High priority. Apr 1 2026, 1:51 PM

Are these logs very old, or new? By default we trim the logs to 1h; you can fetch older logs with --since 1d or similar.

Oops, this is not yet deployed xd. I think it's about to be deployed.

The issue here is probably that by default we fetch only the last 1h of logs (with follow we were not limiting it before; now we set it to the same value as for regular, non-followed logs).

We have to figure out how to tell Loki to return the last N lines of logs, no matter how old they are, but we have not gotten there yet (that has been an issue for a while).
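To illustrate the difference between the two behaviours discussed here, below is a minimal sketch in Python. It is not how Toolforge or Loki implement anything: the `ENTRIES` list, the placeholder messages, the `in_window` and `last_n` helpers, and the chosen `now` are all assumptions made for illustration. Only the two timestamps are taken from this thread.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory log entries: (timestamp, placeholder message).
# The timestamps come from the thread; the messages are made up.
ENTRIES = [
    (datetime(2026, 3, 31, 13, 52, 4), "entry A"),
    (datetime(2026, 3, 31, 22, 13, 25), "entry B"),
]


def in_window(entries, now, since=timedelta(hours=1)):
    """Time-window selection: keep entries no older than `since`."""
    return [e for e in entries if now - since <= e[0] <= now]


def last_n(entries, n):
    """Tail selection: keep the n newest entries regardless of age."""
    if n <= 0:
        return []
    return sorted(entries, key=lambda e: e[0])[-n:]


# Assumed time of the query, several hours after the newest entry.
now = datetime(2026, 4, 1, 9, 0)

print(len(in_window(ENTRIES, now)))                     # 0
print(len(in_window(ENTRIES, now, timedelta(days=1))))  # 2
print(last_n(ENTRIES, 1)[0][1])                         # entry B
```

With a 1h window nothing is returned even though the job has logs, while a tail-style selection would still show the newest line.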

It looks like the state in T421929#11774133 had log output that was less than an hour old in kubectl logs crawljob-6cd85dbdfb-kqwq8. As I check right now, the most recent log is well outside a 1 hour retention window (2026-03-31 22:13:25,173).

If that's the case, then the problem is a different one :). Please share if you find an instance of that. Judging by the post time in Phabricator, I'm not certain that the command was run within the 1 hour window (it could have been; I'm not certain it was not, either). Do you remember if you ran it in that window?

@Soda You can now see all your logs by using --since to adjust how far back the logs are fetched from. By default this is 1h, which explains why you couldn't see anything.

The reason you noticed this when you ran toolforge jobs logs crawljob -f and (probably) not before is that we deployed a patch that enforces the 1h start time for streamed logs (-f). This was already the case for non-streamed logs, but this patch enabled it for streamed logs as well (among other things).

The patch in question is to support --since and --until params. We likely need to do a better job of explaining what the defaults are, since that's not immediately clear.

I can't read. :) 13:52:04 is not within an hour of 16:25. I think my brain did some "fancy" math that went something like ":52 -> :25 means I need to add an hour, 13 + 1 is 14, 4pm... that's in the window". There was a 4 in there somewhere!
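For reference, the arithmetic can be checked with a short Python sketch. The `within_window` helper is hypothetical, and the date attached to the 16:25 check is an assumption (same day as the log line); only the two clock times come from this thread.

```python
from datetime import datetime, timedelta


def within_window(log_time, now, window=timedelta(hours=1)):
    """Return True if log_time falls inside [now - window, now]."""
    return now - window <= log_time <= now


# Clock times from the thread; the date of the 16:25 check is assumed.
last_log = datetime(2026, 3, 31, 13, 52, 4)
checked_at = datetime(2026, 3, 31, 16, 25)

print(checked_at - last_log)                                   # 2:32:56
print(within_window(last_log, checked_at))                     # False
print(within_window(last_log, checked_at, timedelta(days=1)))  # True
```

The gap is about two and a half hours, so the line falls outside the default 1h window but inside a window such as 1d.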

Got it, will make a note to not use the -f from now on (until -f is fixed)

This is already deployed. You should be able to use --since together with -f, for example toolforge jobs logs crawljob --since 1d -f, and see all the logs from the last day.

dcaro moved this task from In progress to Done on the tools-platform-team board.

I'll close this, but feel free to reopen if you still have issues.