[logs-api] distinguish between container log entries and anything else
Open, Medium · Public · BUG REPORT

Description

When using --follow, an artificial log entry is emitted every ~15 seconds:

tools.cluebot3@tools-bastion-15:~$ date; toolforge jobs logs -f cluebot3
Thu Nov 13 15:28:44 UTC 2025
2025-11-13T15:29:00.278526Z [nopod] [nocontainer] No logs received yet for job 'cluebot3', maybe the tool is using filelog or the job name is not correct? Will continue waiting just in case
2025-11-13T15:29:15.283920Z [nopod] [nocontainer] No logs received yet for job 'cluebot3', maybe the tool is using filelog or the job name is not correct? Will continue waiting just in case
2025-11-13T15:29:30.285722Z [nopod] [nocontainer] No logs received yet for job 'cluebot3', maybe the tool is using filelog or the job name is not correct? Will continue waiting just in case

This makes the output inconsistent with the 'fetch' mode:

tools.cluebot3@tools-bastion-15:~$ toolforge jobs logs cluebot3
ERROR: Job 'cluebot3' does not have any logs available

I am currently using the get_raw_lines method to fetch all logs, adding them to a list until a pre-defined end marker is seen (this works around previous issues with logs being dropped).
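For readers following along, the polling workaround described above could look roughly like the sketch below. It does not call get_raw_lines directly, since its signature is not shown here; `fetch_lines`, `end_marker`, `max_attempts` and `delay_seconds` are hypothetical names, and the caller is assumed to wrap whatever fetch call they use.

```python
import time
from typing import Callable, Iterable, List


def collect_until_marker(
    fetch_lines: Callable[[], Iterable[str]],
    end_marker: str,
    max_attempts: int = 20,
    delay_seconds: float = 5.0,
) -> List[str]:
    """Re-fetch the whole log until a line containing end_marker appears.

    fetch_lines is a hypothetical callable wrapping the real fetch call.
    Returns every line up to and including the marker line.
    """
    for attempt in range(max_attempts):
        lines = list(fetch_lines())
        for index, line in enumerate(lines):
            if end_marker in line:
                return lines[: index + 1]
        # Marker not seen yet: wait before asking the endpoint again.
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    raise TimeoutError(
        f"end marker {end_marker!r} not seen after {max_attempts} attempts"
    )
```

Each attempt re-reads the full log, which is why this approach needs so many calls to the logging endpoint.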

This requires a lot of calls to the logging endpoint and would be better served by the streaming endpoint. However, the streaming endpoint pollutes the job output, which is persisted for the run (e.g. https://cluebotng-trainer.toolforge.org/Original%20Testing%20Training%20Set%20-%20Old%20Triplet/2025-08-30%2023:13:04/logs/bayes-train.log)

These can be filtered out because the pod and container fields are known strings, however those are really 'internal' identifiers. Having an additional field to identify that the contents are a 'response message' (to use the same name as what is used elsewhere) rather than a log entry would likely be better.
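As an illustration of the filtering workaround, here is a minimal sketch that drops the placeholder entries from the text output shown earlier. It assumes every line has the shape `<timestamp> [<pod>] [<container>] <message>` and that `nopod` / `nocontainer` are only ever used for these artificial entries; the regex and the function name are made up for this example.

```python
import re
from typing import Iterable, Iterator

# Assumed line shape, taken from the CLI output in this report:
#   <timestamp> [<pod>] [<container>] <message>
LINE_PATTERN = re.compile(
    r"^(?P<datetime>\S+) \[(?P<pod>[^\]]*)\] \[(?P<container>[^\]]*)\] (?P<message>.*)$"
)


def drop_placeholder_entries(lines: Iterable[str]) -> Iterator[str]:
    """Yield only lines that do not carry the nopod/nocontainer placeholders."""
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match and match["pod"] == "nopod" and match["container"] == "nocontainer":
            # Artificial "still waiting" entry, not real container output.
            continue
        yield line
```

This relies on the exact placeholder strings, which is the fragility the dedicated field would avoid.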

Event Timeline

Restricted Application added a subscriber: Aklapper.

Agreed. We can now try to extend the data structure that logs-api returns (LogEntry). Ideally we would want to support different types of logs too (build logs, system logs, etc.), so we might want to create a more generic one.

I think we might want to add some metadata + values, like:

{
    "metadata": {
        "type": "job",
        "datetime": "",
        ## We can add more as needed, like user, etc.
    },
    "data": {
        "container": ...,
        "pod": ...,
        "message": ...,
        ## these will depend on the type, builds don't have one single container for example
    }
}

That would also allow a 'type' that's something like "internal", to express the fact that no logs have been received yet.
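To make the idea concrete, here is a rough sketch of how a client could model the proposed shape and tell "internal" entries apart from container logs. The class and function names are hypothetical, the field set only mirrors the example above, and "internal" is just the suggested value, not something logs-api returns today.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class LogMetadata:
    type: str      # "job" in the example above; "internal" is the suggested extra value
    datetime: str  # timestamp string, format not specified in the proposal


@dataclass
class LogRecord:
    metadata: LogMetadata
    # Contents depend on the type (e.g. container/pod/message for job logs).
    data: Dict[str, Any] = field(default_factory=dict)


def is_internal(record: LogRecord) -> bool:
    """True for service-generated messages rather than container output."""
    return record.metadata.type == "internal"


# Hypothetical "no logs yet" entry expressed in the proposed shape.
waiting = LogRecord(
    metadata=LogMetadata(type="internal", datetime="2025-11-13T15:29:00.278526Z"),
    data={"message": "No logs received yet"},
)
print(asdict(waiting))
```

With something like this, a consumer persisting job output would skip records where `is_internal(record)` is true instead of matching on pod or container names.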

Some notes for whoever implements this:

We probably want to make the changes in small increments on the CLI (for example: first support both formats, then change the API to the new format, then remove support for the old format).
It might also be good to notify users of the non-backwards-compatible API changes (logs-api is quite new, so there is probably no need to give a lot of heads up).
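The "support both formats" step on the CLI side could be sketched like this. It assumes the current entries are flat dicts with `pod`, `container`, `message` and `datetime` keys (an assumption, the real LogEntry fields may differ) and that the new format is the nested metadata/data shape proposed above; `normalize_entry` is a hypothetical helper name.

```python
from typing import Any, Dict


def normalize_entry(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return an entry in the nested metadata/data shape, whatever came in."""
    if "metadata" in raw and "data" in raw:
        # Already in the new format: pass it through untouched.
        return raw
    # Assumed old flat format: wrap it, treating it as a job log entry.
    return {
        "metadata": {"type": "job", "datetime": raw.get("datetime", "")},
        "data": {
            "pod": raw.get("pod"),
            "container": raw.get("container"),
            "message": raw.get("message"),
        },
    }
```

Once the API only returns the new format, the fallback branch can be deleted without touching the callers.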

Eventually having 'system logs' (build, jobs, components) with a common 'trace id', so for a deployment you can see everything that happened, would be very nice to have (probably would require some effort to make sure internal things that could have secrets don't get logged to places with tool access).

@DamianZaremba btw. I think that you are the last one using the jobs-api log endpoint, can you move your code to use the logs-api instead? (so we can remove the logs code from jobs api :) ).

I'm in the process of doing that right now :) there are a couple of small behaviour changes but it should be done today

Done in https://github.com/cluebotng/trainer/pull/349

dcaro triaged this task as Medium priority. Nov 19 2025, 8:36 AM
dcaro renamed this task from [logs-api] `--follow` returns inconsistent/artificial log entries to [logs-api] distinguish between container log entries and anything else. Feb 26 2026, 3:45 PM