Page MenuHomePhabricator

[jobs-api] logs internal datetime error
Open, MediumPublic

Description

I just found this:

aborrero@tools-bastion-12:~$ kubectl sudo -n jobs-api logs jobs-api-6db96fc79d-dpjx6 webservice
[..]
[2024-04-12 19:06:37 +0000] [11] [ERROR] Error handling request
Traceback (most recent call last):
  File "/opt/lib/poetry/tjf-9TtSrW0h-py3.11/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 183, in handle_request
    for item in respiter:
  File "/opt/lib/poetry/tjf-9TtSrW0h-py3.11/lib/python3.11/site-packages/werkzeug/wsgi.py", line 256, in __next__
    return self._next()
           ^^^^^^^^^^^^
  File "/opt/lib/poetry/tjf-9TtSrW0h-py3.11/lib/python3.11/site-packages/werkzeug/wrappers/response.py", line 32, in _iter_encoded
    for item in iterable:
  File "/app/tjf/api/jobs.py", line 57, in _format_logs
    for entry in logs:
  File "/opt/lib/poetry/tjf-9TtSrW0h-py3.11/lib/python3.11/site-packages/toolforge_weld/logs/kubernetes.py", line 66, in query
    for entry in self._get_pod_logs(
  File "/opt/lib/poetry/tjf-9TtSrW0h-py3.11/lib/python3.11/site-packages/toolforge_weld/logs/kubernetes.py", line 45, in _get_pod_logs
    datetime=parse_date(datetime),
             ^^^^^^^^^^^^^^^^^^^^
  File "/opt/lib/poetry/tjf-9TtSrW0h-py3.11/lib/python3.11/site-packages/dateutil/parser/_parser.py", line 1368, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/lib/poetry/tjf-9TtSrW0h-py3.11/lib/python3.11/site-packages/dateutil/parser/_parser.py", line 646, in parse
    raise ParserError("String does not contain a date: %s", timestr)
dateutil.parser._parser.ParserError: String does not contain a date: 
[2024-04-13 15:21:59 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:11)
[2024-04-13 15:21:59 +0000] [11] [INFO] Worker exiting (pid: 11)
[2024-04-13 15:21:59 +0000] [1] [ERROR] Worker (pid:11) exited with code 1
[2024-04-13 15:21:59 +0000] [1] [ERROR] Worker (pid:11) exited with code 1.
[..]

Another worker is created soon after, so the service doesn't seem very impacted.

Event Timeline

dcaro renamed this task from toolforge jobs api logs internal datetime error to [jobs-api] logs internal datetime error.Apr 15 2024, 11:27 AM
dcaro triaged this task as Medium priority.
dcaro moved this task from Backlog to Ready to be worked on on the Toolforge board.

I made a small patch to deal with this. I would push it to. see the attached patch.

i'd make a pull request but i don't have rights. i

i'd make a pull request but i don't have rights. i

The GitLab model requires you to "fork" the repository first.

Anyway, at least I'm curious what's causing these failures. For example if there are empty lines in the output for some reason, I'd prefer to just filter those lines out, instead of trying to parse them but ignoring all the possible parsing failures out there.

I think it's being caused by programs that print in weird ways (e.g. using terminal escapes). I understand the desire to not just blindly ignore these exceptions this but this issue has been causing issues for me for over a year. maybe we can just emit a dummy date and a warning about the parse error. i would've just forwarded it with None for the date but it seems downstream consumers aren't robust to it being None

This patch continues handling the logs (uses the current date), and shows a warning too to avoid them getting lost, though I have been unable to reproduce the issue locally using echo/printf with newlines and such :/
It does seem to be just empty lines (my guess would be from k8s/python itself when timing out the connection or similar).

@derenrich if you have a reproducer it would be quite helpful.

I think using the python library tqdm causes it.

Did a quick investigation, and I found that with the tqdm library as @derenrich says, the logs come in lines that are not preceded with the timestamp (this is from a silly script that uses toolforge_weld to get the raw lines):

datetime='2025-06-26T16:46:00.109184600Z' testlogs-654f6b5bf7-6cv6m lerele 
datetime='0.00loc' testlogs-654f6b5bf7-6cv6m lerele [00:00, ?loc/s]
datetime='29.1kloc' testlogs-654f6b5bf7-6cv6m lerele [00:00, 254kloc/s]
datetime='54.5kloc' testlogs-654f6b5bf7-6cv6m lerele [00:00, 118kloc/s]
datetime='69.8kloc' testlogs-654f6b5bf7-6cv6m lerele [00:00, 124kloc/s]
datetime='108kloc' testlogs-654f6b5bf7-6cv6m lerele [00:00, 193kloc/s] 
datetime='131kloc' testlogs-654f6b5bf7-6cv6m lerele [00:00, 163kloc/s]
datetime='151kloc' testlogs-654f6b5bf7-6cv6m lerele [00:00, 169kloc/s]
datetime='170kloc' testlogs-654f6b5bf7-6cv6m lerele [00:01, 168kloc/s]
datetime='193kloc' testlogs-654f6b5bf7-6cv6m lerele [00:01, 183kloc/s]
datetime='213kloc' testlogs-654f6b5bf7-6cv6m lerele [00:01, 127kloc/s]
datetime='229kloc' testlogs-654f6b5bf7-6cv6m lerele [00:01, 126kloc/s]
datetime='243kloc' testlogs-654f6b5bf7-6cv6m lerele [00:01, 129kloc/s]
datetime='259kloc' testlogs-654f6b5bf7-6cv6m lerele [00:01, 121kloc/s]
datetime='269kloc' testlogs-654f6b5bf7-6cv6m lerele [00:01, 145kloc/s]
local.tf-test@toolslocal:~$ cat test_logs.py 
import sys
from toolforge_weld.kubernetes import K8sClient, Kubeconfig
from toolforge_weld.logs.source import LogEntry, LogSource
from pathlib import Path


client =K8sClient(
            kubeconfig=Kubeconfig.from_path(path=Path(".kube/config")),
            user_agent="dcaro-tests",
        )


params = {
        "follow": True,
        "pretty": True,
        "timestamps": True,
}

import pdb; pdb.set_trace()
for line in client.get_raw_lines(
        "pods",
        name=sys.argv[1],
        subpath="/log",
        version=K8sClient.VERSIONS["pods"],
        params=params,
        timeout=None,
):
        datetime, message = line.split(" ", 1)
        print(
                f"datetime='{datetime}'",
                sys.argv[1],
                'lerele',
                message,
        )
local.tf-test@toolslocal:~$ cat test.sh 
#!/bin/bash


python3 -m venv /tmp/venv
source /tmp/venv/bin/activate
pip install tqdm
while true; do
find . -name '*.py' -type f -exec cat \{} \; |
            tqdm --unit loc --unit_scale >> /dev/null
sleep 1
    done
local.tf-test@toolslocal:~$ toolforge jobs run --continuous --command 'bash $TOOL_DATA_DIR/test.sh' --no-filelog --image python3.11 testlogs

The kubectl logs with timestamp show that too, one line with the timestamp, many without:

local.tf-test@toolslocal:~$ kubectl logs testlogs-654f6b5bf7-6cv6m --timestamps
2025-06-26T16:22:54.502882049Z Collecting tqdm
2025-06-26T16:22:54.505332556Z   Using cached tqdm-4.67.1-py3-none-any.whl (78 kB)
2025-06-26T16:22:54.674936388Z Installing collected packages: tqdm
2025-06-26T16:22:54.808860317Z Successfully installed tqdm-4.67.1
269kloc [00:02, 128kloc/s]  7Z 
269kloc [00:01, 141kloc/s] 47Z 
269kloc [00:02, 122kloc/s] 98Z 
269kloc [00:02, 130kloc/s]  6Z 
269kloc [00:01, 154kloc/s] 84Z 
269kloc [00:01, 152kloc/s] 61Z 
269kloc [00:01, 152kloc/s] 22Z 
269kloc [00:01, 154kloc/s] 71Z 
269kloc [00:01, 154kloc/s] 51Z 
269kloc [00:02, 129kloc/s]  2Z 
269kloc [00:02, 112kloc/s]  8Z 
269kloc [00:02, 108kloc/s]  1Z 
269kloc [00:02, 118kloc/s]  3Z