
[toolforge,jobs] toolforge jobs logs read timeout error
Closed, Resolved | Public | Bug Report

Description

If you follow job logs with toolforge jobs logs -f, the command fails with a read timeout after 30 seconds without output. That seems too short, especially since job logs are deleted about a minute after the job finishes, so I can't come back later to see what happened.
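The "inactivity" behaviour is how a per-read socket timeout works. The clock restarts whenever data arrives, so a job that keeps printing is followed indefinitely, but a job that stays silent for longer than the timeout ends the follow even though it is still running. The minimal stdlib sketch below illustrates this under stated assumptions: a local socket pair stands in for the HTTPS log stream, `chatty_then_quiet_job` is a hypothetical job, and `READ_TIMEOUT = 0.2` s stands in for the roughly 30 s timeout in the transcript that follows (01:08:39 to 01:09:10).

```python
import socket
import threading
import time
from typing import List, Tuple

READ_TIMEOUT = 0.2  # assumed stand-in for the ~30 s read timeout in the report
LINE_GAP = 0.05     # hypothetical gap between job output lines, shorter than READ_TIMEOUT
LINES = [b"line %d\n" % i for i in range(6)]


def chatty_then_quiet_job(sock: socket.socket, lines: List[bytes], gap: float) -> None:
    """Hypothetical job: write one line every `gap` seconds, then stop writing.

    The socket is left open, like a job that is still running but silent.
    """
    for line in lines:
        sock.sendall(line)
        time.sleep(gap)


def follow(sock: socket.socket, timeout: float) -> Tuple[bytes, float]:
    """Read until the peer closes or no data arrives within `timeout` seconds.

    Returns everything received and the total time spent following.
    """
    sock.settimeout(timeout)  # applies to each recv() call, not to the whole stream
    received = b""
    started = time.monotonic()
    while True:
        try:
            chunk = sock.recv(1024)
        except socket.timeout:
            break  # a quiet period longer than `timeout` ends the follow
        if not chunk:
            break  # peer closed the connection
        received += chunk
    return received, time.monotonic() - started


job_side, client_side = socket.socketpair()
writer = threading.Thread(target=chatty_then_quiet_job, args=(job_side, LINES, LINE_GAP))
writer.start()
data, elapsed = follow(client_side, READ_TIMEOUT)
writer.join()
job_side.close()
client_side.close()
print(f"received {len(data)} bytes over {elapsed:.2f} s before timing out")
```

All lines arrive even though the total follow time exceeds `READ_TIMEOUT`. Only the silence after the last line triggers the timeout.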

$ date;toolforge jobs logs -l10 -f pano-2789e5c30ca6580e6a0e135d5f2e5f73 ; date
Fri 02 Feb 2024 01:08:39 PM UTC
...
ERROR: An internal error occured while executing this command.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/contrib/pyopenssl.py", line 294, in recv_into
    return self.connection.recv_into(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1822, in recv_into
    self._raise_ssl_error(self._ssl, result)
  File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1622, in _raise_ssl_error
    raise WantReadError()
OpenSSL.SSL.WantReadError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 360, in _error_catcher
    yield
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 666, in read_chunked
    self._update_chunk_length()
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 598, in _update_chunk_length
    line = self._fp.fp.readline()
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3/dist-packages/urllib3/contrib/pyopenssl.py", line 307, in recv_into
    raise timeout('The read operation timed out')
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/models.py", line 750, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 490, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 694, in read_chunked
    self._original_response.close()
  File "/usr/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 365, in _error_catcher
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.svc.tools.eqiad1.wikimedia.cloud', port=30003): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 797, in main
    run_subcommand(args=args, api=api)
  File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 723, in run_subcommand
    op_logs(api, args.name, args.follow, args.last)
  File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 528, in op_logs
    params=params,
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 169, in get_raw_lines
    yield from r.iter_lines(decode_unicode=True)
  File "/usr/lib/python3/dist-packages/requests/models.py", line 794, in iter_lines
    for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
  File "/usr/lib/python3/dist-packages/requests/utils.py", line 505, in stream_decode_response_unicode
    for chunk in iterator:
  File "/usr/lib/python3/dist-packages/requests/models.py", line 757, in generate
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.svc.tools.eqiad1.wikimedia.cloud', port=30003): Read timed out.
ERROR: Please report this issue to the Toolforge admins: https://w.wiki/6Zuu
Fri 02 Feb 2024 01:09:10 PM UTC
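The traceback shows the timeout being wrapped twice: socket.timeout becomes urllib3's ReadTimeoutError, which requests then re-raises as requests.exceptions.ConnectionError rather than as a Timeout subclass. A client that catches only requests.exceptions.Timeout would therefore not see it as a timeout. One possible client-side mitigation is to walk the exception chain and reopen the stream when the root cause is a timeout. The standard-library sketch below is an illustration only. `open_stream`, `follow_with_reconnect` and `max_reconnects` are hypothetical names, not part of tjf_cli or toolforge_weld. The resolution recorded in the timeline is a jobs-api version bump, not this code.

```python
import socket
from typing import Callable, Iterable, Iterator, Optional


def caused_by_timeout(exc: Optional[BaseException]) -> bool:
    """Return True if `exc` or any exception in its chain is a socket timeout.

    In the traceback above, requests raised ConnectionError while handling
    urllib3's ReadTimeoutError, which was itself raised while handling
    socket.timeout, so the original timeout is reachable via __context__.
    """
    seen = set()
    while exc is not None and id(exc) not in seen:
        if isinstance(exc, socket.timeout):
            return True
        seen.add(id(exc))
        exc = exc.__cause__ or exc.__context__
    return False


def follow_with_reconnect(
    open_stream: Callable[[], Iterable[str]], max_reconnects: int = 3
) -> Iterator[str]:
    """Yield log lines, reopening the stream after a read timeout.

    `open_stream` is a hypothetical factory for the log stream (for example a
    wrapper around a streaming HTTP request); it is not a tjf_cli function.
    Lines the server replays after a reconnect are NOT deduplicated here.
    """
    reconnects = 0
    while True:
        try:
            for line in open_stream():
                yield line
            return  # stream ended normally, e.g. the job finished
        except Exception as exc:
            # Re-raise anything that is not a timeout, or once retries run out.
            if reconnects >= max_reconnects or not caused_by_timeout(exc):
                raise
            reconnects += 1
```

Because a reconnect may replay recent lines (the transcript uses -l10), a real client would also need to skip lines it has already printed.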

Event Timeline

The Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific project tag to this task. Thanks!

bd808 changed the subtype of this task from "Task" to "Bug Report". Feb 12 2024, 11:09 PM
dcaro triaged this task as Medium priority. Feb 21 2024, 4:43 PM
dcaro edited projects, added Toolforge; removed Toolforge Jobs framework.
dcaro moved this task from Backlog to Ready to be worked on on the Toolforge board.
dcaro renamed this task from toolforge jobs logs read timeout error to [toolforge,jobs] toolforge jobs logs read timeout error. Feb 21 2024, 4:51 PM
Raymond_Ndibe changed the task status from Open to In Progress. Mar 6 2025, 11:11 PM
Raymond_Ndibe claimed this task.

group_203_bot_4866fc124f4b41659f667468a6115cf3 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/705

jobs-api: bump to 0.0.358-20250311134533-f100751f