
Connection with `k8s.tools.eqiad1.wikimedia.cloud` hits SSL error
Closed, Invalid | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):
This happens very frequently when I try to work with a webservice on Toolforge.

  • become campwiz
  • toolforge webservice stop

My previous encounter, with `toolforge webservice buildservice start --mount all`, is recorded in P90317.
What happens?:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 464, in _make_request
    self._validate_conn(conn)
    ~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1093, in _validate_conn
    conn.connect()
    ~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 741, in connect
    sock_and_verified = _ssl_wrap_socket_and_match_hostname(
        sock=sock,
    ...<14 lines>...
        assert_fingerprint=self.assert_fingerprint,
    )
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 920, in _ssl_wrap_socket_and_match_hostname
    ssl_sock = ssl_wrap_socket(
        sock=sock,
    ...<8 lines>...
        tls_in_tls=tls_in_tls,
    )
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 460, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 504, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/ssl.py", line 455, in wrap_socket
    return self.sslsocket_class._create(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        sock=sock,
        ^^^^^^^^^^
    ...<5 lines>...
        session=session
        ^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3.13/ssl.py", line 1076, in _create
    self.do_handshake()
    ~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3.13/ssl.py", line 1372, in do_handshake
    self._sslobj.do_handshake()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
ssl.SSLEOFError: [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1029)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
        conn,
    ...<10 lines>...
        **response_kw,
    )
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 488, in _make_request
    raise new_e
urllib3.exceptions.SSLError: [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1029)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
        method=request.method,
    ...<9 lines>...
        chunked=chunked,
    )
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 841, in urlopen
    retries = retries.increment(
        method, url, error=new_e, _pool=self, _stacktrace=sys.exc_info()[2]
    )
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='k8s.tools.eqiad1.wikimedia.cloud', port=6443): Max retries exceeded with url: /api/v1/namespaces/tf-public/configmaps/image-config (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1029)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/toolforge-webservice", line 33, in <module>
    sys.exit(load_entry_point('toolforge-webservice==0.103.19', 'console_scripts', 'toolforge-webservice')())
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/toolsws/cli/webservice.py", line 238, in main
    KubernetesBackend.get_types(),
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py", line 295, in get_types
    configmap = api.get_object(
        "configmaps", "image-config", namespace="tf-public"
    )
  File "/usr/lib/python3/dist-packages/toolforge_weld/kubernetes.py", line 192, in get_object
    return self.get(
           ~~~~~~~~^
        kind,
        ^^^^^
    ...<2 lines>...
        namespace=namespace,
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 184, in get
    response = self._make_request("GET", url, **kwargs).json()
               ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 141, in _make_request
    raise e
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 135, in _make_request
    response = self.session.request(method, **self.make_kwargs(url, **kwargs))
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 698, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='k8s.tools.eqiad1.wikimedia.cloud', port=6443): Max retries exceeded with url: /api/v1/namespaces/tf-public/configmaps/image-config (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1029)')))

What should have happened instead?:
It should give a proper response, such as confirming that the webservice was stopped.

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Event Timeline

I also got:

tools.campwiz-backend-beta@tools-bastion-15:~$ toolforge components config create toolforge.yaml 
Warning: You are using a beta feature of Toolforge.
ComponentsClientError: HTTPSConnectionPool(host='10.96.0.1', port=443): Max retries exceeded with url: /apis/components-api.toolforge.org/v1/namespaces/tool-campwiz-backend-beta/toolconfigs (Caused by NewConnectionError("HTTPSConnection(host='10.96.0.1', port=443): Failed to establish a new connection: [Errno 111] Connection refused"))
Please report this issue to the Toolforge admins if it persists: https://w.wiki/6Zuu

Now it gave me

Traceback (most recent call last):
  File "/usr/bin/toolforge-webservice", line 33, in <module>
    sys.exit(load_entry_point('toolforge-webservice==0.103.19', 'console_scripts', 'toolforge-webservice')())
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/toolsws/cli/webservice.py", line 591, in main
    start(job, "Your job is not running, starting")
    ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/toolsws/cli/webservice.py", line 105, in start
    job.request_start()
    ~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py", line 683, in request_start
    self.api.create_object(
    ~~~~~~~~~~~~~~~~~~~~~~^
        "deployments", self._get_deployment(started_at)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3/dist-packages/toolforge_weld/kubernetes.py", line 250, in create_object
    return self.post(
           ~~~~~~~~~^
        kind,
        ^^^^^
    ...<2 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 190, in post
    response = self._make_request("POST", url, **kwargs).json()
               ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 157, in _make_request
    raise e
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 136, in _make_request
    response.raise_for_status()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: https://k8s.tools.eqiad1.wikimedia.cloud:6443/apis/apps/v1/namespaces/tool-campwiz-backend/deployments
tools.campwiz-backend@tools-bastion-15:~$ toolforge webservice buildservice restart --mount all
Traceback (most recent call last):
  File "/usr/bin/toolforge-webservice", line 33, in <module>
    sys.exit(load_entry_point('toolforge-webservice==0.103.19', 'console_scripts', 'toolforge-webservice')())
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/toolsws/cli/webservice.py", line 591, in main
    start(job, "Your job is not running, starting")
    ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/toolsws/cli/webservice.py", line 105, in start
    job.request_start()
    ~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py", line 683, in request_start
    self.api.create_object(
    ~~~~~~~~~~~~~~~~~~~~~~^
        "deployments", self._get_deployment(started_at)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3/dist-packages/toolforge_weld/kubernetes.py", line 250, in create_object
    return self.post(
           ~~~~~~~~~^
        kind,
        ^^^^^
    ...<2 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 190, in post
    response = self._make_request("POST", url, **kwargs).json()
               ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 157, in _make_request
    raise e
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 136, in _make_request
    response.raise_for_status()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: https://k8s.tools.eqiad1.wikimedia.cloud:6443/apis/apps/v1/namespaces/tool-campwiz-backend/deployments

This happened when I ran `toolforge webservice buildservice restart --mount all`.

I have been trying to trigger this from my own tool by running `toolforge components config create config.yaml` in a loop 1000 times, and was unable to get it to fail.

tools.wm-lol@tools-bastion-15:~$ echo "## Starting"; failed=0; passed=0; total=1000; for i in $(seq $total); do toolforge components config create config.yaml && passed=$((passed+1)) || failed=$((failed+1)); if [[ $(((i / 10) * 10)) -eq $i ]]; then echo "   [ran $i] failed=$failed, passed=$passed"; fi; done
...
   [ran 1000] failed=0, passed=1000

Also ran webservice restart in a loop, without issues:

tools.wm-lol@tools-bastion-15:~$ echo "## Starting"; failed=0; passed=0; total=1000; for i in $(seq $total); do toolforge webservice buildservice restart --mount=all && passed=$((passed+1)) || failed=$((failed+1)); if [[ $(((i / 10) * 10)) -eq $i ]]; then echo "   [ran $i] failed=$failed, passed=$passed"; fi; done
... restarting a bunch of times
   [ran 1000] failed=0, passed=1000
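
The same pass/fail tally can be sketched in Python, which makes it easier to record when each failure happened. This is only an illustration: `tally_runs` is a hypothetical helper, not part of the Toolforge CLI, and the command passed to it is whatever you want to repeat.

```python
import subprocess
import sys
from datetime import datetime, timezone


def tally_runs(command, total):
    """Run `command` `total` times; return (passed, failed, failure_times)."""
    passed = 0
    failed = 0
    failure_times = []
    for _ in range(total):
        # A non-zero exit status counts as a failure, as in the shell loop.
        result = subprocess.run(command, capture_output=True)
        if result.returncode == 0:
            passed += 1
        else:
            failed += 1
            # Keep a UTC timestamp so failures can be matched to server logs.
            failure_times.append(datetime.now(timezone.utc).isoformat())
    return passed, failed, failure_times


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with e.g.
    # ["toolforge", "components", "config", "create", "config.yaml"].
    passed, failed, _ = tally_runs([sys.executable, "-c", "pass"], total=3)
    print(f"failed={failed}, passed={passed}")
```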

How often does it fail for you?
Are you doing anything else while running these commands?
Would you mind if I tried, for example, running the `config create` command you showed for campwiz-backend-beta?

@Nokib_Sarkar have you seen this happen on multiple occasions, or just several times on the 7th specifically? (I want to make sure it's not a side-effect of maintenance activity.)

Andrew triaged this task as Medium priority. Apr 8 2026, 8:42 PM

Hi, I have been seeing it for the past two days straight. I copied some debug information (from today) that might be useful.

nokibsarkar@tools-bastion-15:~$ become campwiz-backend
tools.campwiz-backend@tools-bastion-15:~$ toolforge webservice buildservice start --mount all
Traceback (most recent call last):
  File "/usr/bin/toolforge-webservice", line 33, in <module>
    sys.exit(load_entry_point('toolforge-webservice==0.103.19', 'console_scripts', 'toolforge-webservice')())
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/toolsws/cli/webservice.py", line 579, in main
    start(job, "Starting webservice")
    ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/toolsws/cli/webservice.py", line 105, in start
    job.request_start()
    ~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py", line 683, in request_start
    self.api.create_object(
    ~~~~~~~~~~~~~~~~~~~~~~^
        "deployments", self._get_deployment(started_at)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3/dist-packages/toolforge_weld/kubernetes.py", line 250, in create_object
    return self.post(
           ~~~~~~~~~^
        kind,
        ^^^^^
    ...<2 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 190, in post
    response = self._make_request("POST", url, **kwargs).json()
               ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 157, in _make_request
    raise e
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 136, in _make_request
    response.raise_for_status()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: https://k8s.tools.eqiad1.wikimedia.cloud:6443/apis/apps/v1/namespaces/tool-campwiz-backend/deployments
tools.campwiz-backend@tools-bastion-15:~$ toolforge webservice logs
No logs found!
tools.campwiz-backend@tools-bastion-15:~$ toolforge components build list
Usage: toolforge-components [OPTIONS] COMMAND [ARGS]...
Try 'toolforge-components --help' for help.

Error: No such command 'build'.
tools.campwiz-backend@tools-bastion-15:~$ toolforge components deployment list
ID                          Creation time    Status      Builds                                                                                                                   Runs
--------------------------  ---------------  ----------  -----------------------------------------------------------------------------------------------------------------------  -----------------------------------------------------------------------------------
20260408-033931-m5k0dqiq3k  20260408-033931  successful  campwiz-backend(successful): id:campwiz-backend-buildpacks-pipelinerun-945gw You can see the logs with `toolforge [...]  campwiz-backend(successful): created or updated job campwiz-backend, [...]
                                                         campwiz-backend-readonly(skipped): id:no-build-needed Component re-uses build from campwiz-backend                       campwiz-backend-readonly(successful): created or updated job campwiz-backend- [...]
                                                         campwiz-task-manager(skipped): id:no-build-needed Component re-uses build from campwiz-backend                           campwiz-task-manager(successful): created or updated job campwiz-task- [...]
20260408-034503-09lvao9enl  20260408-034503  successful  campwiz-backend(successful): id:campwiz-backend-buildpacks-pipelinerun-cqz58 You can see the logs with `toolforge [...]  campwiz-backend(successful): created or updated job campwiz-backend, [...]
                                                         campwiz-task-manager(skipped): id:no-build-needed Component re-uses build from campwiz-backend                           campwiz-task-manager(successful): created or updated job campwiz-task- [...]
20260408-035147-ngh0p4uup8  20260408-035147  successful  campwiz-backend(successful): id:campwiz-backend-buildpacks-pipelinerun-wgxgz You can see the logs with `toolforge [...]  campwiz-backend(successful): created or updated job campwiz-backend, [...]
                                                         campwiz-task-manager(skipped): id:no-build-needed Component re-uses build from campwiz-backend                           campwiz-task-manager(successful): created or updated job campwiz-task- [...]
20260408-035458-l2xjx5p03f  20260408-035458  successful  campwiz-backend(successful): id:campwiz-backend-buildpacks-pipelinerun-694c8 You can see the logs with `toolforge [...]  campwiz-backend(successful): created or updated job campwiz-backend, [...]
                                                         campwiz-task-manager(skipped): id:no-build-needed Component re-uses build from campwiz-backend                           campwiz-task-manager(successful): created or updated job campwiz-task- [...]
tools.campwiz-backend@tools-bastion-15:~$ toolforge jobs list
+----------------------+------------+---------+
|      Job name:       | Job type:  | Status: |
+----------------------+------------+---------+
|   campwiz-backend    | continuous | Running |
| campwiz-task-manager | continuous | Running |
+----------------------+------------+---------+

Hi, any updates? I still cannot start the webservice for campwiz today. Would it be possible for you to start it on my behalf from the backend?

@Nokib_Sarkar Just a note on the latest traceback: the 409 Conflict on the /deployments endpoint usually means the deployment already exists in the namespace. Looking at your `toolforge jobs list` output, both campwiz-backend and campwiz-task-manager are actually running, which is good news! You might just need `restart` instead of `start` to pick up any changes.

For the original SSL error, I was also unable to reproduce it from bastion-15. If it comes back, noting the exact timestamp may help the team correlate with server-side logs.
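
One way to capture that timestamp is to wrap the failing call so that every TLS failure is logged with a UTC time before retrying. This is a minimal sketch under assumptions: `call_with_timestamps` is a hypothetical helper, and it catches the standard library's `ssl.SSLError` (the base class of the `SSLEOFError` in the traceback). A client built on `requests` would need to catch `requests.exceptions.SSLError` instead.

```python
import ssl
import time
from datetime import datetime, timezone


def call_with_timestamps(func, attempts=3, delay=1.0, log=print):
    """Call func(); log a UTC timestamp for each TLS failure, then retry."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ssl.SSLError as exc:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            log(f"{stamp} attempt {attempt}/{attempts} failed: {exc!r}")
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay)
```

The logged lines can then be attached to the task as they are.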

@HakanIST

tools.campwiz-backend@tools-bastion-15:~$ toolforge webservice restart
Could not find a public_html folder or a .lighttpd.conf file in your tool home.
tools.campwiz-backend@tools-bastion-15:~$ toolforge webservice buildservice start --mount all
Traceback (most recent call last):
  File "/usr/bin/toolforge-webservice", line 33, in <module>
    sys.exit(load_entry_point('toolforge-webservice==0.103.19', 'console_scripts', 'toolforge-webservice')())
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/toolsws/cli/webservice.py", line 579, in main
    start(job, "Starting webservice")
    ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/toolsws/cli/webservice.py", line 105, in start
    job.request_start()
    ~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py", line 683, in request_start
    self.api.create_object(
    ~~~~~~~~~~~~~~~~~~~~~~^
        "deployments", self._get_deployment(started_at)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3/dist-packages/toolforge_weld/kubernetes.py", line 250, in create_object
    return self.post(
           ~~~~~~~~~^
        kind,
        ^^^^^
    ...<2 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 190, in post
    response = self._make_request("POST", url, **kwargs).json()
               ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 157, in _make_request
    raise e
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 136, in _make_request
    response.raise_for_status()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: https://k8s.tools.eqiad1.wikimedia.cloud:6443/apis/apps/v1/namespaces/tool-campwiz-backend/deployments
tools.campwiz-backend@tools-bastion-15:~$ date
Sat Apr 11 12:10:48 UTC 2026
tools.campwiz-backend@tools-bastion-15:~$

@Nokib_Sarkar try `toolforge webservice buildservice restart --mount all` (with the `buildservice` type specified). The plain `toolforge webservice restart` looks for a legacy lighttpd setup, which is why it cannot find public_html. Since the deployment is already present, `buildservice start` returns 409, but `buildservice restart` should work.

So, the issue was that I had a job named campwiz-backend that used push-to-deploy, and my tool is also named campwiz-backend. When I tried to start the webservice, it hit a conflict because (I think) it also wanted to create a job named campwiz-backend. @taavi has now explained the issue and clarified that I still cannot use push-to-deploy for the webservice; I need to build the image manually and use that. Everything is working now. Thanks, everyone.
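
The collision described above can be illustrated with a toy model. The Kubernetes API rejects a create request for an object whose name already exists in the namespace with 409 Conflict, and the sketch below mimics only that rule. `Namespace`, `ConflictError`, and the `created_by` labels are hypothetical names for illustration, not Toolforge or Kubernetes code. The assumption that the job and the webservice both map to a deployment named campwiz-backend comes from the explanation in this thread, not from inspecting the cluster.

```python
from http import HTTPStatus


class ConflictError(Exception):
    """Raised when an object with the same name already exists."""

    status = HTTPStatus.CONFLICT  # 409


class Namespace:
    """Toy namespace that tracks deployments by name only."""

    def __init__(self, name):
        self.name = name
        self._deployments = {}

    def create_deployment(self, deployment_name, created_by):
        # Names must be unique per namespace, whoever created the object.
        if deployment_name in self._deployments:
            owner = self._deployments[deployment_name]
            raise ConflictError(
                f"deployment {deployment_name!r} already exists "
                f"(created by {owner})"
            )
        self._deployments[deployment_name] = created_by


ns = Namespace("tool-campwiz-backend")
ns.create_deployment("campwiz-backend", created_by="job")
try:
    # The webservice asks for the same name, so the create is rejected.
    ns.create_deployment("campwiz-backend", created_by="webservice")
except ConflictError as exc:
    print(int(exc.status), exc)
```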

Nokib_Sarkar assigned this task to taavi.
taavi changed the task status from Resolved to Invalid. Apr 11 2026, 5:10 PM
taavi removed taavi as the assignee of this task.

I filed T423005 to improve the error message, and I am marking this task as invalid since nothing was actually changed in the infrastructure.