Page MenuHomePhabricator

[bug] after some time notebooks cannot be saved
Closed, ResolvedPublic

Description

As of recently, (python) notebooks that have been running for some time do not get saved automatically, and cannot be saved manually. If "Saved as..." only the old version is saved. I can copy all cells in the notebook, open up a new notebook and paste the cells. The new notebook gets automatically saved, at least for some time, but then it also gets stuck.
I use PAWS in Firefox on Linux and keep the session open for days.

Event Timeline

Does restarting the server do anything
File -> Hub Control Panel -> Stop My Server
?

@rook No, restarting the server does not help. Only when I close the tab and reopen the notebook I'm able to save it again, but by closing the tab I also lose all the changes (Save does nothing, I can only close it by Discard-ing).

I've lost hours of work the first time this happened.

Feels like your firefox is keeping jupyterlab open after it has a disconnect (Very possibly only a disconnect from the nfs), without telling you that it isn't connected. I'm seeing people raise issues of longer running jobs:
https://github.com/jupyter/notebook/issues/641
https://github.com/jupyter/notebook/issues/1647
But haven't found too much about jupyter experiencing a disconnect without telling the user about it. It might be worth opening an upstream issue to, more loudly, announce when any kind of disconnect is had...Actually it might be the nfs is disconnected, jupyter is still connected, the pod is still running, but it doesn't realize that it lacks an nfs backing at that point. Or the file handle went stale? You can save new files when you're in the state that you can't save an existing file, it's just files that have been open in your browser for awhile that don't save?

@rook My session had some 4-5 tabs open and only the one that was running the code (few hours or so) stopped saving the changes. I really don't know how it all works, but it's different from my experience some months ago. Back then even if there was a disconnect, my notebooks would respawn, and the kernel would keep all variables alive. Now nothing really tells me my session got disconnected, I mean, all notebooks execute the code, even the one that cannot be saved.
I'll try to find a way to reproduce this in a minimal notebook; will keep you posted.

Feels like your firefox is keeping jupyterlab open after it has a disconnect (Very possibly only a disconnect from the nfs), without telling you that it isn't connected.

I think there is something a bit more esoteric going on. If the the pod wasn't running or didn't have access to the filesystem creating a new file would not be possible. Might be browser session, pod v jupyterhub authentication or websocket connection failing. I would look into the hub and proxy logs for any error around the routes to the user pod and in the user pod itself for any error message the next time this happens.

Back then even if there was a disconnect, my notebooks would respawn, and the kernel would keep all variables alive.

Pods are kept for 24hs after any websocket connection is seen. Connections back after that period are not expected to keep state. The file still should have been saved, but I see it still is the frontend's responsibility to save it. So maybe connection from the browser to the pod was broken in an irrecoverable way? It'd be interesting to see requests from the browser when these issues happen.

@Ponor thank you for your email. Was looking at the logs for your pod, looks like the pod started about 9 days ago. The file that you mentioned shows up, though mostly just for trust:

[W 2022-06-29 14:50:46.659 SingleUserLabApp manager:669] Notebook HR naselja/Popisi stanovništva.ipynb is not trusted
[I 2022-06-29 14:52:58.484 SingleUserLabApp handlers:184] Saving file at /HR naselja/Popisi stanovništva.ipynb
[W 2022-06-29 14:52:58.488 SingleUserLabApp manager:653] Notebook HR naselja/Popisi stanovništva.ipynb is not trusted
...
[I 2022-06-29 21:35:36.241 SingleUserLabApp handlers:184] Saving file at /HR naselja/Popisi stanovništva.ipynb
[W 2022-06-29 21:35:36.244 SingleUserLabApp manager:653] Notebook HR naselja/Popisi stanovništva.ipynb is not trusted
[I 2022-06-29 21:35:36.315 SingleUserLabApp log:189] 200 PUT /user/PonoRoboT/api/contents/HR%20naselja/Popisi%20stanovni%C5%A1tva.ipynb?1656538535871 (PonoRoboT@192.168.252.128) 87.13ms
[I 2022-06-29 21:35:36.376 SingleUserLabApp log:189] 200 GET /user/PonoRoboT/api/contents/HR%20naselja/Popisi%20stanovni%C5%A1tva.ipynb/checkpoints?1656538536341 (PonoRoboT@192.168.252.128) 4.08ms

So probably not too interesting unless trust works on saving as well as running.

Here's the log starting a little before the first mention of stanovništva.ipynb

And here is the log from the beginning after trimming "GET /user/PonoRoboT/api/" lines from it to make it short enough to upload

There are some tracebacks that come up:

[W 2022-06-23 02:09:12.751 SingleUserLabApp zmqhandlers:227] WebSocket ping timeout after 119904 ms.
[I 2022-06-23 02:09:17.753 SingleUserLabApp kernelmanager:318] Starting buffering for eb546739-b649-4198-b781-6811ca6f2840:25f4dd79-6800-4cf2-8e75-01828a64be78
[E 2022-06-23 05:04:39.057 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2022-06-23 05:04:42.762 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2022-06-23 05:04:53.060 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2022-06-23 05:04:57.158 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2022-06-23 05:04:58.358 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2022-06-23 05:05:05.565 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2022-06-23 05:05:19.765 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2022-06-23 05:05:27.563 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2022-06-23 05:05:27.563 SingleUserLabApp mixins:603] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 601, in keep_activity_updated
        await self.notify_activity()
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 583, in notify_activity
        await exponential_backoff(
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/utils.py", line 189, in exponential_backoff
        raise asyncio.TimeoutError(fail_message)
    asyncio.exceptions.TimeoutError: Failed to notify Hub of activity
[E 2022-06-23 06:12:22.696 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    ConnectionRefusedError: [Errno 111] Connection refused
[E 2022-06-23 06:12:24.455 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    ConnectionRefusedError: [Errno 111] Connection refused
[E 2022-06-23 06:12:27.243 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    ConnectionRefusedError: [Errno 111] Connection refused
[E 2022-06-23 06:12:30.567 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    ConnectionRefusedError: [Errno 111] Connection refused
[E 2022-06-23 06:12:35.083 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    ConnectionRefusedError: [Errno 111] Connection refused
[I 2022-06-23 06:12:46.240 SingleUserLabApp log:189] 302 GET /user/PonoRoboT/ -> /user/PonoRoboT/lab? (@192.168.79.36) 1.61ms
[E 2022-06-23 06:12:49.063 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    ConnectionRefusedError: [Errno 111] Connection refused
[E 2022-06-23 06:13:24.067 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    tornado.simple_httpclient.HTTPTimeoutError: Timeout during request
[E 2022-06-23 06:13:24.068 SingleUserLabApp mixins:603] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 601, in keep_activity_updated
        await self.notify_activity()
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 583, in notify_activity
        await exponential_backoff(
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/utils.py", line 189, in exponential_backoff
        raise asyncio.TimeoutError(fail_message)
    asyncio.exceptions.TimeoutError: Failed to notify Hub of activity
[E 2022-06-23 07:14:31.242 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    tornado.simple_httpclient.HTTPTimeoutError: Timeout during request
[E 2022-06-23 07:25:20.491 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    tornado.simple_httpclient.HTTPTimeoutError: Timeout during request
[E 2022-06-23 07:25:40.983 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    tornado.simple_httpclient.HTTPTimeoutError: Timeout during request
[I 2022-06-23 08:54:08.967 SingleUserLabApp log:189] 101 GET /user/PonoRoboT/api/kernels/eb546739-b649-4198-b781-6811ca6f2840/channels?session_id=25f4dd79-6800-4cf2-8e75-01828a64be78 (PonoRoboT@172.16.5.11) 12024.04ms
[I 2022-06-23 08:54:08.968 SingleUserLabApp handlers:438] Restoring connection for eb546739-b649-4198-b781-6811ca6f2840:25f4dd79-6800-4cf2-8e75-01828a64be78
[IPKernelApp] ERROR | No such comm target registered: jupyter.widget.control
--
[I 2022-06-23 10:42:52.347 SingleUserLabApp log:189] 101 GET /user/PonoRoboT/api/yjs/notebook:redirect%20pages%20incorrectly%20named.ipynb (PonoRoboT@192.168.252.128) 1.37ms
[I 2022-06-23 10:43:47.860 SingleUserLabApp log:189] 101 GET /user/PonoRoboT/api/yjs/notebook:redirect%20pages%20incorrectly%20named.ipynb (PonoRoboT@172.16.5.11) 21328.80ms
[E 2022-06-23 10:43:47.862 SingleUserLabApp mixins:578] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/mixins.py", line 576, in notify
        await client.fetch(req)
      File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
        future.result()
    tornado.simple_httpclient.HTTPTimeoutError: Timeout during request
[I 2022-06-23 10:43:47.864 SingleUserLabApp log:189] 101 GET /user/PonoRoboT/api/yjs/notebook:redirect%20pages%20incorrectly%20named.ipynb (PonoRoboT@172.16.2.200) 1.21ms
[E 2022-06-23 10:43:47.965 SingleUserLabApp ioloop:761] Exception in callback functools.partial(<bound method YjsEchoWebSocket.hook_send_message of <jupyterlab.handlers.yjs_echo_ws.YjsEchoWebSocket object at 0x7fbb65005030>>, b'\x00\x00\x08\x01\xab\xc6\x89\xb8\x05\xdbD')
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/tornado/ioloop.py", line 741, in _run_callback
        ret = callback()
      File "/srv/paws/lib/python3.10/site-packages/jupyterlab/handlers/yjs_echo_ws.py", line 140, in hook_send_message
        self.write_message(msg, binary=True)
      File "/srv/paws/lib/python3.10/site-packages/tornado/websocket.py", line 337, in write_message
        raise WebSocketClosedError()
    tornado.websocket.WebSocketClosedError
[E 2022-06-23 10:43:47.974 SingleUserLabApp ioloop:761] Exception in callback functools.partial(<bound method YjsEchoWebSocket.hook_send_message of <jupyterlab.handlers.yjs_echo_ws.YjsEchoWebSocket object at 0x7fbb65005030>>, b'\x01@\x01\xab\xc6\x89\xb8\x05\xe5\x067{"user":{"name":"Anonymous Megaclite","color":"#null"}}')
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/tornado/ioloop.py", line 741, in _run_callback
        ret = callback()
      File "/srv/paws/lib/python3.10/site-packages/jupyterlab/handlers/yjs_echo_ws.py", line 140, in hook_send_message
        self.write_message(msg, binary=True)
      File "/srv/paws/lib/python3.10/site-packages/tornado/websocket.py", line 337, in write_message
        raise WebSocketClosedError()
    tornado.websocket.WebSocketClosedError
[I 2022-06-23 10:44:21.529 SingleUserLabApp log:189] 101 GET /user/PonoRoboT/api/yjs/notebook:redirect%20pages%20incorrectly%20named.ipynb (PonoRoboT@172.16.2.200) 1.42ms

And an error/warn pair that shows up:

[IPKernelApp] ERROR | No such comm target registered: jupyter.widget.control
[IPKernelApp] WARNING | No such comm: 7e5eec57-dfff-46f1-988b-a00342cbaf2d
--
[IPKernelApp] ERROR | No such comm target registered: jupyter.widget.control
[IPKernelApp] WARNING | No such comm: 56f02957-00e4-49a5-8590-eec2f69b3f2e
--
[IPKernelApp] ERROR | No such comm target registered: jupyter.widget.control
[IPKernelApp] WARNING | No such comm: 21a87a2d-0c12-4e82-a9f5-6166c80dc254

Here's the hub logs:

[I 2022-06-23 06:12:54.557 JupyterHub app:2186] PonoRoboT still running
[E 2022-06-23 06:17:26.359 JupyterHub <string>:30] Error checking for Wikimedia lock on user PonoRoboT: HTTP 599: OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 0
[W 2022-06-23 06:17:26.361 JupyterHub base:332] User PonoRoboT has stale auth info. Login is required to refresh.
[W 2022-06-23 06:17:26.361 JupyterHub log:189] 403 POST /hub/api/users/PonoRoboT/activity (@192.168.213.144) 256696.68ms
[E 2022-06-23 10:05:51.965 JupyterHub spawner:2585] Pods: {'prod/jupyter--41doroni-42ot': {'metadata': {'name': 'jupyter--41doroni-42ot', 'namespace': 'prod', 'uid': 'f1559205-80f5-4654-90fe-b28debece045', 'resourceVersion': '442669329', 'creationTimestamp': '2022-06-23T08:10:43Z', 'labels': {'app': 'jupyterhub', 'char
[I 2022-06-27 16:47:40.719 JupyterHub pages:148] User is running: PonoRoboT
[I 2022-06-27 16:47:40.721 JupyterHub log:189] 302 GET /hub/spawn -> /user/PonoRoboT/ (PonoRoboT@172.16.5.11) 8.63ms
[I 2022-06-28 13:55:59.476 JupyterHub log:189] 302 GET /hub/user/PonoRoboT/lab/tree/HR%20naselja/HR%20naselja.ipynb -> /user/PonoRoboT/lab/tree/HR%20naselja/HR%20naselja.ipynb?redirects=1 (PonoRoboT@192.168.252.128) 57.61ms
[I 2022-06-30 05:55:24.463 JupyterHub pages:148] User is running: PonoRoboT
[I 2022-06-30 05:55:24.465 JupyterHub log:189] 302 GET /hub/spawn -> /user/PonoRoboT/ (PonoRoboT@172.16.2.200) 17.12ms

proxy:

12:50:52.737 [ConfigProxy] info: Adding route /user/PonoRoboT -> http://192.168.213.170:8888
12:50:52.738 [ConfigProxy] info: Route added /user/PonoRoboT -> http://192.168.213.170:8888
12:50:52.738 [ConfigProxy] info: 201 POST /api/routes/user/PonoRoboT 
05:58:24.180 [ConfigProxy] info: Adding route /user/Ponor -> http://192.168.141.199:8888
05:58:24.181 [ConfigProxy] info: Route added /user/Ponor -> http://192.168.141.199:8888
05:58:24.182 [ConfigProxy] info: 201 POST /api/routes/user/Ponor 
03:03:44.178 [ConfigProxy] info: Removing route /user/PonoRoboT
03:03:44.179 [ConfigProxy] info: 204 DELETE /api/routes/user/PonoRoboT 
03:04:09.298 [ConfigProxy] info: Adding route /user/PonoRoboT -> http://192.168.213.189:8888
03:04:09.299 [ConfigProxy] info: Route added /user/PonoRoboT -> http://192.168.213.189:8888
03:04:09.300 [ConfigProxy] info: 201 POST /api/routes/user/PonoRoboT 
03:21:19.084 [ConfigProxy] info: Removing route /user/PonoRoboT
03:21:19.085 [ConfigProxy] info: 204 DELETE /api/routes/user/PonoRoboT 
11:41:07.380 [ConfigProxy] info: Adding route /user/PonoRoboT -> http://192.168.79.12:8888
11:41:07.381 [ConfigProxy] info: Route added /user/PonoRoboT -> http://192.168.79.12:8888
11:41:07.381 [ConfigProxy] info: 201 POST /api/routes/user/PonoRoboT 
01:52:46.840 [ConfigProxy] info: Removing route /user/Ponor
01:52:46.841 [ConfigProxy] info: 204 DELETE /api/routes/user/Ponor 
15:19:39.003 [ConfigProxy] info: Removing route /user/PonoRoboT
15:19:39.003 [ConfigProxy] info: 204 DELETE /api/routes/user/PonoRoboT 
15:20:27.126 [ConfigProxy] info: Adding route /user/PonoRoboT -> http://192.168.20.150:8888
15:20:27.127 [ConfigProxy] info: Route added /user/PonoRoboT -> http://192.168.20.150:8888
15:20:27.127 [ConfigProxy] info: 201 POST /api/routes/user/PonoRoboT 
11:49:16.522 [ConfigProxy] info: Adding route /user/Ponor -> http://192.168.164.218:8888
11:49:16.523 [ConfigProxy] info: Route added /user/Ponor -> http://192.168.164.218:8888
11:49:16.524 [ConfigProxy] info: 201 POST /api/routes/user/Ponor 
19:46:45.133 [ConfigProxy] info: Removing route /user/Ponor
19:46:45.134 [ConfigProxy] info: 204 DELETE /api/routes/user/Ponor 
02:46:45.199 [ConfigProxy] info: Removing route /user/PonoRoboT
02:46:45.200 [ConfigProxy] info: 204 DELETE /api/routes/user/PonoRoboT 
15:20:36.560 [ConfigProxy] info: Adding route /user/PonoRoboT -> http://192.168.213.144:8888
15:20:36.561 [ConfigProxy] info: Route added /user/PonoRoboT -> http://192.168.213.144:8888
15:20:36.561 [ConfigProxy] info: 201 POST /api/routes/user/PonoRoboT

Some other errors in proxy, though seemingly only associated with other users, mostly connection timeout and refused kinds of things. Perhaps the traces are a clue?

No such comm target registered: jupyter.widget.control seems immaterial:
https://github.com/jupyter-widgets/ipywidgets/issues/3454

Reading the github issue I guess moving to notebook-based culling is the way to improve things, I can try to take a look at that.

My best guess guess is the pod culling script is trying to kill the user even though it shouldn't, a sign we could look for is if the errors correlate with runs of culling script.

To do experience the same problem: I occasionally can't save notebooks anymore after some time, although I still can run them. This started pretty much exactly at the time when PAWS switched to JupyterLab. (My setup: latest Firefox on Windows 10).

For those who think they have lost unsaved/unsavable work: there is a SQLITE database in the background that logs the code of all cells that have been run. This still works when the notebook itself cannot be saved anymore. I have experienced this problem so often that I have even written a notebook to recover lost code from the history database:

https://public-paws.wmcloud.org/User:MisterSynergy/paws%20recovery/recover_from_Jupyter_history.ipynb

It does not restore unsaved notebooks, but all code that has run is being printed so that lost work can be put together manually again.

The hub pod gives:

[I 2023-05-19 12:28:17.123 JupyterHub log:186] 302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-PonoRoboT&redirect_uri=%2Fuser%2FPonoRoboT%2Foauth_callback&response_type=code&state=[secret] -> /hub/login?next=%2Fhub%2Fapi%2Foauth2%2Fauthorize%3Fclient_id%3Djupyterhub-user-PonoRoboT%26redirect_uri%3D%252Fuser%252FPonoRoboT%252Foauth_callback%26response_type%3Dcode%26state%3DeyJ1dWlkIjogIjhkZjY5MDhjODY3OTQ4ZmM4MmY4YmEyNjFlNDRkMTNmIiwgIm5leHRfdXJsIjogIi91c2VyL1Bvbm9Sb2JvVC9sYWIvdHJlZS93aWtpZGF0YSUyMGVkaXRpbmcuaXB5bmIifQ (@10.100.3.0) 2.09ms
[I 2023-05-19 12:28:17.238 JupyterHub log:186] 200 GET /hub/login?next=%2Fhub%2Fapi%2Foauth2%2Fauthorize%3Fclient_id%3Djupyterhub-user-PonoRoboT%26redirect_uri%3D%252Fuser%252FPonoRoboT%252Foauth_callback%26response_type%3Dcode%26state%3DeyJ1dWlkIjogIjhkZjY5MDhjODY3OTQ4ZmM4MmY4YmEyNjFlNDRkMTNmIiwgIm5leHRfdXJsIjogIi91c2VyL1Bvbm9Sb2JvVC9sYWIvdHJlZS93aWtpZGF0YSUyMGVkaXRpbmcuaXB5bmIifQ (@10.100.3.0) 5.03ms
[I 2023-05-19 12:28:36.084 JupyterHub log:186] 200 POST /hub/api/users/PonoRoboT/activity (PonoRoboT@10.100.3.70) 30.20ms
[I 2023-05-19 12:28:39.486 JupyterHub log:186] 302 GET /hub/oauth_login?next=%2Fhub%2Fapi%2Foauth2%2Fauthorize%3Fclient_id%3Djupyterhub-user-PonoRoboT%26redirect_uri%3D%252Fuser%252FPonoRoboT%252Foauth_callback%26response_type%3Dcode%26state%3DeyJ1dWlkIjogIjhkZjY5MDhjODY3OTQ4ZmM4MmY4YmEyNjFlNDRkMTNmIiwgIm5leHRfdXJsIjogIi91c2VyL1Bvbm9Sb2JvVC9sYWIvdHJlZS93aWtpZGF0YSUyMGVkaXRpbmcuaXB5bmIifQ -> https://meta.wikimedia.org/w/index.php?title=Special%3AOAuth%2Fauthenticate&oauth_token=[secret]&oauth_consumer_key=[secret] (@10.100.3.0) 88.81ms

Though it looks like it only happened a few minutes ago (right now is 12:32). So probably not any reconnects in the background that are breaking it. As the issue was reported as visible about twenty minutes ago.

Around the time that the file last saved we see a number of "101 GET /user/PonoRoboT/api/yjs/notebook" notes

[I 2023-05-19 00:54:50.765 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/metrics/v1?1684457690678 (PonoRoboT@10.100.3.0) 35.60ms
[I 2023-05-19 00:54:55.870 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/metrics/v1?1684457695791 (PonoRoboT@10.100.3.0) 28.20ms
[I 2023-05-19 00:54:56.256 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/terminals?1684457696197 (PonoRoboT@10.100.3.0) 2.74ms
[I 2023-05-19 00:54:56.271 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/kernels?1684457696214 (PonoRoboT@10.100.3.0) 3.31ms
[I 2023-05-19 00:54:56.276 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/sessions?1684457696219 (PonoRoboT@10.100.3.0) 3.37ms
[I 2023-05-19 00:54:56.713 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/contents/HR?content=1&1684457696632 (PonoRoboT@10.100.3.0) 26.79ms
[I 2023-05-19 00:54:58.270 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki.ipynb (PonoRoboT@10.100.3.0) 3.19ms
[I 2023-05-19 00:55:28.441 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:wikidata%20editing.ipynb (PonoRoboT@10.100.3.0) 3.58ms
[I 2023-05-19 00:55:28.634 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:mwparserfromhell.ipynb (PonoRoboT@10.100.3.0) 2.53ms
[I 2023-05-19 00:55:32.475 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki.ipynb (PonoRoboT@10.100.3.0) 2.63ms
[I 2023-05-19 00:56:09.634 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki.ipynb (PonoRoboT@10.100.3.0) 2.77ms
[I 2023-05-19 00:56:10.624 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:mwparserfromhell.ipynb (PonoRoboT@10.100.3.0) 2.66ms
[I 2023-05-19 00:56:15.646 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:wikidata%20editing.ipynb (PonoRoboT@10.100.3.0) 2.98ms
[I 2023-05-19 00:56:44.769 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki.ipynb (PonoRoboT@10.100.3.0) 2.69ms
[I 2023-05-19 00:56:53.828 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:mwparserfromhell.ipynb (PonoRoboT@10.100.3.0) 2.78ms
[I 2023-05-19 00:57:01.858 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:wikidata%20editing.ipynb (PonoRoboT@10.100.3.0) 3.04ms
[I 2023-05-19 00:57:19.938 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki.ipynb (PonoRoboT@10.100.3.0) 2.67ms
[I 2023-05-19 00:57:36.017 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:mwparserfromhell.ipynb (PonoRoboT@10.100.3.0) 3.60ms
[I 2023-05-19 00:57:46.028 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:wikidata%20editing.ipynb (PonoRoboT@10.100.3.0) 2.87ms
[I 2023-05-19 00:58:00.105 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki.ipynb (PonoRoboT@10.100.3.0) 2.79ms
[I 2023-05-19 00:58:14.181 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:mwparserfromhell.ipynb (PonoRoboT@10.100.3.0) 3.17ms
[I 2023-05-19 00:58:27.215 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:wikidata%20editing.ipynb (PonoRoboT@10.100.3.0) 2.45ms
[I 2023-05-19 00:58:38.262 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki.ipynb (PonoRoboT@10.100.3.0) 2.68ms
[I 2023-05-19 00:58:54.332 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:mwparserfromhell.ipynb (PonoRoboT@10.100.3.0) 2.95ms
[I 2023-05-19 00:59:07.394 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:wikidata%20editing.ipynb (PonoRoboT@10.100.3.0) 3.08ms
[I 2023-05-19 00:59:17.439 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki.ipynb (PonoRoboT@10.100.3.0) 2.67ms
[I 2023-05-19 00:59:36.522 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:mwparserfromhell.ipynb (PonoRoboT@10.100.3.0) 2.63ms
[I 2023-05-19 00:59:58.679 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki.ipynb (PonoRoboT@10.100.3.0) 66.57ms
[I 2023-05-19 00:59:58.881 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:wikidata%20editing.ipynb (PonoRoboT@10.100.3.0) 3.01ms
[I 2023-05-19 01:00:26.754 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:mwparserfromhell.ipynb (PonoRoboT@10.100.3.0) 2.71ms
[I 2023-05-19 01:00:39.797 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki.ipynb (PonoRoboT@10.100.3.0) 2.95ms
[I 2023-05-19 01:00:53.859 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:wikidata%20editing.ipynb (PonoRoboT@10.100.3.0) 1.48ms
[I 2023-05-19 01:01:17.983 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki.ipynb (PonoRoboT@10.100.3.0) 2.31ms
[I 2023-05-19 01:01:18.973 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:mwparserfromhell.ipynb (PonoRoboT@10.100.3.0) 2.57ms
[I 2023-05-19 01:01:27.706 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/metrics/v1?1684458087449 (PonoRoboT@10.100.3.0) 29.62ms
[I 2023-05-19 01:01:32.059 SingleUserLabApp log:191] 101 GET /user/PonoRoboT/api/yjs/notebook:wikidata%20editing.ipynb (PonoRoboT@10.100.3.0) 2.54ms
[I 2023-05-19 01:01:32.827 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/metrics/v1?1684458092734 (PonoRoboT@10.100.3.0) 35.04ms
[I 2023-05-19 01:01:32.899 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/terminals?1684458092818 (PonoRoboT@10.100.3.0) 2.96ms
[I 2023-05-19 01:01:32.903 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/kernels?1684458092821 (PonoRoboT@10.100.3.0) 3.50ms
[I 2023-05-19 01:01:32.959 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/contents/HR?content=1&1684458092829 (PonoRoboT@10.100.3.0) 54.15ms
[I 2023-05-19 01:01:32.961 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/sessions?1684458092824 (PonoRoboT@10.100.3.0) 54.19ms
[I 2023-05-19 01:01:38.679 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/metrics/v1?1684458097855 (PonoRoboT@10.100.3.0) 35.03ms
[I 2023-05-19 01:01:42.986 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/terminals?1684458102929 (PonoRoboT@10.100.3.0) 2.87ms

There's a somewhat mysterious gap in the logs if we filter out "SingleUserLabApp log:191]"

[W 2023-05-19 00:50:21.962 SingleUserLabApp manager:685] Notebook HR/Py - Popisi stanovništva - enwiki.ipynb is not trusted
[I 2023-05-19 00:52:22.829 SingleUserLabApp handlers:184] Saving file at /HR/Py - Popisi stanovništva - enwiki.ipynb
[W 2023-05-19 00:52:22.835 SingleUserLabApp manager:685] Notebook HR/Py - Popisi stanovništva - enwiki.ipynb is not trusted
[W 2023-05-19 02:35:34.223 SingleUserLabApp client:125] Could not destroy zmq context for <jupyter_client.asynchronous.client.AsyncKernelClient object at 0x7f40d0171a50>
[W 2023-05-19 02:35:38.645 SingleUserLabApp client:125] Could not destroy zmq context for <jupyter_client.asynchronous.client.AsyncKernelClient object at 0x7f40d0173fa0>
[I 2023-05-19 02:36:42.679 SingleUserLabApp handlers:172] Creating new notebook in /HR

The file was last saved on:
-rw-r--r--. 1 tools.paws tools.paws 813464 May 19 00:52 'HR/Py - Popisi stanovništva - enwiki.ipynb'

When kernels are shut down we see:

[E 2023-05-19 13:53:34.662 SingleUserLabApp log:191] 500 PATCH /user/PonoRoboT/api/contents/HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki%20-%20saveas.ipynb?1684504414363 (PonoRoboT@10.100.3.0) 5.07ms
[E 2023-05-19 13:53:34.665 SingleUserLabApp web:1871] Uncaught exception PATCH /user/PonoRoboT/api/contents/HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki%20-%20saveas.ipynb?1684504414364 (10.100.3.0)
    HTTPServerRequest(protocol='http', host='hub-paws.wmcloud.org', method='PATCH', uri='/user/PonoRoboT/api/contents/HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki%20-%20saveas.ipynb?1684504414364', version='HTTP/1.1', remote_ip='10.100.3.0')
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/tornado/web.py", line 1786, in _execute
        result = await result
      File "/srv/paws/lib/python3.10/site-packages/jupyter_server/services/contents/handlers.py", line 145, in patch
        model = await ensure_async(cm.update(model, path))
      File "/srv/paws/lib/python3.10/site-packages/jupyter_server/services/contents/manager.py", line 466, in update
        self.rename(path, new_path)
      File "/srv/paws/lib/python3.10/site-packages/jupyter_server/services/contents/manager.py", line 454, in rename
        self.rename_file(old_path, new_path)
      File "/srv/paws/lib/python3.10/site-packages/jupyter_server/services/contents/filemanager.py", line 578, in rename_file
        if os.path.exists(new_os_path) and not samefile(old_os_path, new_os_path):
      File "/usr/lib/python3.10/genericpath.py", line 100, in samefile
        s1 = os.stat(f1)
    FileNotFoundError: [Errno 2] No such file or directory: '/home/paws/HR/Py - Popisi stanovništva - enwiki - saveas.ipynb'
[W 2023-05-19 13:53:34.666 SingleUserLabApp handlers:649] wrote error: 'Unhandled error'
[E 2023-05-19 13:53:34.668 SingleUserLabApp log:183] {
      "Cookie": "jupyterhub-user-PonoRoboT=[secret]; _xsrf=[secret]; jupyterhub-session-id=[secret]",
      "Cache-Control": "no-cache",
      "Pragma": "no-cache",
      "Sec-Fetch-Site": "same-origin",
      "Sec-Fetch-Mode": "cors",
      "Sec-Fetch-Dest": "empty",
      "Origin": "https://hub-paws.wmcloud.org",
      "X-Xsrftoken": "2|257068c7|8e81a29b16714f177e07d050cd426c85|1682878401",
      "Authorization": "token [secret]",
      "Content-Type": "text/plain;charset=UTF-8",
      "Referer": "http://hub-paws.wmcloud.org/hub/",
      "Accept-Encoding": "gzip, deflate, br",
      "Accept-Language": "en-US,en;q=0.5",
      "Accept": "*/*",
      "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/112.0",
      "Content-Length": "54",
      "X-Scheme": "http",
      "X-Forwarded-Scheme": "http",
      "X-Forwarded-Proto": "http,http",
      "X-Forwarded-Port": "80,80",
      "X-Forwarded-Host": "hub-paws.wmcloud.org",
      "X-Forwarded-For": "10.100.3.0,::ffff:10.100.2.3",
      "X-Real-Ip": "10.100.3.0",
      "X-Request-Id": "e9d7419e45cb392c13bc09cccdac8386",
      "Host": "hub-paws.wmcloud.org",
      "Connection": "close"
    }

And:

[I 2023-05-19 13:55:37.884 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/metrics/v1?1684504537727 (PonoRoboT@10.100.3.0) 11.56ms
[I 2023-05-19 13:55:38.013 SingleUserLabApp log:191] 200 GET /user/PonoRoboT/api/terminals?1684504537963 (PonoRoboT@10.100.3.0) 2.67ms
[W 2023-05-19 13:55:43.256 SingleUserLabApp web:1869] 409 PATCH /user/PonoRoboT/api/contents/HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki%20-%20saveas.ipynb?1684504542989 (10.100.3.0): File already exists: HR/Py - Popisi stanovništva - enwiki.ipynb
[W 2023-05-19 13:55:43.257 SingleUserLabApp handlers:649] wrote error: 'File already exists: HR/Py - Popisi stanovništva - enwiki.ipynb'
[W 2023-05-19 13:55:43.258 SingleUserLabApp log:191] 409 PATCH /user/PonoRoboT/api/contents/HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki%20-%20saveas.ipynb?1684504542989 (PonoRoboT@10.100.3.0) 5.19ms
[W 2023-05-19 13:55:43.262 SingleUserLabApp web:1869] 409 PATCH /user/PonoRoboT/api/contents/HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki%20-%20saveas.ipynb?1684504542988 (10.100.3.0): File already exists: HR/Py - Popisi stanovništva - enwiki.ipynb
[W 2023-05-19 13:55:43.262 SingleUserLabApp handlers:649] wrote error: 'File already exists: HR/Py - Popisi stanovništva - enwiki.ipynb'
[W 2023-05-19 13:55:43.263 SingleUserLabApp log:191] 409 PATCH /user/PonoRoboT/api/contents/HR/Py%20-%20Popisi%20stanovni%C5%A1tva%20-%20enwiki%20-%20saveas.ipynb?1684504542988 (PonoRoboT@10.100.3.0) 3.99ms

A lot was updated in T343116. Can this issue still be observed?

@rook I haven't observed it since the latest upgrade to 4.0
(reporting that EVERYTHING is finally working fine!)

@rook I haven't observed it since the latest upgrade to 4.0
(reporting that EVERYTHING is finally working fine!)

Super rad! I'll close this ticket up, thanks for the report.