Page MenuHomePhabricator

[bug] Server does not start
Closed, ResolvedPublic

Description

What happened?

Here I registered today. I'm logging with a Wikimedia account Klip_game. Every time the server does not start and I have an inscription "Spawn failed. The latest attempt to start your server has failed. Would you like to retry starting it?" I tried logging in through different computers with different operating systems (Win7 and Win10). But the result was always negative.

What should have happened?

The description says that the server will be started.

Event Timeline

Could you try pressing the logout button on the page that says "Spawn failed" and try again? I've seen it get stuck before with a strange login state that the logout button sometimes clears.

[I 2023-02-10 16:48:05.716 JupyterHub base:810] User logged in: Klip_game
[I 2023-02-10 16:48:05.727 JupyterHub log:186] 302 GET /hub/oauth_callback?oauth_verifier=[secret]&oauth_token=[secret] -> /hub/spawn 
(@10.100.1.0) 215.87ms
[I 2023-02-10 16:48:05.922 JupyterHub provider:651] Creating oauth client jupyterhub-user-Klip_game
[I 2023-02-10 16:48:05.949 JupyterHub log:186] 302 GET /hub/spawn -> /hub/spawn-pending/Klip_game (Klip_game@10.100.3.1) 77.21ms
[E 2023-02-10 16:48:05.953 JupyterHub user:832] Unhandled error starting Klip_game's server: [Errno 13] Permission denied: '/data/project/paws/userhomes/4530'
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 725, in spawn
        await maybe_future(authenticator.pre_spawn_start(self, spawner))
      File "<string>", line 51, in pre_spawn_start
      File "/usr/local/lib/python3.9/os.py", line 225, in makedirs
        mkdir(name, mode)
    PermissionError: [Errno 13] Permission denied: '/data/project/paws/userhomes/4530'
    
[E 2023-02-10 16:48:06.007 JupyterHub gen:630] Exception in Future <Task finished name='Task-734' coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /usr/local/lib/python3.9/site-packages/jupyterhub/handlers/base.py:954> exception=PermissionError(13, 'Permission denied')> after timeout
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/tornado/gen.py", line 625, in error_callback
        future.result()
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/handlers/base.py", line 961, in finish_user_spawn
        await spawn_future
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 850, in spawn
        raise e
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 725, in spawn
        await maybe_future(authenticator.pre_spawn_start(self, spawner))
      File "<string>", line 51, in pre_spawn_start
      File "/usr/local/lib/python3.9/os.py", line 225, in makedirs
        mkdir(name, mode)
    PermissionError: [Errno 13] Permission denied: '/data/project/paws/userhomes/4530'
    
[E 2023-02-10 16:48:06.093 JupyterHub pages:371] Previous spawn for Klip_game failed: [Errno 13] Permission denied: '/data/project/paws/userhomes/4530'
[E 2023-02-10 16:48:06.100 JupyterHub log:178] {
      "Referer": "http://hub.paws.wmcloud.org/hub/",
      "Cookie": "jupyterhub-hub-login=[secret]; jupyterhub-session-id=[secret]",
      "Accept-Language": "ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7",
      "Accept-Encoding": "gzip, deflate, br",
      "Sec-Ch-Ua-Platform": "\"Windows\"",
      "Sec-Ch-Ua-Mobile": "?0",
      "Sec-Ch-Ua": "\"Opera GX\";v=\"93\", \"Not/A)Brand\";v=\"8\", \"Chromium\";v=\"107\"",
      "Sec-Fetch-Dest": "document",
      "Sec-Fetch-User": "?1",
      "Sec-Fetch-Mode": "navigate",
      "Sec-Fetch-Site": "cross-site",
      "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 OPR/93.0.0.0",
      "Upgrade-Insecure-Requests": "1",
      "Cache-Control": "max-age=0",
      "X-Scheme": "http",
      "X-Forwarded-Scheme": "http",
      "X-Forwarded-Proto": "http,http",
      "X-Forwarded-Port": "80,80",
      "X-Forwarded-Host": "hub.paws.wmcloud.org",
      "X-Forwarded-For": "10.100.2.0,::ffff:10.100.3.21",
      "X-Real-Ip": "10.100.2.0",
      "X-Request-Id": "3d675ae2e9f3552d0581cefd15b272f1",
      "Host": "hub.paws.wmcloud.org",
      "Connection": "close"
    }
[E 2023-02-10 16:48:06.100 JupyterHub log:186] 500 GET /hub/spawn-pending/Klip_game (Klip_game@10.100.2.0) 13.04ms

Interesting you're getting a permission denied on your home directory

Looks like the hub pod doesn't have write permission on the nfs server (labstore1004.eqiad.wmnet)

tools.paws@hub-68f7cccdcf-5c5s4:/data/project/paws/userhomes$ mkdir testdir
mkdir: cannot create directory ‘testdir’: Permission denied

Which I believe would be causing the issue from https://github.com/toolforge/paws/blob/2ac67cb244059954a9a02d71466361a096e99b2b/paws/values.yaml#L176

Directory on NFS had different user

root@labstore1004:/srv/misc/shared/paws/project/paws# ls -l
total 136
drwsr-sr-x 5242 www-data www-data 135168 Feb 10 18:26 userhomes
root@labstore1004:/srv/misc/shared/paws/project/paws# chown 52771:52771 userhomes/

Seemed to work for my test user now. Please give it another try and let us know. Thanks!

Wait on that. Looks like it managed to change back.

root@labstore1004:/srv/misc/shared/paws/project/paws# ls -l
total 136
drwsr-sr-x 5243 www-data www-data 135168 Feb 10 19:03 userhomes

I tried to connect from different computers and got the error again

Yeah something is switching the permissions back to www-data

@Klip_game I've created your home directory manually. I believe that you should be able to log in now, please let me know either way. Still I'm going to leave this ticket open as the issue is still ongoing and would impact anyone else who doesn't already have a home directory.

Everything started working today.

Many thanks for the help.

@rook Hey, I've been having the same issue and I would really appreciate it if you could create a home directory for me if you have any spare time. Thanks!

Change 888311 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:wmcs::services::toolsdb_replica_cnf: don't manage the directories

https://gerrit.wikimedia.org/r/888311

taavi subscribed.

This is caused by T303663.

Feb  9 14:57:03 labstore1004 puppet-agent[43476]: (/Stage[main]/Main/File[/srv/tools/shared/tools/project]/owner) owner changed 'root' to 'www-data'
Feb  9 14:57:03 labstore1004 puppet-agent[43476]: (/Stage[main]/Main/File[/srv/tools/shared/tools/project]/group) group changed 'root' to 'www-data'
Feb  9 14:57:03 labstore1004 puppet-agent[43476]: (/Stage[main]/Main/File[/srv/misc/shared/paws/project/paws/userhomes]/owner) owner changed 52771 to 'www-data'
Feb  9 14:57:03 labstore1004 puppet-agent[43476]: (/Stage[main]/Main/File[/srv/misc/shared/paws/project/paws/userhomes]/group) group changed 52771 to 'www-data'
Feb  9 14:57:03 labstore1004 puppet-agent[43476]: (/Stage[main]/Main/File[/srv/tools/shared/tools/home]/owner) owner changed 'root' to 'www-data'
Feb  9 14:57:03 labstore1004 puppet-agent[43476]: (/Stage[main]/Main/File[/srv/tools/shared/tools/home]/group) group changed 'root' to 'www-data'

Feb 10 19:04:07 labstore1004 puppet-agent[28891]: (/Stage[main]/Main/File[/srv/misc/shared/paws/project/paws/userhomes]/owner) owner changed 52771 to 'www-data' (corrective)
Feb 10 19:04:07 labstore1004 puppet-agent[28891]: (/Stage[main]/Main/File[/srv/misc/shared/paws/project/paws/userhomes]/group) group changed 52771 to 'www-data' (corrective)

I have temporarily fixed this by disabling Puppet and running the following:

root@labstore1004:~# chown root:root /srv/tools/shared/tools/project
root@labstore1004:~# chown root:root /srv/tools/shared/tools/home
root@labstore1004:~# chown 52771:52771 /srv/misc/shared/paws/project/paws/userhomes

The patch I attached is a more permanent fix.

Hi @EchidnaLives , I think @taavi got a temp fix in, with a more permanent one in review. Could you try logging back in to see if it is working for you now? If not I would be glad to create a directory for you manually.

Change 888311 merged by David Caro:

[operations/puppet@production] P:wmcs::services::toolsdb_replica_cnf: don't manage the directories

https://gerrit.wikimedia.org/r/888311

Mentioned in SAL (#wikimedia-cloud) [2023-02-13T13:32:43Z] <taavi> re-enable puppet on labstore1004 T329377

@rook Yes, it works for me (sorry for my belated reply)! Thanks for you and taavi's work on this! :)

I still get "Spawn failed" error

Hi @Shizhao could you try pressing the "logout" button and try again?

在T329377#8618110中,@rook写道:

Hi @Shizhao could you try pressing the "logout" button and try again?

Thx!it's ok now.