
PAWS fails creating a server for new user
Closed, Resolved, Public

Description

Tracking task for general 500 errors after a user attempts to start a server in PAWS.

General behaviour:

  1. Go to https://paws.wmflabs.org
  2. Authenticate with MediaWiki (OAuth works, PAWS should be a registered app in Meta)
  3. Ask for new server, page tells you "Your server is starting up. You will be redirected automatically when it's ready for you."
  4. Page errors out with "500 : Internal Server Error Failed to reach your server. Please try again later. Contact admin if the issue persists. You can try restarting your server from the home page."

At least two distinct causes exist for this same behaviour. If one of them is affecting you, comment below.

Event Timeline

Chicocvenancio renamed this task from "Creating a server for new user fails" to "PAWS fails creating a server for new user". Jan 30 2018, 3:10 PM
Chicocvenancio raised the priority of this task from Medium to High.
Chicocvenancio added a subscriber: Chicocvenancio.

Using PAWS from my WMF account fails as described by @Herzi.Pinki. (my bot and volunteer accounts work fine)

@Herzi.Pinki I think I solved the problem for your bot account. I'll leave this open until you confirm it is working and I find a more sustainable solution for other users.
Apparently the userhome directory is being created by the root user, leaving the tools.paws user without write access.

@Chicocvenancio can you outline what you did to fix this in the singular case?

@chasemp I found the userhome folder for that user by going to https://paws-public.wmflabs.org/paws-public/User:ISO_3166_Bot and then ran chown tools.paws /data/project/paws/userhomes/52971983
As I noted in #wikimedia-cloud-admin, there is a ****-hack.bash file in the paws home that basically does that for each userhome, but it's not currently running. I think we should investigate why the folders are being created with root as owner, but start up the hack again in the meantime.
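
For anyone checking a single userhome by hand before running chown, a minimal sketch of the ownership test could look like the following. The function name owned_by_root is hypothetical (it is not part of PAWS), and the sketch assumes GNU stat, which accepts -c '%u' to print the numeric owner id.

```shell
#!/bin/bash
# Hypothetical helper, not part of PAWS: succeeds (exit 0) when the given
# path is owned by uid 0 (root), fails otherwise or when the path is missing.
# Assumes GNU coreutils stat, where -c '%u' prints the numeric owner id.
owned_by_root() {
    [ "$(stat -c '%u' "$1" 2>/dev/null)" = "0" ]
}
```

With the path mentioned above, this could be used as: owned_by_root /data/project/paws/userhomes/52971983 && echo "still owned by root"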

@Chicocvenancio: the main problem is always to find the right person. Not to fix the problem which was eventually just a chown with sufficient permissions. Thank you so much (it works now).

For convenience of my future fellow failers: Can you please

  • find the root cause why the owner was set incorrectly
  • link the ''contact admin'' phrase to someone that might provide help in such cases

Adding @madhuvishy to take a look at the script. Also, the logfiles for the script amount to 2GB at this point, maybe we should include them in logrotate somewhere.

The hack seems to be insufficient, maybe there is another step to it we're missing?
I just changed the group of all userhome folders to tools.paws after a new user complained again about 500 errors.

Mentioned in SAL (#wikimedia-cloud) [2018-02-16T20:18:45Z] <chicocvenancio> changed userhomes group for T185434 workarround

We discussed this a bit in IRC.

Apparently the culprit for the userhomes having an incorrect owner is a bug that required the hub image to be run as root for the culler to work (see https://github.com/yuvipanda/paws/commit/596d51cec69e1c91b7811c1e9040d86e1d494d24 ).

I am not sure the culler is working at this point, at least reliably. My pods stay running for weeks with no interaction or jobs.

Hi, I've just created a new account (WikiLabMadrid) for giving a course on pywikibot to our local Spanish chapter. I wanted to show how to start using PAWS from scratch and, as my regular server is full of Python packages, created a new account. However, I'm undergoing the same problems as Herzi pointed out one month ago. Any way to sort this out?

Mentioned in SAL (#wikimedia-cloud) [2018-02-17T22:46:27Z] <zhuyifei1999_> # find /data/project/paws/userhomes/ -maxdepth 1 -user root | xargs chown -v tools.paws:tools.paws. Affected: 17295220 & 53267907 T185434

However, I'm undergoing the same problems as Herzi pointed out one month ago. Any way to sort this out?

We are working on fixing the root cause of directories being created with incorrect owner. In the meantime, your user should work now.

I renamed the hack script in tools.paws to paws-userhomes-hack.bash, it now looks like:

#!/bin/bash
while true
do
	find /data/project/paws/userhomes/ -maxdepth 1 -user root | xargs -L1 chown -v tools.paws:tools.paws
	sleep 1
done

And submitted it to the grid as root on tools-bastion with jsub /data/project/paws/paws-userhomes-hack.bash

root@tools-bastion-03:~# qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1308410 0.30000 paws-userh root         r     02/18/2018 00:43:19 task@tools-exec-1406.eqiad.wmf     1

That should hopefully help us stop doing this manually for now.
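
As a possible variant of the loop body, here is a sketch that uses find's own -exec instead of piping to xargs. It is not what is deployed; the function name fix_userhomes and the dry-run switch are hypothetical, and -mindepth 1 is added so the userhomes directory itself is never touched (the deployed command does not have it). With -exec ... {} +, chown is only invoked when find actually matched something, whereas GNU xargs without -r generally runs the command once even on empty input, which may (this is a guess) contribute to the log volume.

```shell
#!/bin/bash
# Sketch under assumptions: fix_userhomes and its arguments are hypothetical.
#   $1 = directory holding the userhomes
#   $2 = current (wrong) owner to look for, name or numeric uid
#   $3 = new owner, e.g. tools.paws:tools.paws
#   $4 = "yes" (default) to only list matches, "no" to actually chown
fix_userhomes() {
    local userhomes="$1" from_user="$2" to_owner="$3" dry_run="${4:-yes}"
    if [ "$dry_run" = "yes" ]; then
        # Dry run: print the entries that would be changed.
        find "$userhomes" -mindepth 1 -maxdepth 1 -user "$from_user" -print
    else
        # "{} +" batches the matches into chown calls and skips chown
        # entirely when nothing matched.
        find "$userhomes" -mindepth 1 -maxdepth 1 -user "$from_user" \
            -exec chown -v "$to_owner" {} +
    fi
}
```

A dry run against the real directory would be: fix_userhomes /data/project/paws/userhomes/ root tools.paws:tools.paws yes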

Thanks, my new user is able to create a new server. However, I don't know whether the task is definitely fixed or not. Thanks again

I've tried to create a server for a new user and it has worked. Hopefully it's definitely fixed :-)

Thanks, guys

Just tried to create a user for myself and it fails. It keeps waiting on https://paws.wmflabs.org/paws/hub/user/barcex/ saying "Your server is starting up. You will be redirected automatically when it's ready for you" until it eventually times out.

Just tried to create a user for myself and it fails. It keeps waiting on https://paws.wmflabs.org/paws/hub/user/barcex/ saying "Your server is starting up. You will be redirected automatically when it's ready for you" until it eventually times out.

And I have this problem.

I cannot reproduce that with my own account, but there is no pod running for either of you, afaict. The hub logs are full of culler errors (one error every minute), which @Chicocvenancio is looking into, and I gave up searching for relevant error messages related to the timeout.

In T185434#3990743, @Zoranzoki21 wrote:

And I have this problem.

I'm not sure I understand. I see your userhome in PAWS. What is the error you are getting?
I see your errors in the log as well; they seem to be the same for both of you. Can you try to clear cookies and retry?

Just tried to create a user for myself and it fails. It keeps waiting on https://paws.wmflabs.org/paws/hub/user/barcex/ saying "Your server is starting up. You will be redirected automatically when it's ready for you" until it eventually times out.

Like zhuyifei1999, I suspect this to be a different issue. But I do see the 500s in the log. I'll investigate a bit and report back.
(forgot to submit)

Looking at the hub image log for the last few hours we can see some errors for barcex, the most relevant one seems to be:

[E 2018-02-21 23:37:17.927 JupyterHub gen:914] Exception in Future <tornado.concurrent.Future object at 0x7eff24281da0> after timeout
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 910, in error_callback
    future.result()
  File "/usr/local/lib/python3.5/dist-packages/jupyterhub/handlers/base.py", line 445, in finish_user_spawn
    yield spawn_future
  File "/usr/local/lib/python3.5/dist-packages/jupyterhub/user.py", line 439, in spawn
    raise e
  File "/usr/local/lib/python3.5/dist-packages/jupyterhub/user.py", line 378, in spawn
    ip_port = yield gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
tornado.gen.TimeoutError: Timeout

This seems very distinct from the userhome permission problem that started this task, but it will result in similar symptoms from a user's perspective. Maybe we should open a separate task for each root cause and keep this one as a tracker for all errors until we create the new tasks.

I'll try to understand this error a bit more over the next few days. In the meantime, @Barcex, could you try to remove your cookies and go to https://paws.wmflabs.org again? Do keep in mind that refreshes do not ask for new servers; PAWS only tries to provide you with a fresh container if you click home and then my server. Refreshes only check whether a container is ready (and will retry if creation has not failed for over 300 seconds).
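
To illustrate the general pattern behind the TimeoutError in the traceback (wait for something to become ready, give up after a fixed number of seconds), here is a rough shell analogy. It is not how JupyterHub implements spawner.start_timeout; the function name wait_until_ready and the one-second polling interval are assumptions made for the sketch.

```shell
#!/bin/bash
# Hypothetical helper: re-run a readiness check once per second until it
# succeeds, or give up once the timeout (in seconds) has elapsed.
#   $1    = timeout in seconds
#   $2... = command that exits 0 when the thing being waited for is ready
wait_until_ready() {
    local timeout="$1"
    shift
    local deadline=$(( $(date +%s) + timeout ))
    until "$@"; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "Timeout after ${timeout}s" >&2
            return 1
        fi
        sleep 1
    done
}
```

For example, wait_until_ready 300 test -e /tmp/ready returns 0 as soon as the file exists and returns 1 if it still does not exist after 300 seconds (the file path is purely illustrative).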

A couple of tests (with different browsers and fresh cookies):
Launched 2018-02-22 09:53 UTC
Launched 2018-02-22 10:15 UTC

Chicocvenancio lowered the priority of this task from High to Medium. Feb 22 2018, 4:48 PM
Chicocvenancio updated the task description.

A couple of tests (with different browsers and fresh cookies):
Launched 2018-02-22 09:53 UTC
Launched 2018-02-22 10:15 UTC

Same here, it behaves like this for me as well.

Chicocvenancio claimed this task.

With both subtasks closed, closing this as well.