Creating projects in openstack hangs on CLI
Closed, Invalid (Public)

Description

After trying to create projects for T213283, T212573 and T212918, I've had mixed results. All runs yesterday and today have hung. However, while the wikidata-history-query-service and indico projects appear to have completed most or all of the wmfkeystonehooks steps, meza and bstorm-test2 never ended up with a domain from designate.

On digging in, the hang occurs during the POST to http://cloudcontrol1003.wikimedia.org:35357/v3/projects, so the error must be in one of the backend processes.

So far I can confirm that the nova-api logs show an error in our code around the security group modification:

novaadmin bstorm-test2 - - -] HTTP exception thrown: Invalid port range 1:65535. For ICMP, the type:code must be valid

That seems like an easy fix, but I cannot imagine it is the root cause, since that's from 11 months ago.
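For context on why 1:65535 is rejected for ICMP: in a security group rule, the two "port" fields carry the ICMP type and code rather than ports, and both must fall between -1 (meaning "any") and 255. The sketch below is a simplified illustration of that kind of check, not nova's actual code. The function name `validate_rule` and the non-ICMP error strings are made up for this example.

```python
def validate_rule(protocol, from_port, to_port):
    """Return None if the rule looks valid, else an error string.

    Hypothetical helper for illustration only; not the nova implementation.
    """
    proto = protocol.lower()
    if proto == "icmp":
        # For ICMP, from_port is the type and to_port is the code.
        # -1 is accepted as a wildcard; otherwise both must be 0-255.
        if not (-1 <= from_port <= 255 and -1 <= to_port <= 255):
            return (
                f"Invalid port range {from_port}:{to_port}. "
                "For ICMP, the type:code must be valid"
            )
        return None
    if proto in ("tcp", "udp"):
        # TCP/UDP rules use real port numbers, so 1:65535 is fine here.
        if not (1 <= from_port <= 65535 and 1 <= to_port <= 65535):
            return f"Invalid port range {from_port}:{to_port}. Ports must be 1-65535"
        if from_port > to_port:
            return f"Invalid port range {from_port}:{to_port}. Start exceeds end"
        return None
    return f"Unsupported protocol {protocol}"


if __name__ == "__main__":
    print(validate_rule("icmp", 1, 65535))   # rejected: 65535 is not a valid ICMP code
    print(validate_rule("icmp", -1, -1))     # accepted: wildcard type and code
    print(validate_rule("tcp", 1, 65535))    # accepted: full TCP port range
```

If this matches what our hook is doing, the fix would presumably be to pass -1:-1 for the ICMP rule and keep 1:65535 only for TCP and UDP.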

I closed T212573, since it appears to be complete, but I'm keeping the others open until this is resolved. Things aren't quite right.

Event Timeline

Bstorm triaged this task as High priority. Jan 17 2019, 10:52 PM
Bstorm created this task.

Background: Our custom project-creation code is in a module called 'wmfkeystonehooks'. The last thing the hooks do is create a default domain <projectname>.wmflabs.org. So, a reasonable way to verify that project creation completed is to check that this domain exists and is scoped to the new project.
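The check described above can be sketched roughly as follows. This is only an illustration: it assumes the zone list has already been fetched from designate and parsed into dictionaries with `name` and `project_id` keys, and that zone names carry a trailing dot. The helper names `expected_domain` and `project_has_domain` are hypothetical and are not part of wmfkeystonehooks.

```python
def expected_domain(project_name, suffix="wmflabs.org"):
    """Build the default zone name for a project, with the trailing dot
    that fully qualified DNS zone names use."""
    return f"{project_name}.{suffix}."


def project_has_domain(zones, project_name, project_id):
    """Return True if the project's default domain exists and is scoped
    to the given project.

    `zones` is assumed to be a list of dicts, one per zone, each with
    'name' and 'project_id' keys (an assumption about the parsed data).
    """
    want = expected_domain(project_name)
    return any(
        zone.get("name") == want and zone.get("project_id") == project_id
        for zone in zones
    )


if __name__ == "__main__":
    # Made-up sample data for illustration.
    sample_zones = [
        {"name": "testproject3.wmflabs.org.", "project_id": "testproject3"},
        {"name": "meza.wmflabs.org.", "project_id": "some-other-project"},
    ]
    print(project_has_domain(sample_zones, "testproject3", "testproject3"))
    print(project_has_domain(sample_zones, "meza", "meza"))
    print(project_has_domain(sample_zones, "bstorm-test2", "bstorm-test2"))
```

Checking the scoping as well as the name matters here, since a zone that exists but belongs to a different project would still mean the hooks did not finish correctly.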

I just now created three new projects, like this:

root@cloudcontrol1003:~# openstack project create testproject3

All three completed in a few seconds, and all three have the custom domain.

So... I don't know what you're seeing. My guesses are either 1) something is different in the context when you're running the commands or 2) my restarting keystone to turn on logging somehow resolved the problem.

In any case, I'm going to try to clean up some of the existing errors and warnings that crop up during project creation so we can get a cleaner view in the future.

Change 486084 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmfkeystonehooks: update keystone auth

https://gerrit.wikimedia.org/r/486084

Change 486085 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmfkeystonehooks: add some more logging

https://gerrit.wikimedia.org/r/486085

Change 486089 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmfkeystonehooks: use page.text() instead of page.edit()

https://gerrit.wikimedia.org/r/486089

Change 486090 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] designatemakedomain: update keystone auth

https://gerrit.wikimedia.org/r/486090

Change 486084 merged by Andrew Bogott:
[operations/puppet@production] wmfkeystonehooks: update keystone auth

https://gerrit.wikimedia.org/r/486084

Change 486085 merged by Andrew Bogott:
[operations/puppet@production] wmfkeystonehooks: add some more logging

https://gerrit.wikimedia.org/r/486085

Change 486089 merged by Andrew Bogott:
[operations/puppet@production] wmfkeystonehooks: use page.text() instead of page.edit()

https://gerrit.wikimedia.org/r/486089

Change 486090 merged by Andrew Bogott:
[operations/puppet@production] designatemakedomain: update keystone auth

https://gerrit.wikimedia.org/r/486090

I'm assigning this back to @Bstorm to try again -- we'll at least get some logs of where it dies now. (And, btw, if you were doing this on labcontrol1001, that might have been part of the problem, so try on labcontrol1003 instead).

We think that the issue may be that Brooke was specifying region on creation. Taking this back to investigate that angle.

Created meza without specifying the region, and it was fine, just FYI.

I also cleaned up bstorm-test2 so it's not hanging around forever.

Mentioned in SAL (#wikimedia-cloud) [2019-02-06T00:17:22Z] <bstorm_> T214106 deleted bstorm-test2 project to clean up

@Bstorm, can you tell me exactly which command was hanging? Apparently it wasn't with --region:

root@cloudcontrol1003:~# openstack project create --region eqiad1-r andrewtestproject101
usage: openstack project create [-h] [-f {json,shell,table,value,yaml}]
                                [-c COLUMN] [--max-width <integer>]
                                [--noindent] [--prefix PREFIX]
                                [--domain <domain>] [--parent <project>]
                                [--description <description>]
                                [--enable | --disable]
                                [--property <key=value>] [--or-show]
                                <project-name>
openstack project create: error: unrecognized arguments: --region andrewtestproject101

Seems this was a one-time weirdness.