
deployment-maps01 has no security groups, none can be added
Closed, Resolved, Public


About 30 minutes ago I did the following things (all in Horizon):

  • Tried to create a "maps" security group in deployment-prep. This failed because the quota was reached
  • Edited the "sca" security group to add port 6533
  • Edited the deployment-maps01 instance to add the sca security group

After doing this, deployment-maps01 ended up with no security groups at all (it previously had default and web). Trying to edit its groups shows a "successfully edited 'unknown instance'" message, but the changes don't stick. Because the instance is no longer in the default security group, SSH access to it no longer works.

Event Timeline

@bd808 kindly raised the security group quota for deployment-prep, so I was able to create a maps security group and undo my change to the sca group. However, the issue with deployment-maps01 remains: it has no groups and trying to add any fails.

The CLI confirms that no security groups were attached at first (note that there is no security_groups row in the output):

# OS_TENANT_NAME=deployment-prep openstack server show 32545986-1556-4a07-87e7-79c7a607acf8
| Field                                | Value                                                     |
| OS-DCF:diskConfig                    | AUTO                                                      |
| OS-EXT-AZ:availability_zone          | nova                                                      |
| OS-EXT-STS:power_state               | 1                                                         |
| OS-EXT-STS:task_state                | None                                                      |
| OS-EXT-STS:vm_state                  | active                                                    |
| OS-SRV-USG:launched_at               | 2018-03-13T01:32:56.000000                                |
| OS-SRV-USG:terminated_at             | None                                                      |
| accessIPv4                           |                                                           |
| accessIPv6                           |                                                           |
| addresses                            | public=                                        |
| config_drive                         |                                                           |
| created                              | 2018-03-13T01:32:34Z                                      |
| flavor                               | m1.large (4)                                              |
| hostId                               | f43b7a9850d989d23c1e77bb915a484664f43b559c4022dc37c8e703  |
| id                                   | 32545986-1556-4a07-87e7-79c7a607acf8                      |
| image                                | debian-8.10-jessie (fd953a8c-bf2b-4b78-9b60-130d3316dcfd) |
| key_name                             | None                                                      |
| name                                 | deployment-maps01                                         |
| os-extended-volumes:volumes_attached | []                                                        |
| progress                             | 0                                                         |
| project_id                           | deployment-prep                                           |
| properties                           |                                                           |
| status                               | ACTIVE                                                    |
| updated                              | 2018-03-13T01:32:57Z                                      |
| user_id                              | catrope                                                   |
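If this check needs to be scripted across several instances, `openstack server show -f json <id>` gives machine-readable output. The sketch below assumes the groups appear under a "security_groups" key that is simply absent when nothing is attached, and that the value is either a list of {"name": ...} dicts or a "name='...'" string depending on the client version; both shapes are assumptions, not verified against the clients deployed here.

```python
import json
import re
from typing import List


def security_group_names(server_json: str) -> List[str]:
    """Return security group names from `openstack server show -f json` output.

    Assumption: groups live under a "security_groups" key that is missing when
    none are attached, holding either a list of {"name": ...} dicts or a
    "name='...'" string depending on the client version.
    """
    server = json.loads(server_json)
    groups = server.get("security_groups")
    if not groups:
        return []  # key missing or empty: no security groups attached
    if isinstance(groups, str):
        # String form, e.g. "name='default'\nname='web'"
        return re.findall(r"name='([^']*)'", groups)
    return [group["name"] for group in groups]


# Trimmed-down stand-in for the output above: no security_groups key at all.
sample = json.dumps({"name": "deployment-maps01", "status": "ACTIVE"})
print(security_group_names(sample))  # prints []
```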

I used Horizon to add and remove security groups on an instance in another project without trouble, so at first glance the problem does not seem to be systemic.

Mentioned in SAL (#wikimedia-operations) [2018-03-22T00:02:02Z] <andrewbogott> restarted nova-network on labnet1001 and nova-compute on labvirt1015 as part of debugging T190367

The other issue here is that in modern OpenStack versions, security groups are handled by Neutron, which means that nova-network's support for them has already started to rot in the versions we're running :(

Here is the current state of things:

  • This instance has two security groups applied, 'default' and 'maps'
  • Those security groups appear as expected both on the commandline and in horizon
  • Those security groups are in effect, so the ports in each group (including 6533) are not blocked by them
  • Editing security groups via Horizon fails, as does removing specific security groups via the commandline
  • There are also local, puppet-applied firewall rules which block all ports but port 22. That means that 6533 is still effectively blocked on the instance, but ferm/iptables rules are outside the scope of this bug :)
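The two filtering layers above compose: traffic gets through only if a security group allows the port and the instance's own firewall also accepts it. A minimal sketch of that rule; the rule sets are hypothetical and only loosely modelled on the state described here (the real 'default' group allows more than port 22).

```python
from typing import Dict, Iterable, Set


def reachable_ports(group_rules: Dict[str, Iterable[int]],
                    host_firewall_ports: Iterable[int]) -> Set[int]:
    """Ports reachable from outside, given two independent filtering layers.

    A port gets through only if at least one attached security group allows
    it AND the instance's local firewall (ferm/iptables) also accepts it.
    """
    allowed_by_groups: Set[int] = set()
    for ports in group_rules.values():
        allowed_by_groups |= set(ports)
    return allowed_by_groups & set(host_firewall_ports)


# Hypothetical rule sets, not the real contents of these groups.
attached_groups = {"default": [22], "maps": [6533]}
local_firewall = [22]  # puppet-managed rules: everything but 22 is blocked
print(sorted(reachable_ports(attached_groups, local_firewall)))  # prints [22]
```

So opening 6533 in the 'maps' group alone is not enough; the local ferm rules have to allow it too.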

So I'm tempted to declare this 'good enough'. Someone (probably Roan) will need to sort out the local firewall issue. There are also some obvious issues with security groups in our infrastructure, but since we're on the verge of changing everything (updating to Mitaka and Neutron), I don't think it's worth digging into this particular issue unless we find more widespread problems.

The same thing is now happening on tin.rcm.eqiad.wmflabs. There might be two different issues here :(

There are a lot of problems here; this is clearly a neglected feature in Nova.

  • The cmdline client on labcontrol1001 is buggy and fails when trying to remove rules.
  • The cmdline client on labweb1001 is buggy in a different way: it sends an ID, which the API interprets as a name, and then errors out.
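The labweb1001 failure is a mismatch between how the client refers to a group and how the server resolves that reference. A toy sketch of the difference; both lookup functions and the group data are hypothetical illustrations, not the actual nova or client code.

```python
from typing import Dict, List, Optional

Group = Dict[str, str]


def find_by_name(groups: List[Group], ref: str) -> Optional[Group]:
    # Treats every reference as a group name, like the failing request path.
    matches = [g for g in groups if g["name"] == ref]
    return matches[0] if len(matches) == 1 else None


def find_by_id_or_name(groups: List[Group], ref: str) -> Optional[Group]:
    # Tries an exact ID match first, then falls back to a name lookup.
    for g in groups:
        if g["id"] == ref:
            return g
    return find_by_name(groups, ref)


groups = [{"id": "17", "name": "default"}, {"id": "42", "name": "maps"}]
print(find_by_name(groups, "42"))                # prints None
print(find_by_id_or_name(groups, "42")["name"])  # prints maps
```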

The cmdline client in the Horizon venv looks correct to me and, I'm sorry to say, labtesthorizon works properly, so I can't reproduce the problem there.

The issue seems to be name collisions: there are many security groups named 'default' in different projects, and somehow the 'default' group from the admin project is getting assigned to VMs. Horizon can't see that group, so it shows those instances as having no security groups at all.
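A toy model of that failure mode: resolving 'default' by name without scoping it to the instance's project can pick another project's group, and a view that only lists the current project's groups then shows nothing. The group IDs, the lookup helpers, and the ordering that puts admin's group first are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class SecGroup(NamedTuple):
    id: str
    name: str
    project: str


ALL_GROUPS = [
    SecGroup("1", "default", "admin"),
    SecGroup("2", "default", "deployment-prep"),
    SecGroup("3", "maps", "deployment-prep"),
]


def first_by_name(groups: List[SecGroup], name: str) -> Optional[SecGroup]:
    # Unscoped: returns whichever 'default' happens to come first.
    return next((g for g in groups if g.name == name), None)


def by_name_in_project(groups: List[SecGroup], name: str, project: str) -> SecGroup:
    # Scoped: only consider groups owned by the instance's project.
    matches = [g for g in groups if g.name == name and g.project == project]
    if len(matches) != 1:
        raise LookupError(f"expected one {name!r} in {project!r}, found {len(matches)}")
    return matches[0]


def visible_groups(attached: List[SecGroup], project: str) -> List[SecGroup]:
    # Models a UI that only lists groups belonging to the current project.
    return [g for g in attached if g.project == project]


attached = [first_by_name(ALL_GROUPS, "default")]  # admin's 'default' got attached
print(visible_groups(attached, "deployment-prep"))  # prints []
print(by_name_in_project(ALL_GROUPS, "default", "deployment-prep").id)  # prints 2
```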

Change 421974 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[openstack/horizon/horizon@ocata] service groups: don't pass a map object to server_update_security_groups

Change 421974 merged by Andrew Bogott:
[openstack/horizon/horizon@ocata] service groups: don't pass a map object to server_update_security_groups
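The commit title points at a common Python 3 pitfall: `map()` returns a one-shot iterator rather than a list, so a callee that runs repeated `in` checks against it consumes the elements as it goes and silently misses later matches. The function below is a hypothetical simplification for illustration, not the actual Horizon code, and it doesn't claim to reproduce the exact symptoms seen on deployment-maps01.

```python
from types import SimpleNamespace


def wanted_group_names(all_groups, new_security_group_ids):
    # Keep the name of every group whose ID appears in new_security_group_ids.
    # Each `in` test iterates the container; on a plain list that is harmless,
    # but on a one-shot iterator (such as Python 3's map object) every test
    # consumes elements, so later groups can no longer be found.
    return {sg.name for sg in all_groups if sg.id in new_security_group_ids}


all_groups = [
    SimpleNamespace(id=1, name="default"),
    SimpleNamespace(id=2, name="web"),
    SimpleNamespace(id=3, name="maps"),
]
selected = ["1", "3"]  # e.g. IDs submitted as strings from a form

print(sorted(wanted_group_names(all_groups, [int(i) for i in selected])))  # ['default', 'maps']
print(sorted(wanted_group_names(all_groups, map(int, selected))))          # ['default']
```

Materializing the IDs with a list comprehension (or `list(map(...))`) before passing them on avoids the problem.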

Change 421983 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[openstack/horizon/deploy@ocata] Update horizon submodule

Change 421983 merged by Andrew Bogott:
[openstack/horizon/deploy@ocata] Update horizon submodule

I believe this is fixed, but I'd appreciate a second opinion from @Catrope.