
contintcloud project thinks it is using 206 fixed-ip quota errantly
Closed, ResolvedPublic

Description

There was an issue with labnet1001 logs filling up, resulting in API actions returning:

| fault                                | {"message": "[Errno 28] No space left on device", "code": 500, "details": "  File \"/usr/lib/python2.7/dist-packages/nova/compute/manager.py\", line 366, in decorated_function |
|                                      |     return function(self, context, *args, **kwargs)

This is what showed up in 'nova show <instance>' under fault.

I cleaned up /var/log/ on labnet1001 and things were then fine everywhere /except/ within the contintcloud project, where a new error surfaced:

| fault                                | {"message": "Build of instance a2455b6b-284a-4e2c-8959-22ba13cb23a5 aborted: Failed to allocate the network(s) with error Maximum number of fixed ips exceeded, not rescheduling.", "code": 500, "details": "  File \"/usr/lib/python2.7/dist-packages/nova/compute/manager.py\", line 1905, i

At the time:

root@labcontrol1001:~# openstack quota show contintcloud
+----------------------+--------------+
| Field                | Value        |
+----------------------+--------------+
| cores                | 44           |
| fixed-ips            | 200          |
| floating_ips         | 0            |
| injected-file-size   | 10240        |
| injected-files       | 5            |
| injected-path-size   | 255          |
| instances            | 23           |
| key-pairs            | 100          |
| project              | contintcloud |
| properties           | 128          |
| ram                  | 102400       |
| secgroup-rules       | 20           |
| secgroups            | 10           |
| server_group_members | 10           |
| server_groups        | 10           |
+----------------------+--------------+

Believing OpenStack was mistaken about contintcloud's quota usage and was therefore denying actions inappropriately, I bumped the fixed-ip limit up to diagnose:

root@labcontrol1001:~# openstack quota set contintcloud --fixed-ips 250

Then:

root@labcontrol1001:~# openstack quota show contintcloud
+----------------------+--------------+
| Field                | Value        |
+----------------------+--------------+
| cores                | 44           |
| fixed-ips            | 250          |
| floating_ips         | 0            |
| injected-file-size   | 10240        |
| injected-files       | 5            |
| injected-path-size   | 255          |
| instances            | 23           |
| key-pairs            | 100          |
| project              | contintcloud |
| properties           | 128          |
| ram                  | 102400       |
| secgroup-rules       | 20           |
| secgroups            | 10           |
| server_group_members | 10           |
| server_groups        | 10           |
+----------------------+--------------+

That got things going again for nodepool, seemingly as normal. After 20 minutes or so I wanted to reset the old limit. Horizon was correctly reporting 12 instances in use. Based on the nova docs and code, and this blog entry:

"If the quota is already (incorrectly) too high and exceeds the quota limit, the reservation that triggers the refresh will still fail. I.e. the reservation is attempted based on the quota usage value before the refresh."

root@labcontrol1001:~# openstack quota set contintcloud --fixed-ips 200

Quota limit 200 for fixed_ips must be greater than or equal to already used and reserved 206. (HTTP 400) (Request-ID: req-60e06084-eeb3-4239-9da1-8ecec2e3bbe3)

For whatever reason OpenStack still believes the contintcloud project is above the 200 fixed-ip threshold.
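(For reference, the counters nova consults here are cached in the quota_usages table of the nova database; assuming direct MariaDB access, a quick sketch to see the stale value:)

MariaDB [nova]> select resource, in_use, reserved from quota_usages where project_id='contintcloud' and resource='fixed_ips';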

Details

Related Gerrit Patches:
operations/puppet (production): Nova: Turn off Verbose logging

Event Timeline

chasemp created this task. Feb 16 2017, 8:04 PM
Restricted Application added a subscriber: Aklapper. Feb 16 2017, 8:04 PM
chasemp assigned this task to Andrew. Feb 16 2017, 8:04 PM
chasemp triaged this task as High priority.
chasemp added a subscriber: Andrew.

Currently nodepool is going along fine, except the quota is clearly wrong. I don't yet understand why the current max_age setting of 30 isn't correcting this.
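(One way to check whether the max_age resync is firing at all is to watch the bookkeeping columns on the cached rows; a sketch, assuming direct access to the nova database:)

MariaDB [nova]> select resource, in_use, reserved, until_refresh, updated_at from quota_usages where project_id='contintcloud';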

Handing off to @Andrew so we can talk when he's around.

Thanks for troubleshooting -- I'll dig into the source and try to see how it's computing that quota count.

Usually you can inspect the cached quota usage with

MariaDB [nova]> select * from quota_usages where project_id='contintcloud';

In this case, though, even after the recalculation the value is still wrong. So I need to figure out how the calculation happens...
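What the resync should land on is roughly a count of fixed IPs still attached to the project's instances. A sketch of that logic (not nova's literal query; assumes the stock nova-network schema):

select count(*) from fixed_ips f join instances i on f.instance_uuid = i.uuid
    where i.project_id = 'contintcloud' and f.deleted = 0;
-- if this count still includes rows whose instance is already deleted
-- (i.deleted != 0), those are leaked allocations inflating the usage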

I restarted nova-network and it looks like nova is cleaning up those leaks now. I'll keep an eye out, but I've reduced the quota to 200 and there's some slack now.

Change 338397 had a related patch set uploaded (by Andrew Bogott):
Nova: Turn off Verbose logging

https://gerrit.wikimedia.org/r/338397

Change 338397 merged by Andrew Bogott:
Nova: Turn off Verbose logging

https://gerrit.wikimedia.org/r/338397

Andrew closed this task as Resolved.Feb 17 2017, 7:05 PM

I cleaned up about 100 leaks, like this:

update fixed_ips a, instances b set a.instance_uuid = NULL where a.instance_uuid = b.uuid and b.project_id = 'contintcloud' and b.deleted != 0;

After that, unowned IPs in contintcloud are staying in the single digits and seem to be getting cleaned up regularly.
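A quick way to keep watching for new leaks (same schema assumptions as the update above):

select count(*) from fixed_ips a join instances b on a.instance_uuid = b.uuid
    where b.project_id = 'contintcloud' and b.deleted != 0;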