
Quota usage not being counted properly in new region
Closed, Resolved, Public

Description

Horizon and openstack-browser are showing a usage of 9 instances for deployment-prep in eqiad1-r. The real figure should be closer to 68. I imagine the VCPU/RAM calculations are off too. I believe this data comes from the Nova service itself.

Event Timeline

I believe floating IPs and security groups are a separate problem: in eqiad1-r those counts should come from the Neutron service instead. I will open a separate task about that.

Found upstream at https://bugs.launchpad.net/nova/+bug/1742826/comments/4:

There is a well-known issue with quotas "going out of sync" in Nova versions Ocata and earlier and is why the 'nova-manage project quota_usage_refresh' command existed. Quotas out-of-sync means that the quota_usages do not match the actual resources being consumed. This can occur due to races while restarting nova-compute, etc.
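To make the failure mode concrete, here is a minimal Python sketch of what a usage refresh conceptually does. It is not the nova-quota-sync script and not Nova's own code: it recounts instances, cores and RAM from the live servers, compares them with the tracked quota_usages counters, and overwrites any that have drifted. The `Instance` class, the function names and the flavor sizes are hypothetical; only the 9 vs. 68 instance counts mirror the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Instance:
    # Hypothetical minimal view of a server row: only the fields that
    # count against the instances/cores/ram quotas.
    project_id: str
    vcpus: int
    memory_mb: int
    deleted: bool = False


def actual_usage(instances: List[Instance], project_id: str) -> Dict[str, int]:
    """Recount usage for one project from its live (non-deleted) instances."""
    live = [i for i in instances if i.project_id == project_id and not i.deleted]
    return {
        "instances": len(live),
        "cores": sum(i.vcpus for i in live),
        "ram": sum(i.memory_mb for i in live),
    }


def find_drift(recorded: Dict[str, int], actual: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {resource: (recorded, actual)} for every counter that disagrees."""
    return {
        res: (recorded.get(res, 0), count)
        for res, count in actual.items()
        if recorded.get(res, 0) != count
    }


def refresh(recorded: Dict[str, int], actual: Dict[str, int]) -> Dict[str, int]:
    """Overwrite the tracked counters with the recounted values."""
    fixed = dict(recorded)
    fixed.update(actual)
    return fixed


if __name__ == "__main__":
    # Made-up flavor (2 VCPUs, 4096 MB); the tracked counter says 9 instances
    # while 68 actually exist, as in this task.
    servers = [Instance("example-project", vcpus=2, memory_mb=4096) for _ in range(68)]
    recorded = {"instances": 9, "cores": 18, "ram": 36864}
    actual = actual_usage(servers, "example-project")
    print(find_drift(recorded, actual))
    print(refresh(recorded, actual))
```

The design point is that the recount is the source of truth: once the tracked counters have diverged, nothing short of recomputing them from the actual resources brings them back in line.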

I also found https://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/refresh-quotas-usage.html and https://blueprints.launchpad.net/nova/+spec/refresh-quotas-usage. I think this "fix" missed Mitaka, landed in Newton, and was made obsolete in Pike, but I'm not 100% certain. I definitely can't find a working nova-manage project quota_usage_refresh or anything similar on cloudcontrol1003.

It might be worth looking at https://github.com/cernops/nova-quota-sync/blob/master/nova-quota-sync, but really I'm holding out hope that this will make more sense in Newton (although the Pike fix is the only smart one!)
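The Pike approach sidesteps drift by not tracking usage at all: usage is counted from the existing resources each time a request is checked, so there is no stored counter to fall out of sync. Below is a minimal sketch of that idea, not Nova's implementation. It assumes each live server is a dict with hypothetical `vcpus` and `memory_mb` keys and treats a limit of -1 as unlimited, as Nova quota limits do; the function names are illustrative.

```python
from typing import Dict, List


def count_usage(live_servers: List[Dict[str, int]]) -> Dict[str, int]:
    """Count current usage directly from the live servers (nothing is stored)."""
    return {
        "instances": len(live_servers),
        "cores": sum(s["vcpus"] for s in live_servers),
        "ram": sum(s["memory_mb"] for s in live_servers),
    }


def over_quota(
    live_servers: List[Dict[str, int]],
    limits: Dict[str, int],
    requested: Dict[str, int],
) -> List[str]:
    """Return the resources the request would push over their limit; empty means OK."""
    usage = count_usage(live_servers)
    return sorted(
        res
        for res, want in requested.items()
        # -1 means unlimited; a missing limit is also treated as unlimited here.
        if limits.get(res, -1) != -1 and usage.get(res, 0) + want > limits[res]
    )


if __name__ == "__main__":
    servers = [{"vcpus": 2, "memory_mb": 4096}] * 3
    limits = {"instances": 4, "cores": 8, "ram": -1}
    print(over_quota(servers, limits, {"instances": 1, "cores": 2, "ram": 4096}))
    print(over_quota(servers, limits, {"instances": 2, "cores": 4, "ram": 8192}))
```

The trade-off is an extra count on every check in exchange for never needing a refresh tool like the ones discussed in this task.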

Change 480656 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] nova: added Cern's nova-quota-sync

https://gerrit.wikimedia.org/r/480656

Change 480656 merged by Andrew Bogott:
[operations/puppet@production] nova: add Cern's nova-quota-sync

https://gerrit.wikimedia.org/r/480656


This script seems to set things right. I've installed it as a standard utility on nova control nodes -- time will tell if this is a one-off fix or if things drift out of sync again immediately.

openstack-browser is currently showing 70 instances for deployment-prep. I can't verify Horizon but it seems the issue is solved?

https://tools.wmflabs.org/openstack-browser/project/deployment-prep

Andrew claimed this task.