
CloudVPS: wrong nova quota usage for a project detected
Closed, Resolved, Public

Description

The user @Epantaleo detected an incorrect quota in the etytree CloudVPS project.

I could confirm the wrong quota myself:

root@cloudcontrol1004:~# openstack server list --project etytree
+--------------------------------------+-----------+--------+---------------------------------------+-------------------------------------------+
| ID                                   | Name      | Status | Networks                              | Image Name                                |
+--------------------------------------+-----------+--------+---------------------------------------+-------------------------------------------+
| abb0b991-4b48-43df-b568-be1dd3072c8f | etytree-b | ACTIVE | lan-flat-cloudinstances2b=172.16.3.12 | debian-8.6-jessie (deprecated 2017-02-24) |
+--------------------------------------+-----------+--------+---------------------------------------+-------------------------------------------+
root@cloudcontrol1004:~# nova-manage project quota --project etytree
Quota                                Limit      In Use     Reserved  
metadata_items                       128        0          0         
injected_file_content_bytes          10240      0          0         
server_group_members                 10         0          0         
server_groups                        10         0          0         
ram                                  80000      110592     0         
floating_ips                         0          0          0         
security_group_rules                 20         0          0         
instances                            8          3          0         
key_pairs                            100        0          0         
injected_files                       5          0          0         
cores                                18         24         0         
fixed_ips                            unlimited  0          0         
injected_file_path_bytes             255        0          0         
security_groups                      10         1          0     

Investigating, I found this comment by @Andrew: T171158#3456299. He suggests a DB write may be required.
I searched other docs, and it seems we have the following command available. I tried several combinations:

root@cloudcontrol1004:~# nova-manage project quota_usage_refresh --project etytree
Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
Option "notification_topics" from group "DEFAULT" is deprecated. Use option "topics" from group "oslo_messaging_notifications".
root@cloudcontrol1004:~# nova-manage project quota_usage_refresh --project etytree --user epantaleo
Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
Option "notification_topics" from group "DEFAULT" is deprecated. Use option "topics" from group "oslo_messaging_notifications".
root@cloudcontrol1004:~# nova-manage project quota_usage_refresh --project etytree --user Epantaleo
Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
Option "notification_topics" from group "DEFAULT" is deprecated. Use option "topics" from group "oslo_messaging_notifications".

Quota usage is unchanged, so apparently the quota_usage_refresh command is not doing anything useful. It may even be confusing the database further:

MariaDB [nova_eqiad1]> select * from quota_usages where project_id = 'etytree';
+---------------------+---------------------+------------+------+------------+-----------------+--------+----------+---------------+---------+-----------+
| created_at          | updated_at          | deleted_at | id   | project_id | resource        | in_use | reserved | until_refresh | deleted | user_id   |
+---------------------+---------------------+------------+------+------------+-----------------+--------+----------+---------------+---------+-----------+
| 2018-12-20 15:19:36 | 2019-01-03 19:00:42 | NULL       | 1207 | etytree    | instances       |      1 |        0 |          NULL |       0 | novaadmin |
| 2018-12-20 15:19:36 | 2019-01-03 19:00:42 | NULL       | 1208 | etytree    | ram             |  36864 |        0 |          NULL |       0 | novaadmin |
| 2018-12-20 15:19:36 | 2019-01-03 19:00:42 | NULL       | 1209 | etytree    | cores           |      8 |        0 |          NULL |       0 | novaadmin |
| 2018-12-20 15:19:36 | 2018-12-20 15:19:36 | NULL       | 1210 | etytree    | security_groups |      1 |        0 |             0 |       0 | novaadmin |
| 2020-01-08 10:53:10 | 2020-01-08 12:15:27 | NULL       | 1603 | etytree    | instances       |      1 |        0 |          NULL |       0 | epantaleo |
| 2020-01-08 10:53:10 | 2020-01-08 12:15:27 | NULL       | 1604 | etytree    | ram             |  36864 |        0 |          NULL |       0 | epantaleo |
| 2020-01-08 10:53:10 | 2020-01-08 12:15:27 | NULL       | 1605 | etytree    | cores           |      8 |        0 |          NULL |       0 | epantaleo |
| 2020-01-09 10:31:30 | 2020-01-09 10:31:30 | NULL       | 1609 | etytree    | fixed_ips       |      0 |        0 |          NULL |       0 | NULL      |
| 2020-01-09 10:31:30 | 2020-01-09 10:31:30 | NULL       | 1610 | etytree    | floating_ips    |      0 |        0 |          NULL |       0 | NULL      |
| 2020-01-09 10:37:24 | 2020-01-09 10:37:24 | NULL       | 1611 | etytree    | cores           |      8 |        0 |          NULL |       0 | Epantaleo |
| 2020-01-09 10:37:24 | 2020-01-09 10:37:24 | NULL       | 1612 | etytree    | instances       |      1 |        0 |          NULL |       0 | Epantaleo |
| 2020-01-09 10:37:24 | 2020-01-09 10:37:24 | NULL       | 1613 | etytree    | ram             |  36864 |        0 |          NULL |       0 | Epantaleo |
| 2020-01-09 10:37:24 | 2020-01-09 10:37:24 | NULL       | 1614 | etytree    | server_groups   |      0 |        0 |          NULL |       0 | Epantaleo |
| 2020-01-09 10:37:24 | 2020-01-09 10:37:24 | NULL       | 1615 | etytree    | security_groups |      0 |        0 |          NULL |       0 | Epantaleo |
| 2020-01-09 10:39:07 | 2020-01-09 10:39:07 | NULL       | 1616 | etytree    | server_groups   |      0 |        0 |          NULL |       0 | epantaleo |
| 2020-01-09 10:39:07 | 2020-01-09 10:39:07 | NULL       | 1617 | etytree    | security_groups |      0 |        0 |          NULL |       0 | epantaleo |
+---------------------+---------------------+------------+------+------------+-----------------+--------+----------+---------------+---------+-----------+
16 rows in set (0.00 sec)
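
The per-user rows above can be checked against the project totals that nova-manage reports. The following is a minimal sketch, assuming (this is not confirmed against nova's code) that the project-level "In Use" value is simply the sum of in_use over every user_id row for a resource. The ROWS list is copied from the table; the function name project_totals is made up for illustration.

```python
from collections import defaultdict

# (user_id, resource, in_use) copied from the quota_usages rows above.
ROWS = [
    ("novaadmin", "instances", 1),
    ("novaadmin", "ram", 36864),
    ("novaadmin", "cores", 8),
    ("novaadmin", "security_groups", 1),
    ("epantaleo", "instances", 1),
    ("epantaleo", "ram", 36864),
    ("epantaleo", "cores", 8),
    (None, "fixed_ips", 0),
    (None, "floating_ips", 0),
    ("Epantaleo", "cores", 8),
    ("Epantaleo", "instances", 1),
    ("Epantaleo", "ram", 36864),
    ("Epantaleo", "server_groups", 0),
    ("Epantaleo", "security_groups", 0),
    ("epantaleo", "server_groups", 0),
    ("epantaleo", "security_groups", 0),
]


def project_totals(rows):
    """Sum in_use per resource across all user_id rows (assumed aggregation)."""
    totals = defaultdict(int)
    for _user_id, resource, in_use in rows:
        totals[resource] += in_use
    return dict(totals)


totals = project_totals(ROWS)
print(totals["instances"], totals["cores"], totals["ram"])
```

Under that assumption the sums come out as 3 instances, 24 cores and 110592 MB of RAM, which are the "In Use" values in the first nova-manage output, even though only one server exists in the project.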

(note the duplicated entries for user IDs that differ only in letter case)
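
Duplicates of this kind can be found with a grouping query that folds user_id to lower case. The sketch below runs against an in-memory SQLite copy of the rows, not against the real nova_eqiad1 database; the reduced table schema and the DUPLICATES_SQL name are assumptions for illustration, and MariaDB collation settings may make the production database behave differently.

```python
import sqlite3

# (id, resource, in_use, user_id) copied from the quota_usages output above.
ROWS = [
    (1207, "instances", 1, "novaadmin"),
    (1208, "ram", 36864, "novaadmin"),
    (1209, "cores", 8, "novaadmin"),
    (1210, "security_groups", 1, "novaadmin"),
    (1603, "instances", 1, "epantaleo"),
    (1604, "ram", 36864, "epantaleo"),
    (1605, "cores", 8, "epantaleo"),
    (1609, "fixed_ips", 0, None),
    (1610, "floating_ips", 0, None),
    (1611, "cores", 8, "Epantaleo"),
    (1612, "instances", 1, "Epantaleo"),
    (1613, "ram", 36864, "Epantaleo"),
    (1614, "server_groups", 0, "Epantaleo"),
    (1615, "security_groups", 0, "Epantaleo"),
    (1616, "server_groups", 0, "epantaleo"),
    (1617, "security_groups", 0, "epantaleo"),
]

# Reduced, hypothetical schema: only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quota_usages ("
    "id INTEGER PRIMARY KEY, project_id TEXT, resource TEXT, "
    "in_use INTEGER, user_id TEXT)"
)
conn.executemany(
    "INSERT INTO quota_usages (id, project_id, resource, in_use, user_id) "
    "VALUES (?, 'etytree', ?, ?, ?)",
    ROWS,
)

# More than one row per (resource, case-folded user) means a duplicate.
DUPLICATES_SQL = """
SELECT resource, LOWER(user_id) AS folded_user, COUNT(*) AS n
FROM quota_usages
WHERE project_id = ? AND user_id IS NOT NULL
GROUP BY resource, LOWER(user_id)
HAVING COUNT(*) > 1
ORDER BY resource
"""

duplicates = conn.execute(DUPLICATES_SQL, ("etytree",)).fetchall()
for resource, folded_user, n in duplicates:
    print(f"{resource}: {n} rows for user '{folded_user}'")
```

The query is read-only, so it only reports which resources have doubled rows and leaves the decision about cleaning them up to whoever runs it.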

At this point I stopped making further modifications, to seek consensus with the WMCS team on next steps.

Bonus point: not sure if related but just for the record we also have this issue: T242078: CloudVPS: bogus hypervisor stats value reported by nova

Event Timeline

aborrero updated the task description.
aborrero moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.

Mentioned in SAL (#wikimedia-cloud) [2020-01-09T11:11:20Z] <arturo> running MariaDB [nova_eqiad1]> select * from quota_usages where project_id = 'etytree'; (T242332)

Mentioned in SAL (#wikimedia-cloud) [2020-01-09T11:12:11Z] <arturo> running MariaDB [nova_eqiad1]> update quota_usages set in_use='0' where project_id='etytree'; (T242332)

Mentioned in SAL (#wikimedia-cloud) [2020-01-09T11:18:27Z] <arturo> created VM 'arturo-test' to check T242332

Mentioned in SAL (#wikimedia-cloud) [2020-01-09T11:19:24Z] <arturo> deleted VM 'arturo-test' (T242332)

After my command to set in_use=0, nova allows creating more instances in the project. But the SQL command didn't have the expected result of quota usage being refreshed:

root@cloudcontrol1004:~# nova-manage project quota --project etytree
Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
Option "notification_topics" from group "DEFAULT" is deprecated. Use option "topics" from group "oslo_messaging_notifications".
Quota                                Limit      In Use     Reserved  
metadata_items                       128        0          0         
injected_file_content_bytes          10240      0          0         
server_group_members                 10         0          0         
server_groups                        10         0          0         
ram                                  80000      2048       0         
floating_ips                         0          0          0         
security_group_rules                 20         0          0         
instances                            8          1          0         
key_pairs                            100        0          0         
injected_files                       5          0          0         
cores                                18         1          0         
fixed_ips                            unlimited  0          0         
injected_file_path_bytes             255        0          0         
security_groups                      10         0          0    

Hm... I'm still making my way through the backscroll. This certainly seems related to the other task, but if quota_usage_refresh didn't help then I'm discouraged.

This looks better to me now:

nova-manage project quota --project etytree
source ~/novaen
Quota                                Limit      In Use     Reserved
metadata_items                       128        0          0
injected_file_content_bytes          10240      0          0
server_group_members                 10         0          0
server_groups                        10         0          0
ram                                  80000      73728      0
floating_ips                         0          0          0
security_group_rules                 20         0          0
instances                            8          2          0
key_pairs                            100        0          0
injected_files                       5          0          0
cores                                18         16         0
fixed_ips                            unlimited  0          0
injected_file_path_bytes             255        0          0
security_groups                      10         1          0

My understanding is that in some (current? future?) version of OpenStack they did away with caching the per-project usage, instead recalculating it for each API request. That would most likely resolve the issue, but I'm not sure when the change was made.
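
The idea of recalculating usage on each request, rather than reading a cached quota_usages table, can be sketched roughly as follows. This is an illustration of the concept only, not nova's implementation: the Instance class and count_usage function are hypothetical names, and the size of etytree-b (8 cores, 36864 MB RAM) is an assumption inferred from the per-user quota_usages rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical minimal view of a server record."""
    name: str
    project_id: str
    vcpus: int
    ram_mb: int


def count_usage(instances, project_id):
    """Derive usage directly from existing instances, with no cached counters."""
    mine = [i for i in instances if i.project_id == project_id]
    return {
        "instances": len(mine),
        "cores": sum(i.vcpus for i in mine),
        "ram": sum(i.ram_mb for i in mine),
    }


# Assumed flavor size for the single server listed in the etytree project.
INSTANCES = [Instance("etytree-b", "etytree", vcpus=8, ram_mb=36864)]

usage = count_usage(INSTANCES, "etytree")
print(usage)
```

Because the numbers are derived from the instance records themselves, stale or duplicated counter rows cannot inflate them, which is why such a change would most likely make this class of problem disappear.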

"This command group will be removed in 17.0.0 (Queens). The quota_usage_refresh subcommand has been deprecated and is now a no-op since quota usage is counted from resources instead of being tracked separately."

Seems promising -- ignoring this for now unless it recurs in Queens.