openstack: consider removing references to old hardware from the database
Open, Low, Public

Description

We still have references to old (already decommissioned) hardware somewhere in the database: the prometheus-openstack-exporter still reports data for it.

Example:

aborrero@cloudcontrol1007:~$ curl localhost:12345/metrics -o metrics.prom
aborrero@cloudcontrol1007:~$ grep cloudvirt1001 metrics.prom
openstack_placement_resource_allocation_ratio{hostname="cloudvirt1001.eqiad.wmnet",resourcetype="DISK_GB"} 1.5
openstack_placement_resource_allocation_ratio{hostname="cloudvirt1001.eqiad.wmnet",resourcetype="MEMORY_MB"} 1
openstack_placement_resource_allocation_ratio{hostname="cloudvirt1001.eqiad.wmnet",resourcetype="VCPU"} 4
openstack_placement_resource_reserved{hostname="cloudvirt1001.eqiad.wmnet",resourcetype="DISK_GB"} 0
openstack_placement_resource_reserved{hostname="cloudvirt1001.eqiad.wmnet",resourcetype="MEMORY_MB"} 512
openstack_placement_resource_reserved{hostname="cloudvirt1001.eqiad.wmnet",resourcetype="VCPU"} 0
openstack_placement_resource_total{hostname="cloudvirt1001.eqiad.wmnet",resourcetype="DISK_GB"} 2015
openstack_placement_resource_total{hostname="cloudvirt1001.eqiad.wmnet",resourcetype="MEMORY_MB"} 386952
openstack_placement_resource_total{hostname="cloudvirt1001.eqiad.wmnet",resourcetype="VCPU"} 48
openstack_placement_resource_usage{hostname="cloudvirt1001.eqiad.wmnet",resourcetype="DISK_GB"} 52
openstack_placement_resource_usage{hostname="cloudvirt1001.eqiad.wmnet",resourcetype="MEMORY_MB"} 2048
openstack_placement_resource_usage{hostname="cloudvirt1001.eqiad.wmnet",resourcetype="VCPU"} 4
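To see every host the exporter still reports, not just cloudvirt1001, the saved metrics file can be scanned for distinct `hostname` labels. This is a minimal sketch that assumes every placement series follows the `openstack_placement_*{hostname="..."}` shape shown above; the function name `placement_hostnames` is made up for illustration.

```python
import re

# Matches lines shaped like the sample above, e.g.
# openstack_placement_resource_total{hostname="cloudvirt1001.eqiad.wmnet",resourcetype="VCPU"} 48
_HOSTNAME_RE = re.compile(r'^openstack_placement_\w+\{[^}]*hostname="([^"]+)"')


def placement_hostnames(metrics_text):
    """Return the sorted, de-duplicated hostnames found in placement metrics."""
    hosts = set()
    for line in metrics_text.splitlines():
        match = _HOSTNAME_RE.match(line)
        if match:
            hosts.add(match.group(1))
    return sorted(hosts)
```

Usage would be something like `placement_hostnames(open("metrics.prom").read())`, run against the file saved by the `curl` command above.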

This is likely somewhere in the placement database, but I couldn't find where:

aborrero@cloudcontrol1007:~$ sudo mysql -u root
MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| cinder             |
| designate          |
| eqiad1_ceph_backy  |
| eqiad1_heat        |
| eqiad1_magnum      |
| glance             |
| information_schema |
| keystone           |
| mysql              |
| neutron            |
| nova_api_eqiad1    |
| nova_cell0_eqiad1  |
| nova_eqiad1        |
| performance_schema |
| placement          |
| trove_eqiad1       |
+--------------------+
16 rows in set (0.001 sec)

MariaDB [(none)]> use placement;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [placement]> show tables;
+------------------------------+
| Tables_in_placement          |
+------------------------------+
| alembic_version              |
| allocations                  |
| consumer_types               |
| consumers                    |
| inventories                  |
| placement_aggregates         |
| projects                     |
| resource_classes             |
| resource_provider_aggregates |
| resource_provider_traits     |
| resource_providers           |
| traits                       |
| users                        |
+------------------------------+
13 rows in set (0.000 sec)
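One place to look is the `resource_providers` table joined against `inventories`, since the exporter's total, reserved and allocation-ratio values look like inventory fields. The sketch below runs that join against an in-memory SQLite stand-in so it can be tried without touching production. The column names are assumptions based on the upstream placement schema and should be checked with `DESCRIBE` on the real database; the UUID and resource class ids are made up, and the helper names are hypothetical.

```python
import sqlite3

# Assumed subset of the placement schema; confirm with DESCRIBE on the real DB.
SCHEMA = """
CREATE TABLE resource_providers (id INTEGER PRIMARY KEY, uuid TEXT, name TEXT);
CREATE TABLE inventories (
    id INTEGER PRIMARY KEY,
    resource_provider_id INTEGER,
    resource_class_id INTEGER,
    total INTEGER,
    reserved INTEGER,
    allocation_ratio REAL
);
"""

# The join that would be adapted for MariaDB (with a literal instead of "?").
QUERY = """
SELECT rp.name, rp.uuid, inv.resource_class_id,
       inv.total, inv.reserved, inv.allocation_ratio
FROM resource_providers AS rp
LEFT JOIN inventories AS inv ON inv.resource_provider_id = rp.id
WHERE rp.name LIKE ?
ORDER BY inv.resource_class_id
"""


def find_provider_rows(conn, host_prefix):
    """Return provider and inventory rows whose provider name starts with host_prefix."""
    return conn.execute(QUERY, (host_prefix + "%",)).fetchall()


def demo_connection():
    """Build an in-memory stand-in populated with made-up rows for the example host."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO resource_providers VALUES "
        "(1, 'hypothetical-uuid', 'cloudvirt1001.eqiad.wmnet')"
    )
    conn.executemany(
        "INSERT INTO inventories VALUES (?, 1, ?, ?, ?, ?)",
        [(1, 0, 48, 0, 4.0), (2, 1, 386952, 512, 1.0), (3, 2, 2015, 0, 1.5)],
    )
    return conn
```

If the real tables match these assumptions, a row in `resource_providers` named after a decommissioned host would explain why the exporter keeps reporting it.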

However, at least some parts of OpenStack know that these hosts don't exist:

aborrero@cloudcontrol1007:~$ sudo wmcs-openstack hypervisor list | grep cloudvirt1001
[.. nothing ..]
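The mismatch can be checked for all hosts at once by comparing the hostnames the exporter reports with the hostnames in the hypervisor list. This is a rough sketch that takes two plain lists of hostnames and compares them on the short name, because the exact output format of `wmcs-openstack hypervisor list` is not shown here. The function names are hypothetical.

```python
def short_name(host):
    """Strip the domain so 'cloudvirt1001.eqiad.wmnet' compares equal to 'cloudvirt1001'."""
    return host.split(".", 1)[0]


def stale_hosts(metric_hosts, hypervisor_hosts):
    """Return hosts present in the metrics but missing from the hypervisor list."""
    known = {short_name(h) for h in hypervisor_hosts}
    return sorted(h for h in set(metric_hosts) if short_name(h) not in known)
```

Any hostname returned is a candidate leftover from decommissioned hardware.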

The impact is just cosmetic. We get some panels with empty data in Grafana, which is a bit annoying but harmless.

Event Timeline

aborrero created this task.

Following up from T340611, my next best guess is that the OpenStack exporter performs some caching? That seems likely if the OpenStack API returns correct data (i.e. no old hosts).

I couldn't find any such cache. I suspect the DB because I'm not aware of any procedure we follow to clean it up when we decommission hardware.

Yeah, that must be it then. I'm definitely out of my depth here, but happy to help with the Prometheus side of things if needed.