Subject | Repo | Branch | Lines +/-
---|---|---|---
novastats: Add 'diskspace.py' script | operations/puppet | production | +164 -0 |
Details
Event Timeline
In our config, we have disk_allocation_ratio=1.5
I'm reading the 'disk filter' section of https://docs.openstack.org/liberty/config-reference/content/section_compute-scheduler.html, and here is how I think the story goes:
Instances are allocated with a potential size (e.g. 40GB for a medium instance) but with an initial 20GB partitioned. A smaller fraction of that 20GB is actually consumed -- less than 1GB for a brand new instance. With normal use, disk consumption grows over time, potentially hitting that 20GB limit. Instances that partition the rest of their space may eventually grow to consume the entire potential size (40GB in our example).
When nova looks for a host to schedule a new instance on, it adds up the potential sizes of all instances on a given node, plus the potential size of the new candidate instance. If that sum is < 1.5x the total size of the /var/lib/nova/instances partition on the labvirt, the labvirt is considered a candidate for scheduling.
As far as I know, nova doesn't pay attention to free space on the host at all, only the total drive size.
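If that reading is right, the filter's arithmetic reduces to a few lines. A simplified model (not nova's actual code; the function name is made up):

```python
def disk_filter_passes(host_total_gb, instance_flavor_sizes_gb, new_instance_gb,
                       disk_allocation_ratio=1.5):
    """Simplified model of the DiskFilter story above: compare the sum of
    *potential* (flavor-allocated) sizes, including the new instance,
    against disk_allocation_ratio * total disk.  Actual free space on the
    host is never consulted."""
    committed = sum(instance_flavor_sizes_gb) + new_instance_gb
    return committed <= disk_allocation_ratio * host_total_gb

# A 2233 GB labvirt with 3000 GB already allocated can still take a 40 GB medium:
print(disk_filter_passes(2233, [3000], 40))   # 3040 <= 3349.5 -> True
```

Note that by this model a host can pass the filter while its partition is nearly full, as long as total flavor allocation stays under the 1.5x cap.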
So, things to audit here are:
- am I right about how nova schedules?
- what % of instances have potential size > 20GB, and what % of /those/ instances never partition their additional space? That gives us a number for space that nova sees when scheduling but will never be consumed.
- what is the average actual disk space consumed by instances? Ideally this would be broken down by flavor, and also by whether or not they partitioned their additional space
- what is the potential vs. actual disk usage on each of our labvirts?
That should get us started.
For a quick start, I'm attaching a spreadsheet:
this shows instance allocated usage (from the instance flavor) next to actual instance disk space usage (via 'du' on the labvirts). It shows us actually consuming about 50% of allocated disk space across all instances. For instances of flavor 'small' (that is, instances where all of the allocated disk space is partitioned from the get-go) our usage ratio is much higher: almost 77%. About 40% of all instances are flavor 'small'.
Here's the current state of instance storage on each labvirt. Note that labvirt1010-14 have much larger drives than the other servers, but don't otherwise have substantially more RAM or cpus, so we'd expect usage % to be lower on those systems.
labvirt1001.eqiad.wmnet:
/dev/sdb1 2.2T 381G 1.9T 18% /var/lib/nova/instances
labvirt1002.eqiad.wmnet:
/dev/sdb1 2.2T 1.2T 1.1T 53% /var/lib/nova/instances
labvirt1003.eqiad.wmnet:
/dev/sdb1 2.2T 1.5T 748G 67% /var/lib/nova/instances
labvirt1004.eqiad.wmnet:
/dev/sdb1 2.2T 1.4T 810G 64% /var/lib/nova/instances
labvirt1005.eqiad.wmnet:
/dev/sdb1 2.2T 1.5T 770G 66% /var/lib/nova/instances
labvirt1006.eqiad.wmnet:
/dev/sdb1 2.2T 1.5T 764G 66% /var/lib/nova/instances
labvirt1007.eqiad.wmnet:
/dev/sdb1 2.2T 1.5T 745G 67% /var/lib/nova/instances
labvirt1008.eqiad.wmnet:
/dev/sdb1 2.2T 1.8T 485G 79% /var/lib/nova/instances
labvirt1009.eqiad.wmnet:
/dev/sdb1 2.2T 1.6T 627G 72% /var/lib/nova/instances
labvirt1010.eqiad.wmnet:
/dev/mapper/tank-data 4.1T 1.4T 2.7T 34% /var/lib/nova/instances
labvirt1011.eqiad.wmnet:
/dev/mapper/tank-data 4.1T 1.7T 2.5T 41% /var/lib/nova/instances
labvirt1012.eqiad.wmnet:
/dev/mapper/tank-data 4.1T 2.1T 2.0T 52% /var/lib/nova/instances
labvirt1013.eqiad.wmnet:
/dev/mapper/tank-data 4.1T 1.9T 2.2T 47% /var/lib/nova/instances
labvirt1014.eqiad.wmnet:
/dev/mapper/tank-data 4.1T 90G 4.0T 3% /var/lib/nova/instances
Worst case scenario: What if every one of those instances suddenly partitioned and filled every bit of allocated space? On average, consumed space would double.
Labvirt1001, 1010, 1011, 1013 would be fine; the others would probably choke at some point.
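That "choke" list can be derived mechanically from the df figures above. GB conversions here are rough (1T taken as 1024 GB, transcribed from the coarse df readings), so borderline hosts are approximate:

```python
# (host, total_gb, used_gb) transcribed roughly from the df output above.
labvirts = [
    ("labvirt1001", 2252, 381),  ("labvirt1002", 2252, 1229),
    ("labvirt1003", 2252, 1536), ("labvirt1004", 2252, 1434),
    ("labvirt1005", 2252, 1536), ("labvirt1006", 2252, 1536),
    ("labvirt1007", 2252, 1536), ("labvirt1008", 2252, 1843),
    ("labvirt1009", 2252, 1638), ("labvirt1010", 4198, 1434),
    ("labvirt1011", 4198, 1741), ("labvirt1012", 4198, 2150),
    ("labvirt1013", 4198, 1946), ("labvirt1014", 4198, 90),
]

# Worst case: every instance partitions and fills everything, roughly
# doubling consumption (per the ~50% used-of-allocated figure above).
survivors = [host for host, total, used in labvirts if 2 * used < total]
print(survivors)
```

This reproduces the list above (labvirt1014 survives trivially since it is nearly empty, and is excluded from the averages later for skewing them).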
That spreadsheet is also available as https://docs.google.com/spreadsheets/d/1TRimo0kT_YzlXl_RD3Z7zOZHdj5Piev31ALIKku7Y8g
The usage report has a breakdown of uuid x allocated x consumed, this comment has a breakdown of free space by labvirt, and the above indicates that scheduling is based solely on the 1.5x value of each labvirt, not on usage.
Can we break down the usage-by-labvirt values (and really I'm hoping I can help do it so it's not just on your plate)? The columns would be: labvirt, uuid, instance allocated storage, instance used storage, instance used storage %, [y/n] for has provisioned more than initial partition, allocated % of actual disk on labvirt, consumed % of actual disk on labvirt, allocated % of 1.5x disk, consumed % of 1.5x disk.
I don't feel comforted by the 50% used of allocated for a few reasons:
- With COW, will 100% usage on any labvirt render all instances there non-functional?
- It doesn't tie into the actual per-labvirt allocated and consumed percentages. We have 50% overhead within allocated space, but what % of the 1.0x and 1.5x disk space is allocated (and per labvirt)? We can only think in allocation blocks that are labvirt-sized, so we'll lose some % of this already.
- At 50% of allocated (and not consumed) why do we do the 1.5x scheduling?
- On what percentage of instance disks are we using space outside of the initial partition? i.e. we partitioned the initial 20G and some % of instances grow and use beyond that.
- We should not count labvirt1014 in this at all, as it skews the numbers a lot.
From nova:
- labvirt
- uuid
- instance allocated storage
- instance used storage
from puppet config:
- [y/n] for has provisioned more than initial partition
spreadsheet calcs:
- instance used storage %
- allocated % of actual disk on labvirt
- consumed % of actual disk on labvirt
- allocated % of 1.5x disk, consumed % of 1.5 disk?
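The columns above map onto a simple row-builder; a sketch of the shape only (field names invented, percentages derived from the nova/puppet inputs):

```python
def report_row(labvirt, uuid, allocated_gb, used_gb, grows_past_partition,
               labvirt_disk_gb, ratio=1.5):
    """One spreadsheet row in the shape proposed above.  The first five
    fields come from nova / the puppet config; the rest are derived."""
    return {
        "labvirt": labvirt,
        "uuid": uuid,
        "allocated_gb": allocated_gb,
        "used_gb": used_gb,
        "grows_past_partition": grows_past_partition,           # [y/n] from puppet
        "used_pct_of_allocated": 100.0 * used_gb / allocated_gb,
        "allocated_pct_of_disk": 100.0 * allocated_gb / labvirt_disk_gb,
        "used_pct_of_disk": 100.0 * used_gb / labvirt_disk_gb,
        "allocated_pct_of_1_5x_disk": 100.0 * allocated_gb / (ratio * labvirt_disk_gb),
        "used_pct_of_1_5x_disk": 100.0 * used_gb / (ratio * labvirt_disk_gb),
    }

# Toy example, not a real instance:
print(report_row("labvirt1001", "example-uuid", 40, 20, True, 2233))
```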
Thanks @Andrew, I'm trying to make sense of this :)
I think we can reason backwards from the disk_available_least value for some things on allocated vs. available. It seems like nova should also be keeping track in such a way that we don't need to run df -Th, and so that it can make scheduling decisions based on usage (to say nothing of alerting and heuristics). It looks like DiskFilter always uses disk_available_least. In looking through existing stats I'm coming up with a few questions for myself. Most of this seems to be determined by python-nova: /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py
nova hypervisor-stats
+----------------------+---------+
| Property             | Value   |
+----------------------+---------+
| count                | 14      |
| current_workload     | 1       |
| disk_available_least | 2931    |
| free_disk_gb         | 4080    |
| free_ram_mb          | 1604473 |
| local_gb             | 40880   |
| local_gb_used        | 36800   |
+----------------------+---------+
disk_available_least - this is the space that would remain if all instances were at full usage, i.e. if COW had fleshed out every instance on disk. This can be a negative number, indicating that a labvirt disk is already overcommitted, which is possible whenever the ratio is >1.0 as we have it set at 1.5. In theory this should never be more negative than 50% of total actual disk: if a disk is 10G and we oversubscribe at 1.5x, allocation is capped at 15G, so this should never go below -5G.
free_disk_gb - The formula I see is free_disk_gb = resources['local_gb'] - resources['local_gb_used'].
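My reading of the two values, as simple functions (an interpretation of resource_tracker.py and the libvirt driver, not verified line by line):

```python
def free_disk_gb(local_gb, local_gb_used):
    # local_gb_used counts *allocated* (flavor) space, not bytes on disk,
    # so this goes negative once allocation exceeds the physical disk.
    return local_gb - local_gb_used

def disk_available_least(actual_free_gb, allocated_gb, actually_used_gb):
    # Space left if every qcow2 grew to its full virtual size: subtract
    # the not-yet-realized overcommit from today's actual free space.
    return actual_free_gb - (allocated_gb - actually_used_gb)

# labvirt1002 per 'hypervisor show': local_gb=2233, local_gb_used=2600
print(free_disk_gb(2233, 2600))   # -367, matching the reported value
```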
If I look at openstack hypervisor list I can grab some statistics per virt.
root@labcontrol1001:~# openstack hypervisor list
+----+-------------------------+
| ID | Hypervisor Hostname     |
+----+-------------------------+
| 13 | labvirt1001.eqiad.wmnet |
| 14 | labvirt1004.eqiad.wmnet |
| 15 | labvirt1003.eqiad.wmnet |
| 16 | labvirt1002.eqiad.wmnet |
| 17 | labvirt1005.eqiad.wmnet |
| 18 | labvirt1006.eqiad.wmnet |
| 19 | labvirt1007.eqiad.wmnet |
| 20 | labvirt1008.eqiad.wmnet |
| 21 | labvirt1009.eqiad.wmnet |
| 22 | labvirt1010.eqiad.wmnet |
| 23 | labvirt1011.eqiad.wmnet |
| 24 | labvirt1012.eqiad.wmnet |
| 25 | labvirt1013.eqiad.wmnet |
| 26 | labvirt1014.eqiad.wmnet |
+----+-------------------------+
Labvirt1014
/dev/mapper/tank-data xfs 4.1T 92G 4.0T 3% /var/lib/nova/instances
root@labcontrol1001:~# openstack hypervisor show 26 | grep disk
disk_available_least | 4012 |
free_disk_gb | 4017 |
local_gb | 4157 |
local_gb_used | 140 |
vs.
Labvirt1013
/dev/mapper/tank-data xfs 4.1T 1.9T 2.2T 47% /var/lib/nova/instances
root@labcontrol1001:~# openstack hypervisor show 25 | grep disk
disk_available_least | 777 |
free_disk_gb | 837 |
local_gb | 4157 |
local_gb_used | 3320 |
vs
Labvirt1001
/dev/sdb1 xfs 2.2T 377G 1.9T 17% /var/lib/nova/instances
root@labcontrol1001:~# openstack hypervisor show 13 | grep -e disk -e local
disk_available_least | 883 |
free_disk_gb | 913 |
local_gb | 2233 |
local_gb_used | 1320 |
vs
Labvirt1002
/dev/sdb1 xfs 2.2T 1.2T 1.1T 54% /var/lib/nova/instances
disk_available_least | -454 |
free_disk_gb | -367 |
local_gb | 2233 |
local_gb_used | 2600 |
- It seems as if free_disk_gb should /never/ be negative, and yet in the case of labvirt1002 it is. I assume this is because local_gb_used (2600) exceeds local_gb (2233).
- In the case of labvirt1013, local_gb_used is 3320 yet the partition is only using 1.9T.
- Is free_disk_gb meant as an aggregate at the nova hypervisor-stats level? It seems like it cannot be correct showing 4080, considering labvirt1014 alone has free_disk_gb 4017.
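On the aggregate question, the hypervisor-stats arithmetic does check out as a plain sum (my interpretation of the numbers above):

```python
# Figures from 'nova hypervisor-stats' above:
local_gb, local_gb_used = 40880, 36800
print(local_gb - local_gb_used)   # 4080, the reported free_disk_gb

# So 4080 is cloud-wide *allocation* headroom, not free bytes on disk:
# labvirt1014 alone contributes 4017 of it, and overcommitted hosts such
# as labvirt1002 (free_disk_gb -367) subtract from the total.
```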
Change 357014 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] novastats: Add 'diskspace.py' script
Change 357014 merged by Andrew Bogott:
[operations/puppet@production] novastats: Add 'diskspace.py' script