Audit disk usage on labvirts
Closed, Resolved · Public

Event Timeline

Restricted Application added a subscriber: Aklapper.

In our config, we have disk_allocation_ratio=1.5

I'm reading the 'disk filter' section of https://docs.openstack.org/liberty/config-reference/content/section_compute-scheduler.html, and here is how I think the story goes:

Instances are allocated with a potential size (e.g. 40GB for a medium instance) but only an initial 20GB is partitioned. A smaller fraction of that 20GB is actually consumed -- less than 1GB for a brand-new instance. With normal use, disk consumption grows over time, potentially hitting that 20GB limit. Instances that partition the rest of their space may eventually grow to consume the entire potential size (40GB in our example).

When nova looks for a host to schedule a new instance on, it adds up the potential size of all instances on a given node, then adds the potential size of the new candidate instance. If that sum is < 1.5x the total size of the /var/lib/nova/instances partition on the labvirt, the labvirt is considered a candidate for scheduling.

As far as I know, nova doesn't pay attention to free space on the host at all, only the total drive size.
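
If that reading is right, the decision boils down to something like this sketch (simplified and illustrative, not the actual nova DiskFilter code; the numbers are made up):

# Simplified sketch of the scheduling check described above; not nova's real
# DiskFilter implementation. All sizes in GB; the ratio comes from our config.
DISK_ALLOCATION_RATIO = 1.5

def host_passes(total_disk_gb, allocated_gb_on_host, requested_gb):
    """True if a labvirt can accept a new instance of the given flavor size.

    total_disk_gb        -- size of /var/lib/nova/instances on the labvirt
    allocated_gb_on_host -- sum of the potential (flavor) sizes already there
    requested_gb         -- potential size of the instance being scheduled
    """
    disk_limit_gb = total_disk_gb * DISK_ALLOCATION_RATIO
    return allocated_gb_on_host + requested_gb <= disk_limit_gb

# Example: a 2200G labvirt with 3000G of flavor disk already allocated can
# still take a 40G 'medium', because 3040 <= 2200 * 1.5 == 3300.
print(host_passes(2200, 3000, 40))  # True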

So, things to audit here are:

  • am I right about how nova schedules?
  • what % of instances have potential size > 20GB, and what % of /those/ instances don't ever partition their additional space? That gives us a number for space that nova sees when scheduling but that will never be consumed.
  • what is the average actual disk space consumed by instances? Ideally this would be broken down by flavor, and also by whether or not they partitioned their additional space.
  • what is the potential vs. actual disk usage on each of our labvirts?

That should get us started.

For a quick start, I'm attaching a spreadsheet:

This shows instance allocated usage (from the instance flavor) next to actual instance disk space usage (via 'du' on the labvirts). It shows us actually consuming about 50% of allocated disk space across all instances. For instances of flavor 'small' (that is, instances where all of the allocated disk space is partitioned from the get-go), our usage ratio is much higher: almost 77%. About 40% of all instances are flavor 'small'.
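
Roughly how those ratios fall out of the spreadsheet data (a sketch only; it assumes a CSV export with columns instance,flavor,allocated_gb,used_gb, which are my placeholder names, not the actual attachment's format):

# Sketch of the per-flavor allocated-vs-consumed calculation. The input file
# and column names are placeholders for the attached spreadsheet.
import csv
from collections import defaultdict

allocated = defaultdict(float)
used = defaultdict(float)

with open('instance_disk_usage.csv') as f:
    for row in csv.DictReader(f):
        allocated[row['flavor']] += float(row['allocated_gb'])
        used[row['flavor']] += float(row['used_gb'])

for flavor in sorted(allocated):
    pct = 100.0 * used[flavor] / allocated[flavor]
    print('%-10s allocated=%8.0fG used=%8.0fG (%.0f%% of allocated)'
          % (flavor, allocated[flavor], used[flavor], pct))

print('overall: %.0f%% of allocated disk actually consumed'
      % (100.0 * sum(used.values()) / sum(allocated.values())))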

Here's the current state of instance storage on each labvirt. Note that labvirt1010-14 have much larger drives than the other servers but don't otherwise have substantially more RAM or CPUs, so we'd expect usage % to be lower on those systems.

labvirt1001.eqiad.wmnet:

/dev/sdb1       2.2T  381G  1.9T  18% /var/lib/nova/instances

labvirt1002.eqiad.wmnet:

/dev/sdb1       2.2T  1.2T  1.1T  53% /var/lib/nova/instances

labvirt1003.eqiad.wmnet:

/dev/sdb1       2.2T  1.5T  748G  67% /var/lib/nova/instances

labvirt1004.eqiad.wmnet:

/dev/sdb1       2.2T  1.4T  810G  64% /var/lib/nova/instances

labvirt1005.eqiad.wmnet:

/dev/sdb1       2.2T  1.5T  770G  66% /var/lib/nova/instances

labvirt1006.eqiad.wmnet:

/dev/sdb1       2.2T  1.5T  764G  66% /var/lib/nova/instances

labvirt1007.eqiad.wmnet:

/dev/sdb1       2.2T  1.5T  745G  67% /var/lib/nova/instances

labvirt1008.eqiad.wmnet:

/dev/sdb1       2.2T  1.8T  485G  79% /var/lib/nova/instances

labvirt1009.eqiad.wmnet:

/dev/sdb1       2.2T  1.6T  627G  72% /var/lib/nova/instances

labvirt1010.eqiad.wmnet:

/dev/mapper/tank-data  4.1T  1.4T  2.7T  34% /var/lib/nova/instances

labvirt1011.eqiad.wmnet:

/dev/mapper/tank-data  4.1T  1.7T  2.5T  41% /var/lib/nova/instances

labvirt1012.eqiad.wmnet:

/dev/mapper/tank-data  4.1T  2.1T  2.0T  52% /var/lib/nova/instances

labvirt1013.eqiad.wmnet:

/dev/mapper/tank-data  4.1T  1.9T  2.2T  47% /var/lib/nova/instances

labvirt1014.eqiad.wmnet:

/dev/mapper/tank-data  4.1T   90G  4.0T   3% /var/lib/nova/instances

Worst case scenario: What if every one of those instances suddenly partitioned and filled every bit of allocated space? On average, consumed space would double.

Labvirt1001, 1010, 1011, 1013 would be fine; the others would probably choke at some point.
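
A quick back-of-the-envelope check of that, using the df numbers above and the "usage roughly doubles" assumption (values hand-copied and rounded, so treat the borderline cases loosely):

# Worst case per labvirt: double current usage (we consume ~50% of allocated
# space) and compare against the instances partition. Numbers are hand-copied
# from the df output above, in GB, with 1T taken as ~1000G.
labvirts = {
    'labvirt1001': (381, 2200),  'labvirt1002': (1200, 2200),
    'labvirt1003': (1500, 2200), 'labvirt1004': (1400, 2200),
    'labvirt1005': (1500, 2200), 'labvirt1006': (1500, 2200),
    'labvirt1007': (1500, 2200), 'labvirt1008': (1800, 2200),
    'labvirt1009': (1600, 2200), 'labvirt1010': (1400, 4100),
    'labvirt1011': (1700, 4100), 'labvirt1012': (2100, 4100),
    'labvirt1013': (1900, 4100), 'labvirt1014': (90, 4100),
}

for name, (used_gb, total_gb) in sorted(labvirts.items()):
    worst = used_gb * 2
    verdict = 'ok' if worst <= total_gb else 'would fill up'
    print('%s: %4dG -> %4dG of %dG (%s)' % (name, used_gb, worst, total_gb, verdict))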

The usage report has a breakdown of uuid x allocated x consumed, and this comment has a breakdown of free space by labvirt, which indicates that scheduling is based solely on the 1.5x value of each labvirt and not on usage.

Can we break down the usage per labvirt (and really I'm hoping I can help do it so it's not just on your plate), with these columns: labvirt, uuid, instance allocated storage, instance used storage, instance used storage %, [y/n] for has provisioned more than the initial partition, allocated % of actual disk on the labvirt, consumed % of actual disk on the labvirt, allocated % of 1.5x disk, consumed % of 1.5x disk?

I don't feel comforted by the 50% used of allocated for a few reasons:

  • With COW (copy-on-write), will 100% usage on any labvirt render all instances there non-functional?
  • It doesn't tie into the actual allocated and consumed percentages on each labvirt. We have ~50% headroom within allocated space, but what % of allocated corresponds to 1.5x and 1.0x of the disk (and by labvirt)? We can only think in allocation blocks that are labvirt-sized, so we'll lose some of that already.
  • At only ~50% of allocated space actually consumed, why do we schedule at the 1.5x ratio?
  • On what percentage of instance disks are we using space outside of the initial partition? i.e. we partitioned the initial 20G, and some % of instances grow and use beyond that.
  • We should not count labvirt1014 in this at all, as it skews the numbers a lot.

From nova:

  • labvirt
  • uuid
  • instance allocated storage
  • instance used storage

from puppet config:

  • [y/n] for has provisioned more than initial partition

spreadsheet calcs:

  • instance used storage %
  • allocated % of actual disk on labvirt
  • consumed % of actual disk on labvirt
  • allocated % of 1.5x disk
  • consumed % of 1.5x disk
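
Something like this could compute the derived columns once the raw per-instance numbers are collected (a sketch only; the input format, file names, and field names below are placeholders, not an agreed interface):

# Hypothetical helper for the derived spreadsheet columns. Field and file names
# are placeholders; the raw per-instance numbers would come from nova plus du,
# and the partition-growth flag from puppet config.
import csv

RATIO = 1.5  # our disk_allocation_ratio

def derived_columns(row, labvirt_disk_gb):
    """row needs: labvirt, uuid, allocated_gb, used_gb, grows_past_initial (y/n)."""
    allocated = float(row['allocated_gb'])
    used = float(row['used_gb'])
    disk = labvirt_disk_gb[row['labvirt']]
    return {
        'used % of allocated': 100.0 * used / allocated,
        'allocated % of actual disk': 100.0 * allocated / disk,
        'consumed % of actual disk': 100.0 * used / disk,
        'allocated % of 1.5x disk': 100.0 * allocated / (disk * RATIO),
        'consumed % of 1.5x disk': 100.0 * used / (disk * RATIO),
    }

if __name__ == '__main__':
    # /var/lib/nova/instances size per labvirt, in GB (approximate, from df above).
    labvirt_disk_gb = {'labvirt1001': 2200, 'labvirt1010': 4100}  # ...and so on
    with open('instances.csv') as f:  # placeholder input file
        for row in csv.DictReader(f):
            print(row['labvirt'], row['uuid'], derived_columns(row, labvirt_disk_gb))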

Thanks @Andrew, I'm trying to make sense of this :)

I think we can reason backwards from the disk_available_least value for some of the allocated-vs-available questions. It seems like nova should also be keeping track in such a way that we don't need to run df -Th, and can also make scheduling decisions based on usage (to say nothing of alerting and heuristics). It looks like DiskFilter always uses disk_available_least. In looking through the existing stats I'm coming up with a few questions for myself. Most of this seems to be determined by python-nova: /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py

nova hypervisor-stats

+----------------------+---------+
| Property             | Value   |
+----------------------+---------+
| count                | 14      |
| current_workload     | 1       |
| disk_available_least | 2931    |
| free_disk_gb         | 4080    |
| free_ram_mb          | 1604473 |
| local_gb             | 40880   |
| local_gb_used        | 36800   |

disk_available_least - this is the space available if all instances are at full usage, i.e. assuming COW has grown every instance to its full allocated size on disk. This can be a negative number, indicating that a labvirt's disk is already overcommitted, which is possible since our ratio is >1.0 (we have it set to 1.5). In theory this should never be a negative number larger than 50% of the total actual disk: if a disk is 10G and we oversubscribe at 1.5x, this should never go below -5G.

free_disk_gb - The formula I see is free_disk_gb = resources['local_gb'] - resources['local_gb_used'].
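
Putting those two together, my reading of the arithmetic is roughly this (a sketch of my understanding, not the actual resource_tracker / libvirt driver code):

# Sketch of my reading of the two values; sizes in GB.

def free_disk_gb(local_gb, local_gb_used):
    # local_gb_used appears to be summed from instance flavors (allocated disk),
    # so this can go negative once allocation exceeds the physical partition.
    return local_gb - local_gb_used

def disk_available_least(actual_free_gb, instances):
    """instances: list of (allocated_gb, on_disk_gb) pairs.

    Space that would remain if every instance grew (via COW) to its full
    allocated size: actual free space minus the not-yet-materialized growth.
    """
    potential_growth = sum(alloc - on_disk for alloc, on_disk in instances)
    return actual_free_gb - potential_growth

# labvirt1002-style numbers: a 2233G partition with 2600G of flavor disk allocated.
print(free_disk_gb(2233, 2600))  # -367, matching the hypervisor output below

# With roughly labvirt1002's df numbers (about 1100G actually free, ~1200G of
# the 2600G allocation materialized on disk) this lands in the same negative
# ballpark as the -454 reported below, though not exactly (rounding, and the
# real per-image on-disk sizes differ from a plain du).
print(disk_available_least(1100, [(2600, 1200)]))  # -300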

If I look at openstack hypervisor list I can grab some statistics per virt.

root@labcontrol1001:~# openstack   hypervisor list
+----+-------------------------+
| ID | Hypervisor Hostname     |
+----+-------------------------+
| 13 | labvirt1001.eqiad.wmnet |
| 14 | labvirt1004.eqiad.wmnet |
| 15 | labvirt1003.eqiad.wmnet |
| 16 | labvirt1002.eqiad.wmnet |
| 17 | labvirt1005.eqiad.wmnet |
| 18 | labvirt1006.eqiad.wmnet |
| 19 | labvirt1007.eqiad.wmnet |
| 20 | labvirt1008.eqiad.wmnet |
| 21 | labvirt1009.eqiad.wmnet |
| 22 | labvirt1010.eqiad.wmnet |
| 23 | labvirt1011.eqiad.wmnet |
| 24 | labvirt1012.eqiad.wmnet |
| 25 | labvirt1013.eqiad.wmnet |
| 26 | labvirt1014.eqiad.wmnet |
+----+-------------------------+
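
A quick way to pull those per-virt numbers without running df on each labvirt (a sketch using the same openstack CLI as above; '-f json' is the standard output-format flag, and the ID range comes from the list above):

# Dump the disk-related fields for every hypervisor via the openstack CLI.
import json
import subprocess

def hypervisor_show(hypervisor_id):
    out = subprocess.check_output(
        ['openstack', 'hypervisor', 'show', str(hypervisor_id), '-f', 'json'])
    return json.loads(out)

for hid in range(13, 27):  # IDs 13-26 per 'openstack hypervisor list' above
    h = hypervisor_show(hid)
    print(hid, h['disk_available_least'], h['free_disk_gb'],
          h['local_gb'], h['local_gb_used'])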

Labvirt1014

/dev/mapper/tank-data xfs 4.1T 92G 4.0T 3% /var/lib/nova/instances

root@labcontrol1001:~# openstack hypervisor show 26 | grep disk

| disk_available_least | 4012 |
| free_disk_gb         | 4017 |
| local_gb             | 4157 |
| local_gb_used        | 140  |

vs.

Labvirt1013

/dev/mapper/tank-data xfs 4.1T 1.9T 2.2T 47% /var/lib/nova/instances

root@labcontrol1001:~# openstack hypervisor show 25 | grep disk

| disk_available_least | 777  |
| free_disk_gb         | 837  |
| local_gb             | 4157 |
| local_gb_used        | 3320 |

vs

Labvirt1001

/dev/sdb1 xfs 2.2T 377G 1.9T 17% /var/lib/nova/instances

root@labcontrol1001:~# openstack hypervisor show 13 | grep -e disk -e local

| disk_available_least | 883  |
| free_disk_gb         | 913  |
| local_gb             | 2233 |
| local_gb_used        | 1320 |

vs

Labvirt1002

/dev/sdb1 xfs 2.2T 1.2T 1.1T 54% /var/lib/nova/instances

| disk_available_least | -454 |
| free_disk_gb         | -367 |
| local_gb             | 2233 |
| local_gb_used        | 2600 |

  1. It seems as if free_disk_gb should /never/ be negative, and yet in the case of labvirt1002 it is. I assume this is because local_gb_used is 2600, which is larger than local_gb (2233).
  2. In the case of labvirt1013, local_gb_used is 3320 yet the partition only shows 1.9T actually in use.
  3. Is free_disk_gb meant as an aggregate at the nova hypervisor-stats level? It seems like it cannot be correct showing 4080 considering that labvirt1014 alone has free_disk_gb 4017.
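
A quick arithmetic check on that last question, using only values quoted above and assuming hypervisor-stats simply sums the per-host values:

# Sanity check on the aggregate free_disk_gb from 'nova hypervisor-stats'.
aggregate_local_gb = 40880
aggregate_local_gb_used = 36800
print(aggregate_local_gb - aggregate_local_gb_used)  # 4080, the reported free_disk_gb

# free_disk_gb = local_gb - local_gb_used per host and can go negative
# (labvirt1002 reports -367), so overcommitted hosts pull the aggregate down;
# a total of 4080 is not obviously inconsistent with labvirt1014 alone at 4017.
known_free = {'labvirt1001': 913, 'labvirt1002': -367,
              'labvirt1013': 837, 'labvirt1014': 4017}
print(sum(known_free.values()))  # 5400 across just these four hosts
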
Andrew lowered the priority of this task from High to Medium. May 8 2017, 5:36 PM

Change 357014 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] novastats: Add 'diskspace.py' script

https://gerrit.wikimedia.org/r/357014

Change 357014 merged by Andrew Bogott:
[operations/puppet@production] novastats: Add 'diskspace.py' script

https://gerrit.wikimedia.org/r/357014

Andrew claimed this task.

No activity on this in quite a while; we've got stats when we need them.