# Improve algorithm that detects 'spreadiness' of Tool Labs instances on Labs Hosts

Closed, Resolved, Public

Assigned To
 bd808
Authored By
 yuvipanda Jun 8 2015, 5:09 PM (2015-06-08 17:09:57 UTC+0)
Referenced Files
None

# Description

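Currently the check is just 'number of instances of a class > number of unique hosts on which those instances are hosted', which is terrible: it only says that some host holds more than one instance, not how badly the instances are concentrated. We need a better metric.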
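To illustrate why the current check is weak, here is a minimal sketch of it. The function name and the input shape (one host name per instance) are assumptions made for illustration, not the actual Tool Labs code.

```python
def naive_is_badly_spread(instance_hosts):
    """Flag a class when it has more instances than distinct hosts.

    `instance_hosts` is a hypothetical list holding one host name per instance.
    """
    return len(instance_hosts) > len(set(instance_hosts))


# Seven instances on six hosts: only one host is doubled up.
slightly_uneven = ["h1", "h1", "h2", "h3", "h4", "h5", "h6"]
# Seven instances all on a single host.
all_on_one = ["h1"] * 7

print(naive_is_badly_spread(slightly_uneven))  # True
print(naive_is_badly_spread(all_on_one))       # True
```

Both layouts are flagged identically, even though losing `h1` costs two of seven instances in the first case and all of them in the second.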

### Event Timeline

yuvipanda raised the priority of this task from to Needs Triage.
yuvipanda updated the task description.
yuvipanda added subscribers: yuvipanda, Joe, BBlack, chasemp.
Restricted Application added a subscriber: Aklapper. Jun 8 2015, 5:09 PM

I propose the use of entropy to measure uniformity. See http://stats.stackexchange.com/questions/66935/measure-for-the-uniformity-of-a-distribution for a discussion.

```
>>> from math import log
>>>
>>> def entropy(counts, epsilon=0.01):
...   total = sum(counts)
...   props = [c/total for c in counts]
...   return sum(p * log(1/max(p, epsilon))
...              for p in props)
...
>>> uniform = [3,3,3,3,3,3,3]
>>> non_uniform = [0,2,0,5,0,2,12]
>>> really_non_uniform = [0,0,0,0,0,1,20]
>>>
>>> entropy(uniform)
1.945910149055313
>>> entropy(non_uniform)
1.1093482433488377
>>> entropy(really_non_uniform)
0.19144408195771734
```
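One possible refinement, offered only as a sketch: raw entropy peaks at log(n) for n hosts, so values are hard to compare when the number of hosts differs. Dividing by log(n) gives a score between 0 and 1. The name `normalized_entropy` is hypothetical, the code assumes Python 3 division, and it skips empty hosts instead of using the `epsilon` clamp above.

```python
from math import log


def normalized_entropy(counts):
    """Shannon entropy of `counts`, divided by log(n) so it lies in [0, 1]."""
    total = sum(counts)
    n = len(counts)
    if total == 0 or n < 2:
        # Assumption: with no instances or a single host there is no spread.
        return 0.0
    # Empty hosts contribute nothing (p * log(1/p) tends to 0 as p goes to 0).
    props = [c / total for c in counts if c > 0]
    return sum(p * log(1 / p) for p in props) / log(n)


print(normalized_entropy([3, 3, 3, 3, 3, 3, 3]))
print(normalized_entropy([0, 0, 0, 0, 0, 1, 20]))
```

A perfectly even layout scores close to 1, and a layout with everything on one host scores 0, so a single threshold could be used regardless of how many hosts there are.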

If you want to know how much damage a single host going down would cause, then I propose to measure that directly.

```
>>> def max_downage(counts):
...   return max(counts)/sum(counts)
...
>>> max_downage(uniform)
0.14285714285714285
>>> max_downage(non_uniform)
0.5714285714285714
>>> max_downage(really_non_uniform)
0.9523809523809523
```

It also depends on the kind of host, I think. For failover services, the question is 'how many virt hosts need to go down to make us unreachable', while for exec nodes, I'd ask 'how much of our computing power do we lose if a single virt host goes down'.

So, for failover services, I'd just check that

• N_hosts > given number, and
• each host is on a different virt host

For other services, I'd measure
max(number of hosts per virt host) / total number of hosts

(..which is what @Halfak suggested already because I was too slow in typing)
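A rough sketch of the two checks described above, assuming each instance of a service is represented by the name of the virt host it runs on. The function names, the input shape, and the `threshold` parameter are all hypothetical.

```python
from collections import Counter


def failover_ok(virt_hosts, threshold):
    """True if there are more than `threshold` instances, each on its own virt host.

    `virt_hosts` is a hypothetical list with one virt host name per instance.
    """
    enough_hosts = len(virt_hosts) > threshold
    all_distinct = len(set(virt_hosts)) == len(virt_hosts)
    return enough_hosts and all_distinct


def worst_case_loss(virt_hosts):
    """Fraction of instances lost if the most loaded virt host goes down."""
    if not virt_hosts:
        return 0.0  # assumption: nothing to lose when there are no instances
    per_virt_host = Counter(virt_hosts)
    return max(per_virt_host.values()) / len(virt_hosts)


print(failover_ok(["v1", "v2", "v3"], threshold=2))   # True
print(failover_ok(["v1", "v1", "v2"], threshold=2))   # False
print(worst_case_loss(["v1", "v1", "v2", "v3"]))      # 0.5
```

`worst_case_loss` is the same quantity as `max_downage`, computed from host names rather than from pre-aggregated counts.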

```
if sum(counts) >= 2 and max(counts)/sum(counts) >= 0.5: