
Move toollabs instances around to minimize damage from a single downed virt* host
Closed, Resolved, Public

Description

Current hosts + instances they are on:

root@virt1000:/home/yuvipanda# nova  --os-tenant-name=tools list --fields 'host,name'

+--------------------------------------+----------+--------------------------+
| ID                                   | Host     | Name                     |
+--------------------------------------+----------+--------------------------+
| ad12146e-b225-47b2-97f0-330527688331 | virt1001 | tools-exec-06            |
| eb6e8fad-8646-4251-a706-fc90bf0be0c9 | virt1001 | tools-exec-02            |
| cb2940d6-2560-4dc5-9e12-f894efd33dfc | virt1002 | tools-exec-08            |
| fa611e16-6b85-4f74-92a3-2ed1635fa481 | virt1002 | tools-exec-04            |
| 141d6240-9d5c-4991-9e21-e80a371e49ea | virt1003 | tools-trusty             |
| 154a8d9a-4fef-4ea7-bf62-534a32aee1c0 | virt1003 | tools-login              |
| 4222c0f5-b3bd-41a9-94d2-30faad4202ce | virt1003 | tools-exec-01            |
| 4df282d0-1cc3-424d-98ab-22adb7a34277 | virt1003 | tools-master             |
| 6a1a2095-8474-4378-8290-9dece5b9c3d8 | virt1003 | tools-exec-05            |
| 7e79e49d-b540-459b-a59e-3faa934d730e | virt1003 | tools-exec-gift          |
| b8677d01-5c6e-4c97-a1f1-7fdc7dac7f30 | virt1003 | tools-webgrid-03         |
| ec414ae4-a46f-425f-b9d5-950df155f137 | virt1003 | tools-exec-10            |
| 2088bbe3-1f30-48a3-b3ac-6aace58ef160 | virt1004 | tools-webgrid-05         |
| 605caf6e-f642-4cd3-8d42-268fb5e2c612 | virt1004 | tools-dev                |
| 78af272f-f268-4541-932d-63f5c931b31f | virt1004 | tools-exec-wmt           |
| cb0c681e-442a-4c9b-947e-d68a6c1cdaa9 | virt1004 | tools-webgrid-01         |
| dd7fea32-3555-4d67-acdf-f3ff3a7bb80e | virt1004 | tools-webgrid-02         |
| 47608ad4-1adc-4104-b1c5-96281a945ff8 | virt1006 | tools-exec-12            |
| 86049400-ae9e-48e1-bdbd-aac8ac06547d | virt1006 | tools-shadow             |
| 8c499e6e-1b79-4bb1-8f7f-72fee1f74ea5 | virt1006 | tools-mail               |
| 2bcde1d4-b8dd-4bb4-8ab0-6802882f209f | virt1008 | tools-exec-13            |
| a4fc3c84-bc8e-42bf-9209-0549c9872e84 | virt1008 | tools-exec-11            |
| 6f722696-713e-432d-a49f-91199e65c3ef | virt1009 | tools-exec-14            |
| ab61a3c5-39c4-4cd3-b027-2a954cdd8a72 | virt1009 | tools-exec-catscan       |
| b54b9525-635c-4c86-a483-05d4c1c6b36b | virt1009 | tools-exec-15            |
| 5db0983f-2a29-4dde-a543-76038a9dfc4f | virt1010 | tools-webgrid-06         |
| 6f8ead40-b3c9-4abe-86a1-407a410f9843 | virt1010 | tools-uwsgi-01           |
| 7d4a9768-c301-4e95-8bb9-d5aa70e94a64 | virt1010 | tools-webproxy-01        |
| 8d92c507-d253-425d-b7f4-2af3678a39ae | virt1010 | tools-webproxy           |
| 96c37c36-970b-4cc7-a7ba-d1ee90a225b5 | virt1010 | tools-submit             |
| f4eff829-b6f8-4749-b1f3-e881227413c3 | virt1010 | tools-webgrid-generic-01 |
| 1b7e971a-36d8-4d5c-8130-7211a4d00e2e | virt1011 | tools-webproxy-02        |
| 36de792e-c9ec-4bd3-8b5c-6b1bff080d8e | virt1011 | tools-static             |
| fd2dec9a-209d-42aa-b6d1-11497b9f2061 | virt1011 | tools-redis              |
| 120cc401-ed7a-44c5-b905-2d0eae23b6af | virt1012 | tools-exec-03            |
| 30b98f1d-1c5a-49c1-b800-f4c535addc12 | virt1012 | tools-exec-07            |
| 523df61c-07f0-41ba-924d-e2b8e474b4d7 | virt1012 | tools-exec-cyberbot      |
| 5cd684db-d0a6-4241-a11f-daf4c1b2f717 | virt1012 | tools-exec-09            |
| 79aeb31c-a1c1-41af-9e00-df2c7e248924 | virt1012 | tools-webgrid-tomcat     |
| 7fa7d90f-b783-409f-aa46-5cba75283645 | virt1012 | tools-webgrid-generic-02 |
| cdce426b-ef6f-47e7-96e4-bcb3647f4709 | virt1012 | tools-webgrid-04         |
+--------------------------------------+----------+--------------------------+

Moving some instances away from virt1003, and possibly from virt1010 and virt1012 as well, might be a good idea?
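A minimal sketch of how the listing above could be tallied per host, assuming the `nova list` output has been saved as plain text. Only the header and the virt1001/virt1003 rows are embedded to keep it short. The function names `parse_nova_table` and `instances_per_host` are hypothetical. Run on the full table, the same tally shows the imbalance: virt1003 carries 8 instances, virt1012 carries 7, and virt1010 carries 6.

```python
from collections import Counter

# Subset of the `nova list --fields host,name` output above.
NOVA_TABLE = """\
+--------------------------------------+----------+--------------------------+
| ID                                   | Host     | Name                     |
+--------------------------------------+----------+--------------------------+
| ad12146e-b225-47b2-97f0-330527688331 | virt1001 | tools-exec-06            |
| eb6e8fad-8646-4251-a706-fc90bf0be0c9 | virt1001 | tools-exec-02            |
| 141d6240-9d5c-4991-9e21-e80a371e49ea | virt1003 | tools-trusty             |
| 154a8d9a-4fef-4ea7-bf62-534a32aee1c0 | virt1003 | tools-login              |
| 4222c0f5-b3bd-41a9-94d2-30faad4202ce | virt1003 | tools-exec-01            |
| 4df282d0-1cc3-424d-98ab-22adb7a34277 | virt1003 | tools-master             |
| 6a1a2095-8474-4378-8290-9dece5b9c3d8 | virt1003 | tools-exec-05            |
| 7e79e49d-b540-459b-a59e-3faa934d730e | virt1003 | tools-exec-gift          |
| b8677d01-5c6e-4c97-a1f1-7fdc7dac7f30 | virt1003 | tools-webgrid-03         |
| ec414ae4-a46f-425f-b9d5-950df155f137 | virt1003 | tools-exec-10            |
+--------------------------------------+----------+--------------------------+
"""


def parse_nova_table(text):
    """Return (host, name) pairs from a nova CLI table, skipping borders and the header."""
    rows = []
    for line in text.splitlines():
        if not line.startswith("|"):
            continue  # border lines start with '+'
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if cells == ["ID", "Host", "Name"]:
            continue  # header row
        _instance_id, host, name = cells
        rows.append((host, name))
    return rows


def instances_per_host(rows):
    """Count how many instances each virt host carries."""
    return Counter(host for host, _name in rows)


counts = instances_per_host(parse_nova_table(NOVA_TABLE))
for host, n in counts.most_common():
    print(host, n)
```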

Event Timeline

yuvipanda raised the priority of this task to Needs Triage.
yuvipanda updated the task description.
yuvipanda added subscribers: Ricordisamoa, Andrew, scfc and 3 others.
scfc triaged this task as Low priority. Apr 6 2015, 7:46 AM

This needs a "plan" with rules that can be checked (perhaps even by a script):

  1. tools-master on different host than tools-shadow.
  2. tools-webproxy-01 on different host than tools-webproxy-02.
  3. tools-exec-* distributed "evenly" over all hosts with a maximum deviation of x %.
  4. tools-webgrid-* distributed "evenly" over all hosts with a maximum deviation of x %.
  5. Etc.
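A minimal sketch of how rules 1 to 4 could be checked, assuming instance placement is available as a name-to-host dict, for example built from the `nova list` output. The names `check_pairs` and `check_even_spread` are hypothetical. "Deviation" is interpreted here as the distance from the mean per-host count, expressed as a fraction of that mean. The sample placement is copied from the table in the description.

```python
from collections import Counter

# Anti-affinity pairs from rules 1 and 2.
ANTI_AFFINITY_PAIRS = [
    ("tools-master", "tools-shadow"),
    ("tools-webproxy-01", "tools-webproxy-02"),
]


def check_pairs(placement, pairs):
    """Return the pairs whose two members are placed on the same virt host."""
    return [(a, b) for a, b in pairs
            if a in placement and b in placement and placement[a] == placement[b]]


def check_even_spread(placement, prefix, hosts, max_deviation):
    """Return {host: count} for hosts whose number of `prefix` instances differs
    from the mean per host by more than max_deviation (a fraction, 0.5 = 50 %)."""
    counts = Counter(host for name, host in placement.items() if name.startswith(prefix))
    total = sum(counts.values())
    if not hosts or total == 0:
        return {}
    mean = total / len(hosts)
    return {h: counts[h] for h in hosts if abs(counts[h] - mean) > max_deviation * mean}


# Placement copied from the table in the description.
placement = {
    "tools-master": "virt1003", "tools-shadow": "virt1006",
    "tools-webproxy-01": "virt1010", "tools-webproxy-02": "virt1011",
    "tools-webgrid-01": "virt1004", "tools-webgrid-02": "virt1004",
    "tools-webgrid-03": "virt1003", "tools-webgrid-04": "virt1012",
    "tools-webgrid-05": "virt1004", "tools-webgrid-06": "virt1010",
}
print(check_pairs(placement, ANTI_AFFINITY_PAIRS))
print(check_even_spread(placement, "tools-webgrid-",
                        ["virt1003", "virt1004", "virt1010", "virt1012"], 0.5))
```

In this snapshot the failover pairs happen to be separated, while virt1004 holds three of the six tools-webgrid-NN instances (the mean over the four hosts is 1.5).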

So this bit us again today. I guess we should write a small script that identifies failover instances sharing the same virt* host, and then move them apart.
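A minimal dry-run sketch of the "move them around" step. For each failover pair that shares a host, it suggests moving the second member to the least-loaded other host and prints a `nova live-migration` command rather than running it. It assumes live migration is actually usable in this deployment, which this task does not confirm. `plan_moves` is hypothetical, and labvirt1005/labvirt1006 in the example are placeholder host names.

```python
from collections import Counter


def plan_moves(placement, pairs, hosts):
    """Return (instance, target_host) suggestions that split co-located pairs.

    Works on a copy of `placement`, so later suggestions account for earlier
    ones. Ties in host load are broken by host name to keep output stable."""
    placement = dict(placement)
    moves = []
    for a, b in pairs:
        if placement.get(a) is None or placement.get(a) != placement.get(b):
            continue  # pair missing or already on separate hosts
        load = Counter(placement.values())
        candidates = [h for h in hosts if h != placement[a]]
        if not candidates:
            continue  # nowhere else to go
        target = min(candidates, key=lambda h: (load[h], h))
        placement[b] = target
        moves.append((b, target))
    return moves


# labvirt1004 is from this task; the other hosts are placeholders.
placement = {"tools-master": "labvirt1004", "tools-shadow": "labvirt1004",
             "tools-exec-01": "labvirt1005"}
hosts = ["labvirt1004", "labvirt1005", "labvirt1006"]
for name, target in plan_moves(placement, [("tools-master", "tools-shadow")], hosts):
    print(f"nova --os-tenant-name=tools live-migration {name} {target}")
```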

mark raised the priority of this task from Low to High. May 16 2015, 9:41 AM
mark added a subscriber: mark.

@hashar pointed out in T84989 that OpenStack apparently offers "scheduler filters" for a similar purpose. So if we rename tools-master/tools-shadow to tools-master-01/tools-master-02 and add a filter requiring that, for instances whose names start with tools-, the parity (odd or even) of the last digit of the instance name matches the parity of the last digit of the virt node's name, we would pretty much be there.
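A minimal sketch of the proposed parity rule as a standalone predicate. It is not a working nova scheduler filter. A real filter would subclass nova's host filter base class, and that wiring and its exact method signature for a given OpenStack release are not shown. `parity_filter_passes` and `last_digit` are hypothetical names. Letting names without a digit through is an assumption.

```python
import re


def last_digit(name):
    """Return the last decimal digit in `name`, or None if it has none."""
    match = re.search(r"(\d)\D*$", name)
    return int(match.group(1)) if match else None


def parity_filter_passes(instance_name, host_name):
    """True if `instance_name` may be scheduled on `host_name`.

    Only tools-* instances are restricted: the last digit of the instance name
    must have the same parity as the last digit of the host name. Names without
    a digit are allowed through (an assumption, so scheduling is never blocked)."""
    if not instance_name.startswith("tools-"):
        return True
    inst, host = last_digit(instance_name), last_digit(host_name)
    if inst is None or host is None:
        return True
    return inst % 2 == host % 2


print(parity_filter_passes("tools-master-01", "labvirt1003"))
print(parity_filter_passes("tools-master-02", "labvirt1003"))
```

Under this rule tools-master-01 can only land on odd-numbered hosts and tools-master-02 only on even-numbered ones, so the pair can never share a host.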

I have a WIP script running; it has identified that tools-master and tools-shadow are on the same host (labvirt1004). I will clean up the script and puppetize it as an Icinga check.
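A minimal sketch of what such an Icinga check might reduce to. It uses the standard Nagios/Icinga plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and reports co-located failover pairs as CRITICAL. Gathering the placement from nova is not shown. `evaluate` and `main` are hypothetical names, and treating every collision as CRITICAL rather than WARNING is an assumption.

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(placement, pairs):
    """Return (exit_code, status_line) for a name -> host placement."""
    if not placement:
        return UNKNOWN, "UNKNOWN: no instance placement data"
    collisions = [f"{a}+{b} on {placement[a]}" for a, b in pairs
                  if a in placement and placement.get(a) == placement.get(b)]
    if collisions:
        return CRITICAL, "CRITICAL: " + "; ".join(collisions)
    return OK, f"OK: {len(pairs)} failover pair(s) on separate hosts"


def main(placement, pairs):
    """Print the status line (the check's first output line) and return the exit code."""
    code, message = evaluate(placement, pairs)
    print(message)
    return code
```

A wrapper script would end with `sys.exit(main(placement, pairs))` so that Icinga sees the exit code.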

bd808 claimed this task.
bd808 added a subscriber: bd808.

We have a "spread monitoring" script that will alert if more than 1/3 of the instances tagged as belonging to a "cluster" are on the same cloudvirt.
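A minimal sketch of the 1/3 rule, assuming each instance is given as a (name, cluster, host) tuple. How instances are actually tagged with a cluster is not described here, so `spread_violations` and its input shape are hypothetical. `Fraction` keeps the threshold comparison exact. The sample "webgrid" cluster uses the placements from the description's table.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def spread_violations(instances, max_fraction=Fraction(1, 3)):
    """Return {cluster: (host, count, total)} for every cluster where a single
    host carries more than max_fraction of that cluster's instances."""
    by_cluster = defaultdict(Counter)
    for _name, cluster, host in instances:
        by_cluster[cluster][host] += 1
    violations = {}
    for cluster, counts in by_cluster.items():
        total = sum(counts.values())
        host, count = counts.most_common(1)[0]
        if count > max_fraction * total:
            violations[cluster] = (host, count, total)
    return violations


instances = [
    ("tools-webgrid-01", "webgrid", "virt1004"),
    ("tools-webgrid-02", "webgrid", "virt1004"),
    ("tools-webgrid-03", "webgrid", "virt1003"),
    ("tools-webgrid-04", "webgrid", "virt1012"),
    ("tools-webgrid-05", "webgrid", "virt1004"),
    ("tools-webgrid-06", "webgrid", "virt1010"),
]
print(spread_violations(instances))
```

Under this literal reading, a two-member cluster always has half its members on one host and would always alert, so a real check presumably handles small clusters differently (not shown).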