
Move 6 instances for general-k8s project each to a different physical host
Closed, Declined · Public

Description

I have created 6 VMs in the general-k8s project: https://tools.wmflabs.org/openstack-browser/project/general-k8s

I understand it is possible for admins to assign VMs to different underlying hosts.
I can't tell what the current state is / which instances are where, but I would like each instance to be on a separate physical machine so that the future cluster is resilient to physical machine restarts.
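
(For reference, the usual OpenStack way to request this at creation time is an anti-affinity server group; a rough sketch only, with the group name, image ID, and UUID below being illustrative placeholders:)

OS_TENANT_NAME=general-k8s openstack server group create --policy anti-affinity k8s-nodes
# pass the group's UUID as a scheduler hint when creating each instance
OS_TENANT_NAME=general-k8s openstack server create --flavor 3 --image <image-id> --hint group=<server-group-uuid> k8s-node-01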

Event Timeline

bd808 renamed this task from general-k8s instance creation on different hypervisors to Create 6 instances for general-k8s project each on a different physical host. Jul 14 2018, 4:32 PM
bd808 moved this task from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Addshore renamed this task from Create 6 instances for general-k8s project each on a different physical host to Move 6 instances for general-k8s project each to a different physical host. Jul 16 2018, 4:41 PM
Addshore updated the task description. (Show Details)

@chasemp is this something that you could handle or teach me how to do?

It is indeed easier to distribute them at the time of creation but yeah that's a root function. I am not sure if we have 6 virts that have headroom. What size instances and would 3 virts with 2 each work?

Oh I see they are created :). Could we start from scratch/are they configured?

> Oh I see they are created :). Could we start from scratch/are they configured?

Feel free to destroy them, I'll check here before I do anything with them.

> It is indeed easier to distribute them at the time of creation but yeah that's a root function. I am not sure if we have 6 virts that have headroom. What size instances and would 3 virts with 2 each work?

3 virts would be fine for now :)
We can always reassess the situation in the future.

I did a survey of instance count and load on the existing hypervisors in the scheduler, then took stock of how these nodes were already spread, and the existing distribution looked fairly good. In the end I created the 6th node on a distinct hypervisor, and the current situation is:

OS-EXT-SRV-ATTR:hostname    OS-EXT-SRV-ATTR:hypervisor_hostname
k8s-node-01                 labvirt1005.eqiad.wmnet
k8s-node-02                 labvirt1017.eqiad.wmnet
k8s-node-03                 labvirt1016.eqiad.wmnet
k8s-node-04                 labvirt1016.eqiad.wmnet
k8s-node-05                 labvirt1017.eqiad.wmnet

Target hypervisor for the 6th node: labvirt1011.eqiad.wmnet

OS_TENANT_NAME=general-k8s openstack server create --flavor 3 --image b7274e93-30f4-4567-88aa-46223c59107e --availability-zone host:labvirt1011 k8s-node-06

OS-EXT-SRV-ATTR:hostname    OS-EXT-SRV-ATTR:hypervisor_hostname
k8s-node-06                 labvirt1011.eqiad.wmnet
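
(For reference, placement like the above can be checked with openstackclient; a rough sketch, assuming admin credentials since the hypervisor attributes are admin-only, and exact column names may vary by client version:)

# list each instance in the project together with the hypervisor it runs on
OS_TENANT_NAME=general-k8s openstack server list --long -c Name -c Host
# or query a single instance directly
openstack server show k8s-node-06 -c "OS-EXT-SRV-ATTR:hypervisor_hostname" -f value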
Addshore assigned this task to chasemp.

Thanks!

Going to reopen this, since after the move to eqiad1-r all 6 hosts seem to be on just 2 hypervisors.

Would it be possible to get more of a split here?

Feel free to take the machines down at any point to move them.

> Would it be possible to get more of a split here?

This re-balancing will probably need to wait for more physical hosts to be available in the eqiad1-r region. We have hardware on order now for that and will also be continuing to move physical hosts from the old region to the new region as they are emptied of instances.

bd808 changed the task status from Open to Stalled. Nov 19 2018, 5:53 PM
bd808 triaged this task as Medium priority.

Marking as stalled for now due to the lack of available cloudvirt hosts in eqiad1-r. We should revisit this following deployment of additional hardware.

Andrew changed the task status from Stalled to Open. Nov 29 2018, 9:04 PM

We have six hosts in eqiad1-r now, so I can do this anytime. Is there any reason for me to worry about downtime while the VMs move, or can I do this whenever? (And, related, is it important that I move them in sequence or can I do several at a time?)
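
(For reference, a cold migration with openstackclient looks roughly like the sketch below; each instance is shut down, copied, and booted on the new host, so some downtime per VM is expected:)

openstack server migrate k8s-node-01
# wait for the instance to reach VERIFY_RESIZE, then confirm the move
# (older clients use "openstack server resize --confirm", newer ones "openstack server resize confirm")
openstack server resize --confirm k8s-node-01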

Andrew changed the task status from Open to Stalled. Dec 11 2018, 11:02 PM
Andrew removed Andrew as the assignee of this task.
Andrew subscribed.

The current instances are split between cloudvirt1018 and cloudvirt1021. I'm kind of wondering if this project is actually seeing any use at all however. @Addshore are these instances just idling and taking up CPU/RAM/disk resources?

If I may digress from the main request here, I was reading T196094 and it's not clear to me why we want to encourage users to target yet another k8s cluster besides the one in Toolforge.

If this is just a testbed for working on Puppet automation for a general-purpose k8s cluster, may I ask where we can find that? It would be interesting to have Toolforge k8s and this effort be closely aligned, as I'm expecting WMCS will be called to support it at some point.

Finally, we currently can't easily migrate VMs around due to lack of shared storage. It seems we have a paradox: if the general-k8s project needs to ensure its VMs are running on separate hypervisors, it's a production-like service, right? But that can't be because it's for testing the deployment of a general k8s cluster? :-)

If this is a nice-to-have request, I totally understand it. We're just a bit too time-constrained at the moment to keep working on this (and to ensure it easily stays that way in the future).

> The current instances are split between cloudvirt1018 and cloudvirt1021. I'm kind of wondering if this project is actually seeing any use at all however. @Addshore are these instances just idling and taking up CPU/RAM/disk resources?

The instances are seeing spurts of use; I'm trying to turn them off / just delete them in between these spurts now (no instances are currently running in the project after my last round of testing).

> If this is just a testbed for working on Puppet automation for a general-purpose k8s cluster, may I ask where we can find that? It would be interesting to have Toolforge k8s and this effort be closely aligned, as I'm expecting WMCS will be called to support it at some point.

If the idea for the tools k8s cluster is that it would be more flexible about what could be run there, that would be great, and then yes, let's nuke this project.
But right now, from the other discussions we have had in this area, the tools k8s cluster is for a more locked-down purpose?

> Finally, we currently can't easily migrate VMs around due to lack of shared storage. It seems we have a paradox: if the general-k8s project needs to ensure its VMs are running on separate hypervisors, it's a production-like service, right? But that can't be because it's for testing the deployment of a general k8s cluster? :-)

When I initially requested that they be on different hypervisors, I merely asked because it makes sense for a k8s cluster that is meant to keep working when nodes go down not to have all of its nodes go down when a single hypervisor breaks.
Amusingly, since then some hypervisors broke, some nodes broke, and we ended up just nuking them all.

> If this is a nice-to-have request, I totally understand it. We're just a bit too time-constrained at the moment to keep working on this (and to ensure it easily stays that way in the future).

Now, when doing tests of cluster setup etc., I don't really mind where these nodes end up, as I am frequently nuking the whole thing again. So I think we can close this request now.