
Move 6 instances for general-k8s project each to a different physical host
Closed, Declined · Public

Description

I have created 6 VMs in the general-k8s project: https://tools.wmflabs.org/openstack-browser/project/general-k8s

I understand it is possible for admins to assign VMs to different underlying hosts.
I can't tell what the current state is / which instances are where, but I would like each instance to be on a separate physical machine so that the future cluster is resilient to physical machine restarts.
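
(For reference, the usual OpenStack way to request this at creation time is an anti-affinity server group; a rough sketch only, with the group name, image ID, and UUID below being illustrative placeholders:)

OS_TENANT_NAME=general-k8s openstack server group create --policy anti-affinity k8s-nodes
# pass the group's UUID as a scheduler hint when creating each instance
OS_TENANT_NAME=general-k8s openstack server create --flavor 3 --image <image-id> --hint group=<server-group-uuid> k8s-node-01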

Event Timeline

bd808 renamed this task from general-k8s instance creation on different hypervisors to Create 6 instances for general-k8s project each on a different physical host. Jul 14 2018, 4:32 PM
bd808 moved this task from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Addshore renamed this task from Create 6 instances for general-k8s project each on a different physical host to Move 6 instances for general-k8s project each to a different physical host. Jul 16 2018, 4:41 PM
Addshore updated the task description. (Show Details)

@chasemp is this something that you could handle or teach me how to do?

It is indeed easier to distribute them at the time of creation but yeah that's a root function. I am not sure if we have 6 virts that have headroom. What size instances and would 3 virts with 2 each work?

Oh I see they are created :). Could we start from scratch/are they configured?

> Oh I see they are created :). Could we start from scratch/are they configured?

Feel free to destroy them, I'll check here before I do anything with them.

> It is indeed easier to distribute them at the time of creation but yeah that's a root function. I am not sure if we have 6 virts that have headroom. What size instances and would 3 virts with 2 each work?

3 virts would be fine for now :)
We can always reassess the situation in the future.

I did a survey of instance count and load on the existing hypervisors in the scheduler, then took stock of how these nodes were already spread, and the existing distribution looked fairly good. In the end I created the 6th node on a distinct hypervisor, and the current situation is:

OS-EXT-SRV-ATTR:hostname    OS-EXT-SRV-ATTR:hypervisor_hostname
k8s-node-01                 labvirt1005.eqiad.wmnet
k8s-node-02                 labvirt1017.eqiad.wmnet
k8s-node-03                 labvirt1016.eqiad.wmnet
k8s-node-04                 labvirt1016.eqiad.wmnet
k8s-node-05                 labvirt1017.eqiad.wmnet

Target hypervisor for the 6th node: labvirt1011.eqiad.wmnet

OS_TENANT_NAME=general-k8s openstack server create --flavor 3 --image b7274e93-30f4-4567-88aa-46223c59107e --availability-zone host:labvirt1011 k8s-node-06

OS-EXT-SRV-ATTR:hostname    OS-EXT-SRV-ATTR:hypervisor_hostname
k8s-node-06                 labvirt1011.eqiad.wmnet
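
(For reference, placement like the above can be checked with openstackclient; a rough sketch, assuming admin credentials since the hypervisor attributes are admin-only, and exact column names may vary by client version:)

# list each instance in the project together with the hypervisor it runs on
OS_TENANT_NAME=general-k8s openstack server list --long -c Name -c Host
# or query a single instance directly
openstack server show k8s-node-06 -c "OS-EXT-SRV-ATTR:hypervisor_hostname" -f value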
Addshore assigned this task to chasemp.

Thanks!

Going to reopen this, since after the move to eqiad1-r all 6 hosts seem to be on just 2 hypervisors.

Would it be possible to get more of a split here?

Feel free to take the machines down at any point to move them.

> Would it be possible to get more of a split here?

This re-balancing will probably need to wait for more physical hosts to be available in the eqiad1-r region. We have hardware on order now for that and will also be continuing to move physical hosts from the old region to the new region as they are emptied of instances.

bd808 changed the task status from Open to Stalled. Nov 19 2018, 5:53 PM
bd808 triaged this task as Medium priority.

Marking as stalled for now due to the lack of available cloudvirt hosts in eqiad1-r. We should revisit this following deployment of additional hardware.

Andrew changed the task status from Stalled to Open. Nov 29 2018, 9:04 PM

We have six hosts in eqiad1-r now, so I can do this anytime. Is there any reason for me to worry about downtime while the VMs move, or can I do this whenever? (And, related, is it important that I move them in sequence or can I do several at a time?)
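
(For reference, a cold migration with openstackclient looks roughly like the sketch below; each instance is shut down, copied, and booted on the new host, so some downtime per VM is expected:)

openstack server migrate k8s-node-01
# wait for the instance to reach VERIFY_RESIZE, then confirm the move
# (older clients use "openstack server resize --confirm", newer ones "openstack server resize confirm")
openstack server resize --confirm k8s-node-01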

Andrew changed the task status from Open to Stalled. Dec 11 2018, 11:02 PM
Andrew removed Andrew as the assignee of this task.
Andrew subscribed.

The current instances are split between cloudvirt1018 and cloudvirt1021. I'm kind of wondering if this project is actually seeing any use at all however. @Addshore are these instances just idling and taking up CPU/RAM/disk resources?

If I may digress from the main request here, I was reading T196094 and it's not clear to me why we want to encourage users to target yet another k8s cluster besides the one in Toolforge.

If this is just a testbed for working on Puppet automation for a general-purpose k8s cluster, may I ask where we can find that? It would be interesting to have Toolforge k8s and this effort be closely aligned, as I'm expecting WMCS will be called to support it at some point.

Finally, we currently can't easily migrate VMs around due to lack of shared storage. It seems we have a paradox: if the general-k8s project needs to ensure its VMs are running on separate hypervisors, it's a production-like service, right? But that can't be because it's for testing the deployment of a general k8s cluster? :-)

If this is a nice-to-have request, I totally understand it. We're just a bit too time-constrained at the moment to keep working on this (and to ensure it easily stays that way in the future).

> The current instances are split between cloudvirt1018 and cloudvirt1021. I'm kind of wondering if this project is actually seeing any use at all however. @Addshore are these instances just idling and taking up CPU/RAM/disk resources?

The instances are seeing spurts of use; I'm trying to turn them off / just delete them in between these spurts now (no instances are currently running in the project after my last round of testing).

> If this is just a testbed for working on Puppet automation for a general-purpose k8s cluster, may I ask where we can find that? It would be interesting to have Toolforge k8s and this effort be closely aligned, as I'm expecting WMCS will be called to support it at some point.

If the idea for the tools k8s cluster is that it would be more flexible about what could be run there, that would be great, and then yes, let's nuke this project.
But right now, from the other discussions we have had in this area, the tools k8s cluster is for a more locked-down purpose?

> Finally, we currently can't easily migrate VMs around due to lack of shared storage. It seems we have a paradox: if the general-k8s project needs to ensure its VMs are running on separate hypervisors, it's a production-like service, right? But that can't be because it's for testing the deployment of a general k8s cluster? :-)

When I initially requested that they be on different hypervisors, I merely asked because it makes sense for a k8s cluster that is meant to keep working when nodes go down not to have all of its nodes go down when a single hypervisor breaks.
Amusingly, since then some hypervisors broke, some nodes broke, and we ended up just nuking them all.

> If this is a nice-to-have request, I totally understand it. We're just a bit too time-constrained at the moment to keep working on this (and to ensure it easily stays that way in the future).

Now, when doing tests of cluster setup etc., I don't really mind where these nodes end up, as I am frequently nuking the whole thing again. So I think we can close this request now.