
Migrate wikikube control planes to hardware nodes
Open, High, Public

Assigned To
Authored By
JMeybohm
Dec 14 2023, 4:09 PM
Referenced Files
F48799280: etcd-benchmark-output-kubestagemaster2003.txt
Fri, Apr 26, 2:58 PM
F48799281: etcd-benchmark-output-kubestagemaster2003_isolated.txt
Fri, Apr 26, 2:58 PM
F48799282: etcd-benchmark-output-ganeti-test2003.txt
Fri, Apr 26, 2:58 PM
F48799283: etcd-benchmark-output-mw2391.txt
Fri, Apr 26, 2:58 PM
F41610964: image.png
Dec 18 2023, 1:37 PM
F41610962: image.png
Dec 18 2023, 1:37 PM
F41610960: image.png
Dec 18 2023, 1:37 PM
F41610958: image.png
Dec 18 2023, 1:37 PM

Description

Currently we run 2 control planes as well as 3 etcd nodes per DC as Ganeti VMs. We have already hit limits in terms of IOPS on the etcd instances, and we are scratching the upper "limit" for memory on Ganeti for the control planes (currently 12GB).

We should draft a plan to migrate from the 2+3 Ganeti instances to 3 hardware nodes per DC (repurposing mw appservers) and co-locate a kubernetes master and an etcd server on each of them.

It should be possible to do this by adding the new control-plane/etcd nodes first and removing the Ganeti ones afterwards.

In the spreadsheet at T351074: Move servers from the appserver/api cluster to kubernetes, I've reserved 3 R440 nodes per DC to be used as apiservers:

  • mw2391
  • mw2331
  • mw2361
  • mw1372
  • mw1492
  • mw1436

These should be renamed during reimage because of their special role in the cluster.

I wrote documentation on how to add stacked control planes, and how to remove them as well as etcd nodes, at: https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters/Add_or_remove_control-planes
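Purely as an illustration of that "add the new members first, remove the Ganeti-backed ones after" rotation on the etcd side (the wikitech page above is the actual procedure; hostnames, naming scheme and endpoints below are made up), a rough sketch using the python etcd3 client could look like this:

```
import etcd3

# Connect to one of the existing (Ganeti-backed) etcd members.
# Hostname and port are placeholders, not real cluster endpoints.
client = etcd3.client(host="old-etcd1001.example.wmnet", port=2379)

# 1. Register the new hardware-backed members one at a time; each new etcd
#    process has to be started and allowed to catch up before the next add.
for host in ("new-ctrl1001", "new-ctrl1002", "new-ctrl1003"):
    client.add_member([f"https://{host}.example.wmnet:2380"])

# 2. Once the cluster is healthy with the new members, drop the old
#    Ganeti-backed ones by member ID.
for member in client.members:
    if member.name.startswith("old-etcd"):
        client.remove_member(member.id)
```

In practice this is of course driven by puppet and the documented runbook rather than ad-hoc scripting; the snippet is only meant to show the ordering.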

As preparation, we should reimage the above appservers to the insetup role, using the same partition layout that we use for kubernetes workers.

What I totally failed to think about while doing staging is the opportunity to align the wikikube control-plane names with the other clusters, which use names like ml-serve-ctrlXXXX/aux-k8s-ctrlXXXX. So maybe we could rename to wikikube-ctrlXXXX (I really don't like the "k8s" that dse and aux threw into the mix) to come one step closer to T336861: Fix naming confusion around main/wikikube kubernetes clusters.

Event Timeline

JMeybohm triaged this task as Medium priority. Dec 14 2023, 4:09 PM
JMeybohm created this task.
JMeybohm renamed this task from Migtate wikikube control planes to hardware nodes to Migrate wikikube control planes to hardware nodes. Dec 14 2023, 4:11 PM

I am not so sure we actually do scratch that memory limit now. Looking at kubemaster2001 last week

image.png (1×1 px, 96 KB)

and the other 3 kubernetes masters

image.png (1×1 px, 89 KB)

image.png (1×1 px, 103 KB)

image.png (1×1 px, 91 KB)

So, at the very least, we have ~50% of the VM's memory capacity to spare before hitting problems again.

The upper memory size that we can handle nicely in Ganeti is, from experience, ~16GB btw. But this is mostly because applications above that size tend to both consume a lot of memory AND alter the contents of that memory faster than the migration algorithm can catch up with, ending up with either stuck or at least very long-running migrations.
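(To put rough, made-up numbers on that: pre-copy live migration has to re-send every page that gets dirtied during a copy pass, so with, say, ~1 GB/s of effective migration bandwidth, a 16GB guest that keeps rewriting memory at anything close to that rate never shrinks its remaining dirty set, and the migration either never converges or only does so after a great many passes.)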

CPU usage has also fallen to ~20% now, so we have some room to spare and can think about how and when we want to tackle this.

I am not so sure we actually do scratch that memory limit now. Looking at kubemaster2001 last week

I wasn't saying we do. There are still quite a number of nodes to come, but even with those I don't expect us to hit 16 or 12GB. But with the IOPS bottlenecks we saw with etcd on Ganeti, we probably need to move the etcd servers to hardware, and in that case it does not make sense not to move the control planes as well, IMHO.

Forgive me for the drive-by comment, but would it be possible to create high-IOPS tiers (RAID-0?) for Ganeti? I'd recommend deploying them in conjunction with non-DRBD VMs for services that have their own HA (such as the Kubernetes control plane). I bring it up as I feel like Ganeti is an underused resource, and using it helps to avoid some of the management overhead associated with physical machines.

Forgive me for the drive-by comment, but would it be possible to create high-IOPS tiers (RAID-0?) for Ganeti? I'd recommend deploying them in conjunction with non-DRBD VMs for services that have their own HA (such as the Kubernetes control plane). I bring it up as I feel like Ganeti is an underused resource, and using it helps to avoid some of the management overhead associated with physical machines.

Yes, probably. But there would still be overhead and potential noisy neighbors on Ganeti, and with etcd being very sensitive in terms of IOPS, this might still not give us the desired performance. Also, we would first need to build such a Ganeti system (multiple of them, to cover our redundancy needs).

I am not so sure we actually do scratch that memory limit now. Looking at kubemaster2001 last week

I wasn't saying we do. There are still quite a number of nodes to come, but even with those I don't expect us to hit 16 or 12GB.

Agreed. That's my current theory as well.

But with the IOPS bottlenecks we saw with etcd on Ganeti, we probably need to move the etcd servers to hardware, and in that case it does not make sense not to move the control planes as well, IMHO.

Oh, so co-locate etcd with the rest of the control plane. It will work of course, and may even make our puppetization simpler. I'm just not super sold on it yet.

Do we track the IOPS bottlenecks we witnessed in some task?

Do we track the IOPS bottlenecks we witnessed in some task?

I'm also curious about the IOPS issues, since I assume the majority of etcd instances out in the wild are running on VMs and shared hardware. It might not be worth the effort for this particular project, but I'm game to help build a higher-IOPS tier for Ganeti if anyone else thinks that would be helpful.

Do we track the IOPS bottlenecks we witnessed in some task?

Not in a dedicated task, no, but what triggered creating T348466: Rethink kubernetes etcd storage was investigating T348228: KubernetesAPILatency alert fires on scap deploy.

Do we track the IOPS bottlenecks we witnessed in some task?

Not in a dedicated task, no, but what triggered creating T348466: Rethink kubernetes etcd storage was investigating T348228: KubernetesAPILatency alert fires on scap deploy.

Thanks, that's the context I was missing.

I ran a couple of very basic benchmarks (commands in the attached files) against single-node etcd instances running on:

  • A MediaWiki application server, ext4 on LVM on RAID1
  • An (empty) Ganeti node, ext4 on LVM on RAID5
  • A Ganeti instance running as the only instance on the above node, ext4 on LVM on RAID5
  • A Ganeti instance running in the prod Ganeti cluster (together with other instances), ext4 on LVM on RAID5

tl;dr: puts with 3 clients, 100 connections (roughly k8s prod):

  • isolated Ganeti VM: 3963.8976 requests/sec, p99.9 latency 0.1026s
  • Ganeti node: 6750.9583 requests/sec, p99.9 latency 0.0599s
  • appserver: 11895.5644 requests/sec, p99.9 latency 0.0399s
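The attached files contain the exact commands and full output. As a purely illustrative aside, a stripped-down way to get a rough serial put-latency number out of an etcd endpoint (assuming the python etcd3 client and a locally reachable instance; this is not equivalent to the concurrent benchmark above) would be something like:

```
import statistics
import time

import etcd3

# Placeholder endpoint; point this at the etcd instance under test.
client = etcd3.client(host="127.0.0.1", port=2379)

# Serial puts with a small value, timing each request individually.
latencies = []
for i in range(1000):
    start = time.monotonic()
    client.put(f"/bench/key-{i}", "x" * 256)
    latencies.append(time.monotonic() - start)

print(f"puts/sec (serial): {len(latencies) / sum(latencies):.1f}")
print(f"p99 put latency: {statistics.quantiles(latencies, n=100)[98] * 1000:.2f} ms")
```

A serial probe like that isn't directly comparable to the multi-client numbers above, but the relative ordering between the four setups should still show up.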

hnowlan updated the task description.

Change #1032805 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] appservers: 6 appservers to insetup before reimaging

https://gerrit.wikimedia.org/r/1032805