
toolforge: Move k8s control plane nodes to larger instance flavors
Closed, Resolved · Public

Description

While investigating T333922 (Toolforge k8s control plane freezing and other stability issues), we noticed that the Toolforge Kubernetes control plane nodes are undersized for the task: they use the g2.cores2.ram4.disk40 flavor (2 vCPUs, 4 GB RAM, 40 GB disk). We need to migrate those instances to a larger flavor.

Event Timeline

taavi triaged this task as High priority. (Apr 4 2023, 10:25 AM)
taavi created this task.
aborrero moved this task from Inbox to Soon! on the cloud-services-team board.

I tried, but it looks like we can't resize those instances because the g2 flavors are no longer available, so I'm going to re-create them with newer flavors instead.

Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-04T18:45:05Z] <wm-bot2> Adding a new k8s CONTROL node (T333929) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-04T18:46:35Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-04T19:00:20Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko

Added two more control nodes to the cluster. I'll wait until tomorrow before removing the old nodes and adding one more new node.

Change 905707 had a related patch set uploaded (by Majavah; author: Majavah):

[cloud/wmcs-cookbooks@main] cookbooks: Generalize Toolforge add node cookbook to add control nodes

https://gerrit.wikimedia.org/r/905707

Change 905707 merged by jenkins-bot:

[cloud/wmcs-cookbooks@main] cookbooks: Generalize Toolforge add node cookbook to add control nodes

https://gerrit.wikimedia.org/r/905707

Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-05T10:16:30Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-05T10:41:14Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-05T11:08:57Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-05T11:21:52Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-05T11:39:06Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko

Change 905997 had a related patch set uploaded (by Majavah; author: Majavah):

[cloud/wmcs-cookbooks@main] toolforge: update firewall rules on etcd nodes when needed

https://gerrit.wikimedia.org/r/905997

Added one more node and removed -control-1. I'll wait a while again before removing the remaining old nodes.

Change 906022 had a related patch set uploaded (by Majavah; author: Majavah):

[cloud/wmcs-cookbooks@main] wmcs_libs: inventory: refresh tools k8s worker nodes

https://gerrit.wikimedia.org/r/906022

Change 905997 merged by jenkins-bot:

[cloud/wmcs-cookbooks@main] toolforge: update firewall rules on etcd nodes when needed

https://gerrit.wikimedia.org/r/905997

Change 906022 merged by jenkins-bot:

[cloud/wmcs-cookbooks@main] wmcs_libs: inventory: refresh tools k8s control nodes

https://gerrit.wikimedia.org/r/906022

Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-07T14:34:31Z] <wm-bot2> drained, depooled and removed k8s control node tools-k8s-control-3 (T333929) - cookbook ran by taavi@runko