While investigating T333922 (toolforge k8s control plane freezing and other stability issues), we noticed that the Toolforge Kubernetes control plane nodes are undersized for the task, as they use the g2.cores2.ram4.disk40 flavor. We need to migrate those instances to a larger flavor.
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T333922 toolforge k8s control plane freezing and other stability issues
Resolved | | taavi | T333929 toolforge: Move k8s control plane nodes to larger instance flavors
Event Timeline
I tried, but it looks like we can't resize those instances because the g2 flavors are now unavailable, so I'm just going to re-create them with newer flavors.
Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-04T18:45:05Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko
Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-04T18:46:35Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko
Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-04T19:00:20Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko
Added two more control nodes to the cluster. I'll wait until tomorrow before removing the old nodes and adding one extra new node.
Change 905707 had a related patch set uploaded (by Majavah; author: Majavah):
[cloud/wmcs-cookbooks@main] cookbooks: Generalize Toolforge add node cookbook to add control nodes
Change 905707 merged by jenkins-bot:
[cloud/wmcs-cookbooks@main] cookbooks: Generalize Toolforge add node cookbook to add control nodes
Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-05T10:16:30Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko
Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-05T10:41:14Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko
Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-05T11:08:57Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko
Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-05T11:21:52Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko
Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-05T11:39:06Z] <wm-bot2> Adding a new k8s control node (T333929) - cookbook ran by taavi@runko
Change 905997 had a related patch set uploaded (by Majavah; author: Majavah):
[cloud/wmcs-cookbooks@main] toolforge: update firewall rules on etcd nodes when needed
Added one more node and removed -control-1. I'll again wait some time before continuing to remove the last old nodes.
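For illustration only, the add-before-remove ordering described in these comments can be sketched as a small planner that never lets the number of control nodes drop below a chosen minimum. This is not what the wmcs-cookbooks code does; the function name, the node names, and the `min_members` parameter are all hypothetical.

```python
from typing import Iterator, List, Tuple


def rolling_replacement(
    old: List[str], new: List[str], min_members: int
) -> Iterator[Tuple[str, str]]:
    """Yield ("add" | "remove", node) steps for replacing old nodes with new ones.

    Hypothetical helper: an old node is only removed while the cluster has
    more than min_members members, so a new node is always added first.
    """
    members = list(old)
    pending_old = list(old)
    pending_new = list(new)
    while pending_old or pending_new:
        if pending_old and len(members) > min_members:
            # Safe to shrink: there is at least one spare member.
            node = pending_old.pop(0)
            members.remove(node)
            yield ("remove", node)
        elif pending_new:
            # Grow the cluster first so a later removal stays above the minimum.
            node = pending_new.pop(0)
            members.append(node)
            yield ("add", node)
        else:
            raise ValueError(
                "cannot remove remaining old nodes without dropping below min_members"
            )


# Hypothetical node names, not the real Toolforge instances.
for action, node in rolling_replacement(
    ["old-1", "old-2", "old-3"], ["new-1", "new-2", "new-3"], min_members=3
):
    print(action, node)
```

In practice the steps here were batched differently (several additions, a wait, then removals), but the invariant is the same: capacity is added before any old node is taken away.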
Change 906022 had a related patch set uploaded (by Majavah; author: Majavah):
[cloud/wmcs-cookbooks@main] wmcs_libs: inventory: refresh tools k8s control nodes
Change 905997 merged by jenkins-bot:
[cloud/wmcs-cookbooks@main] toolforge: update firewall rules on etcd nodes when needed
Change 906022 merged by jenkins-bot:
[cloud/wmcs-cookbooks@main] wmcs_libs: inventory: refresh tools k8s control nodes
Mentioned in SAL (#wikimedia-cloud-feed) [2023-04-07T14:34:31Z] <wm-bot2> drained, depooled and removed k8s control node tools-k8s-control-3 (T333929) - cookbook ran by taavi@runko