
Toolforge: scale down grid engine nodes, scale up k8s workers (mid February 2023)
Closed, Resolved | Public

Description

As the workload shifts from grid engine to Kubernetes, we should scale down the former and scale up the latter.

As of this writing, a quick check at https://sge-status.toolforge.org/ shows that:

  • there are 2x exec nodes running fewer than 8 jobs
  • there are 3x weblight nodes running just 2 jobs

So perhaps we can take those nodes down and reallocate their CPU/RAM to Kubernetes.

@taavi has volunteered to do this work.

Event Timeline

taavi renamed this task from Toolforge: scale down grid engine nodes, scale up k8s workers to Toolforge: scale down grid engine nodes, scale up k8s workers (mid February 2023). Feb 10 2023, 11:22 AM
taavi claimed this task.

Mentioned in SAL (#wikimedia-cloud-feed) [2023-02-10T11:24:12Z] <wm-bot2> removing grid node tools-sgeexec-10-1.tools.eqiad1.wikimedia.cloud (T329357) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-02-10T11:26:50Z] <wm-bot2> removing grid node tools-sgeweblight-10-12.tools.eqiad1.wikimedia.cloud (T329357) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-02-10T11:39:36Z] <wm-bot2> removing grid node tools-sgeweblight-10-19.tools.eqiad1.wikimedia.cloud (T329357) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-02-10T11:42:00Z] <wm-bot2> removing grid node tools-sgeexec-10-5.tools.eqiad1.wikimedia.cloud (T329357) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-02-10T11:44:14Z] <wm-bot2> removing grid node tools-sgeweblight-10-23.tools.eqiad1.wikimedia.cloud (T329357) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-02-10T11:53:08Z] <wm-bot2> Adding a new k8s worker node (T329357) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-02-10T12:15:34Z] <wm-bot2> Adding a new k8s worker node (T329357) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-02-10T12:29:05Z] <wm-bot2> Added a new k8s worker tools-k8s-worker-81.tools.eqiad1.wikimedia.cloud to the worker pool (T329357) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-02-10T12:31:38Z] <wm-bot2> Adding a new k8s worker node (T329357) - cookbook ran by taavi@runko

Mentioned in SAL (#wikimedia-cloud-feed) [2023-02-10T12:44:11Z] <wm-bot2> Added a new k8s worker tools-k8s-worker-82.tools.eqiad1.wikimedia.cloud to the worker pool (T329357) - cookbook ran by taavi@runko

Removed those grid nodes (2x exec, 3x weblight) and added 3x g3.cores8.ram16.disk20.ephem140 k8s worker nodes, which is roughly the equivalent amount of CPU/RAM.
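
For reference, a rough sketch of the capacity added on the k8s side. It assumes the flavor name encodes per-node vCPUs (`cores8`), RAM in GB (`ram16`) and disk sizes in GB; that reading of the naming convention is an assumption, and the helpers `parse_flavor` and `total_capacity` are hypothetical names used only for illustration. The flavors of the removed grid nodes are not recorded in this task, so only the added side is computed.

```python
import re

# Fields we assume are encoded in the flavor name (an assumption, not confirmed here).
KNOWN_FIELDS = ("cores", "ram", "disk", "ephem")


def parse_flavor(name: str) -> dict:
    """Extract numeric fields from a name like g3.cores8.ram16.disk20.ephem140."""
    fields = {}
    for part in name.split("."):
        match = re.fullmatch(r"([a-z]+)(\d+)", part)
        # Skip parts such as the "g3" prefix that are not resource fields.
        if match and match.group(1) in KNOWN_FIELDS:
            fields[match.group(1)] = int(match.group(2))
    return fields


def total_capacity(flavor: str, count: int) -> dict:
    """Multiply the per-node resources by the number of nodes."""
    per_node = parse_flavor(flavor)
    return {key: value * count for key, value in per_node.items()}


added = total_capacity("g3.cores8.ram16.disk20.ephem140", 3)
print(added)  # {'cores': 24, 'ram': 48, 'disk': 60, 'ephem': 420}
```

Under those assumptions, the three new workers add 24 vCPUs and 48 GB of RAM in total.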