wmcs codfw hardware changes proposal
Open, Medium, Public

Description

This ticket contains a proposal to reshuffle some of the hardware servers the WMCS team has in the CODFW datacenter.

Hardware references

Facts

Proposal

  • cloudcontrol2006-dev: increase memory in-place, or replace with another server with higher memory
  • cloudcontrol2007-dev: use as cloudservices2006-dev, as node nº3 in the HA setup
  • cloudcontrol2008-dev: dedicate to cloudrabbit in codfw1dev, see T377934
  • cloudcontrol2009-dev: dedicate to cloudrabbit in codfw1dev, see T377934
  • cloudnet2007-dev: use for codfw1dev as node nº3 in the HA setup
  • cloudnet2008-dev: use as cloudgw2004-dev, as node nº3 in the HA setup
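
As a quick cross-check, the proposal can be written down as a small lookup table. This is only an illustrative sketch: the dictionary layout and the `third_ha_nodes` and `hosts_for` helpers are assumptions made here and are not part of any WMCS tooling. The hostnames and roles are copied from the list above.

```python
# Proposed reuse of each host, copied from the list above.
# The data structure itself is hypothetical, not WMCS tooling.
PROPOSAL = {
    "cloudcontrol2006-dev": "increase memory",
    "cloudcontrol2007-dev": "cloudservices",
    "cloudcontrol2008-dev": "cloudrabbit",
    "cloudcontrol2009-dev": "cloudrabbit",
    "cloudnet2007-dev": "cloudnet",
    "cloudnet2008-dev": "cloudgw",
}

# Roles that gain a third node in their HA setup under this proposal.
HA_ROLES = {"cloudservices", "cloudnet", "cloudgw"}


def third_ha_nodes(proposal):
    """Return {role: host} for every host proposed as HA node no. 3."""
    return {role: host for host, role in proposal.items() if role in HA_ROLES}


def hosts_for(proposal, role):
    """Return the sorted list of hosts proposed for the given role."""
    return sorted(host for host, r in proposal.items() if r == role)
```

Calling `third_ha_nodes(PROPOSAL)` lists one host each for cloudservices, cloudnet and cloudgw, and `hosts_for(PROPOSAL, "cloudrabbit")` returns the two hosts dedicated to cloudrabbit.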

Event Timeline

aborrero triaged this task as Medium priority. Oct 18 2024, 12:04 PM
aborrero updated the task description. (Show Details)
aborrero moved this task from Backlog to Radar/observer on the User-aborrero board.
aborrero added subscribers: joanna_borun, RobH.

@joanna_borun and @RobH, please review the proposal in the ticket description.

I just noticed T377570: WMCS hardware services: 3-node HA redundancy model and we can use some of the hardware for that:

  • one additional cloudgw
  • one additional cloudnet
  • one additional cloudservices

I'll update the ticket description with this.

This comment was removed by RobH.

Answered inline, but 5 hosts all sitting unused is a bit alarming to me, as each server has been using budget/power/space since purchase without being leveraged.

cloudcontrol2006-dev: increase memory in-place, or replace with another server with higher memory

This host was purchased on 2023-07-24 and has quite a bit of support term remaining. Adding memory requires expendables budget, so it would need approval from the WMCS manager and then Willy, as this is an unbudgeted request. Alternatively, this could be budgeted into next fiscal year. The host currently has 32GB; would you want this increased to 64GB or 128GB? Once I know, I can get it quoted and we can determine whether this can happen in this fiscal year (unbudgeted) or needs to be pushed. We may also be able to snag memory from the eqiad decom pile and ship it to codfw; I'm checking, but I need to know how much total memory this host needs.

The only replacement candidate in codfw is the 'test host'; there are no other standard configs. The test host is overkill, so it would be better to know your memory requirement before making a determination.

cloudcontrol2007-dev: use as cloudservices2006-dev, as node nº3 in the HA setup

DC Ops really doesn't care how WMCS leverages their budgeted hardware, so I'm not sure what I'd be weighing in on here.

cloudcontrol2008-dev: give back to spares

This was purchased only on 2023-07-24. We no longer keep 'spares', as they tend to just age out of support without being used. Does WMCS have no use for this previously budgeted purchase?

cloudcontrol2009-dev: give back to spares

This was purchased only on 2024-04-04. We no longer keep 'spares', as they tend to just age out of support without being used. Does WMCS have no use for this previously budgeted purchase?

cloudnet2007-dev: use for codfw1dev as node nº3 in the HA setup

DC Ops really doesn't care how WMCS leverages their budgeted hardware, so I'm not sure what I'd be weighing in on here.

cloudnet2008-dev: use as cloudgw2004-dev, as node nº3 in the HA setup

DC Ops really doesn't care how WMCS leverages their budgeted hardware, so I'm not sure what I'd be weighing in on here.

cloudcontrol2006-dev: increase memory in-place, or replace with another server with higher memory

This host was purchased on 2023-07-24 and has quite a bit of support term remaining. Adding memory requires expendables budget, so it would need approval from the WMCS manager and then Willy, as this is an unbudgeted request. Alternatively, this could be budgeted into next fiscal year. The host currently has 32GB; would you want this increased to 64GB or 128GB? Once I know, I can get it quoted and we can determine whether this can happen in this fiscal year (unbudgeted) or needs to be pushed. We may also be able to snag memory from the eqiad decom pile and ship it to codfw; I'm checking, but I need to know how much total memory this host needs.

The only replacement candidate in codfw is the 'test host'; there are no other standard configs. The test host is overkill, so it would be better to know your memory requirement before making a determination.

I just checked with Valerie @ eqiad and she is sending some spare 3200-speed 32GB DIMMs for some other upgrades in codfw, so she will also include 3*32GB for this, which would bring this host from 32GB to 128GB (and if it only has to go to 64GB, we'll have spare memory in codfw). So that should avoid the budget spend for this!
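
To spell out the arithmetic behind those numbers, here is a minimal sketch. The function name is purely illustrative, and the only inputs are the figures quoted above (32GB currently installed, three spare 32GB DIMMs).

```python
DIMM_GB = 32  # size of each spare DIMM being shipped, per the comment above


def total_memory_gb(current_gb, added_dimms, dimm_gb=DIMM_GB):
    """Total memory after adding `added_dimms` DIMMs of `dimm_gb` each."""
    return current_gb + added_dimms * dimm_gb


print(total_memory_gb(32, 3))  # all three spare DIMMs installed: 32 + 3*32 = 128
print(total_memory_gb(32, 1))  # only one installed: 32 + 32 = 64, two DIMMs stay spare
```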

Great! 128GB for this server would be excellent.

I will use T370401 for the DCops folks in codfw to track this.

cloudcontrol2008-dev: give back to spares

This was purchased only on 2023-07-24. We no longer keep 'spares', as they tend to just age out of support without being used. Does WMCS have no use for this previously budgeted purchase?

cloudcontrol2009-dev: give back to spares

This was purchased only on 2024-04-04. We no longer keep 'spares', as they tend to just age out of support without being used. Does WMCS have no use for this previously budgeted purchase?

The project these servers were originally part of was cancelled; see https://phabricator.wikimedia.org/T342750#10241636

If you don't recommend the 'spares' route, then my next suggestion would be to mirror the rabbitmq setup we have in eqiad using these servers.

That is: repurposing these 2 as 'cloudrabbit' hosts. I'll update the task description.

I think repurposing these for rabbitmq is better than 'spares', which tend to age out and never get used. @wiki_willy: Would you agree with this?

Basically, WMCS has 2 hosts they could return to spares due to a canceled project, but we don't really keep spares, and they could instead repurpose them as rabbitmq hosts similar to what they run in eqiad. Since this involves hardware allocations, I wanted to double-check with you. My advice is to repurpose them for rabbitmq, since that is an immediate use of the hosts versus sitting spare and unused.

Yup, agreed. If the servers can be reallocated for something else that is currently needed, I think it makes more sense to just repurpose them vs keeping them as spares or decommissioning them.