
Server Lifecycle: re-arrange statuses including a decommissioning one
Closed, Resolved (Public)

Description

With the upgrade to the latest Netbox release as part of T222351, there will be a new decommissioning status that can be used.

  • Reach an agreement with DC-Ops on the introduction of the new decommissioning status.
  • Bulk update the status of existing devices in Netbox to the new statuses.
  • Refactor the https://wikitech.wikimedia.org/wiki/Server_Lifecycle page to include the use of this status.

Event Timeline

Volans renamed this task from Server Lifecycle: re-arrage statuses including a decommissioning one to Server Lifecycle: re-arrage statuses including a decommissioning one. (May 2 2019, 10:22 AM)
Volans updated the task description. (Show Details)

Here is a draft proposal for the introduction of the new decommissioning state in Netbox and our Lifecycle.

Current definition:

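| Server Lifecycle | Netbox                      | Racked    | Power     |
| ---------------- | --------------------------- | --------- | --------- |
| requested        | none, not yet in Netbox     | no        | n/a       |
| spare            | PLANNED                     | yes or no | off       |
| staged           | STAGED                      | yes       | on        |
| active           | ACTIVE                      | yes       | on        |
| failed           | FAILED                      | yes       | on or off |
| decommissioned   | INVENTORY                   | yes       | on or off |
| unracked         | OFFLINE                     | no        | n/a       |
| recycled         | none, not anymore in Netbox | no        | n/a       |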

Proposed changes (diffs are highlighted):

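| Server Lifecycle | Netbox                      | Racked    | Power     | Comments                                                       |
| ---------------- | --------------------------- | --------- | --------- | -------------------------------------------------------------- |
| requested        | none, not yet in Netbox     | no        | n/a       |                                                                |
| spare            | INVENTORY                   | yes       | off       | Permanent state for spare hosts                                |
| planned          | PLANNED                     | yes or no | off       | Temporary state for new hosts during the commissioning process |
| staged           | STAGED                      | yes       | on        |                                                                |
| active           | ACTIVE                      | yes       | on        |                                                                |
| failed           | FAILED                      | yes       | on or off |                                                                |
| decommissioned   | DECOMMISSIONING             | yes       | on or off | Same behaviour, just different state name                      |
| unracked         | OFFLINE                     | no        | n/a       |                                                                |
| recycled         | none, not anymore in Netbox | no        | n/a       |                                                                |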
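
For anyone who wants to check tooling against the proposal, the proposed table can be written down as a small lookup structure. This is only a sketch of the table above, not code from Netbox or from any existing Wikimedia tool: the names `LifecycleState`, `PROPOSED_LIFECYCLE` and `netbox_status_for` are made up for illustration, and `None` is used here to stand for "not in Netbox".

```python
from typing import NamedTuple, Optional


class LifecycleState(NamedTuple):
    """One row of the proposed table (hypothetical helper type)."""
    netbox_status: Optional[str]  # None means the host is not tracked in Netbox
    racked: str
    power: str


# Transcription of the proposed Server Lifecycle -> Netbox mapping.
PROPOSED_LIFECYCLE = {
    "requested": LifecycleState(None, "no", "n/a"),
    "spare": LifecycleState("INVENTORY", "yes", "off"),
    "planned": LifecycleState("PLANNED", "yes or no", "off"),
    "staged": LifecycleState("STAGED", "yes", "on"),
    "active": LifecycleState("ACTIVE", "yes", "on"),
    "failed": LifecycleState("FAILED", "yes", "on or off"),
    "decommissioned": LifecycleState("DECOMMISSIONING", "yes", "on or off"),
    "unracked": LifecycleState("OFFLINE", "no", "n/a"),
    "recycled": LifecycleState(None, "no", "n/a"),
}


def netbox_status_for(lifecycle_state: str) -> Optional[str]:
    """Return the Netbox status for a lifecycle state, or None if not in Netbox."""
    return PROPOSED_LIFECYCLE[lifecycle_state].netbox_status
```

For example, `netbox_status_for("spare")` gives `"INVENTORY"` under the proposal, while `netbox_status_for("requested")` gives `None`.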

Proposed new transition diagram:

netbox.png (731×340 px, 50 KB)

It's still not fully clear to me where an online host with the spare::system Puppet role should live in this picture, but my best bet is that we should go towards not using that Puppet role at all IMHO.

@RobH, @Cmjohnson, @Papaul, @wiki_willy, @faidon thoughts?

Please note @Volans had previously reviewed this with me in IRC. This looks great to me, and I'm looking forward to the status results being a bit more logical-seeming in their naming =]

I think this is an excellent change.

> It's still not fully clear to me where an online host with the spare::system Puppet role should live in this picture, but my best bet is that we should go towards not using that Puppet role at all IMHO.
>
> @RobH, @Cmjohnson, @Papaul, @wiki_willy, @faidon thoughts?

We need to split this role into a few different names that do basically the same thing but also match status:

role::staged (basically role spare but for systems awaiting their proper role and push into service)
role::decommissioning (role spare but service owners push their server into this when they want dc ops to decommission the hardware)

The planned state doesn't need a Puppet role, since it precedes OS installation. That should clarify the role::spare issue, right?

Change 508671 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] splitting role::spare into staged and decomisssioning

https://gerrit.wikimedia.org/r/508671

Mentioned in SAL (#wikimedia-operations) [2019-05-09T11:49:04Z] <volans> updated netbox statues for decommissioning and spare hosts according to T222352

I've updated the Server Lifecycle page, see the diff at:
https://wikitech.wikimedia.org/w/index.php?title=Server_Lifecycle&type=revision&diff=1825673&oldid=1823280
It might need some tweaking and fine-tuning, I've tried to keep the diff at a minimum.

I've updated the Netbox devices, here are the new lists:

Please review the list of Spare vs Planned to make sure I've applied them correctly.
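
The re-mapping behind this bulk update, as implied by the current and proposed tables earlier in the thread, can be sketched roughly as follows. It is a plain-Python illustration only: it does not call the Netbox API, the `is_spare` flag and the device dictionaries are hypothetical stand-ins for the manual Spare vs Planned decision, and the hostnames in the usage note are invented.

```python
def migrated_status(old_status: str, is_spare: bool = False) -> str:
    """Map a pre-change Netbox status to its post-change equivalent.

    Assumption: the caller already knows whether a PLANNED device is a
    permanent spare (is_spare=True) or a new host still being commissioned.
    """
    if old_status == "INVENTORY":
        # Previously used for decommissioned hosts.
        return "DECOMMISSIONING"
    if old_status == "PLANNED" and is_spare:
        # Spares move to INVENTORY; new hosts being commissioned stay PLANNED.
        return "INVENTORY"
    return old_status


def plan_bulk_update(devices):
    """Return (name, old_status, new_status) for every device that changes."""
    changes = []
    for device in devices:
        old = device["status"]
        new = migrated_status(old, device.get("is_spare", False))
        if new != old:
            changes.append((device["name"], old, new))
    return changes
```

Given, say, `[{"name": "host1001", "status": "PLANNED", "is_spare": True}, {"name": "host1002", "status": "PLANNED"}]`, only `host1001` would show up in the planned changes, which is the distinction the review request above is about.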

Earlier today I also sent an email to the Ops list with all the details.

Resolving for now, feel free to re-open if it needs any follow up work.

Change 508671 abandoned by RobH:
splitting role::spare into staged and decomisssioning

Reason:
Agreed!

https://gerrit.wikimedia.org/r/c/operations/puppet/+/508671