
Server Lifecycle: re-arrange statuses including a decommissioning one
Closed, Resolved, Public

Description

With the upgrade to the latest Netbox release as part of T222351, there will be a new decommissioning status that can be used.

  • Reach an agreement with DC-Ops on the introduction of the new decommissioning status
  • Bulk update the status of existing devices in Netbox to the new statuses
  • Refactor the https://wikitech.wikimedia.org/wiki/Server_Lifecycle page to include the use of this status

Event Timeline

Volans created this task. May 2 2019, 10:16 AM
Volans renamed this task from Server Lifecycle: re-arrange statuses including a decommissioning one to Server Lifecycle: re-arrange statuses including a decommissioning one. May 2 2019, 10:22 AM
Volans updated the task description.

Here is a draft proposal for the introduction of the new decommissioning state in Netbox and our Lifecycle.

Current definition:

| Server Lifecycle | Netbox                     | Racked    | Power     |
| requested        | none, not yet in Netbox    | no        | n/a       |
| spare            | PLANNED                    | yes or no | off       |
| staged           | STAGED                     | yes       | on        |
| active           | ACTIVE                     | yes       | on        |
| failed           | FAILED                     | yes       | on or off |
| decommissioned   | INVENTORY                  | yes       | on or off |
| unracked         | OFFLINE                    | no        | n/a       |
| recycled         | none, not anymore in Netbox | no       | n/a       |

Proposed changes (diffs are highlighted):

| Server Lifecycle | Netbox                      | Racked    | Power     | Comments |
| requested        | none, not yet in Netbox     | no        | n/a       | |
| spare            | INVENTORY                   | yes       | off       | Permanent state for spare hosts |
| planned          | PLANNED                     | yes or no | off       | Temporary state for new hosts during the commissioning process |
| staged           | STAGED                      | yes       | on        | |
| active           | ACTIVE                      | yes       | on        | |
| failed           | FAILED                      | yes       | on or off | |
| decommissioned   | DECOMMISSIONING             | yes       | on or off | Same behaviour, just different state name |
| unracked         | OFFLINE                     | no        | n/a       | |
| recycled         | none, not anymore in Netbox | no        | n/a       | |
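
For anyone who wants to check devices against the proposed table programmatically, here is a minimal sketch of it as a Python lookup. The names `LifecycleState`, `PROPOSED`, `lifecycle_for_status` and `is_consistent` are made up for illustration and are not part of Netbox or of any existing tooling; only the table values come from the proposal above.

```python
from typing import NamedTuple, Optional


class LifecycleState(NamedTuple):
    """One row of the proposed table (hypothetical structure)."""
    netbox_status: Optional[str]  # None means the device is not in Netbox
    racked: frozenset             # allowed values for "is the host racked?"
    power: frozenset              # allowed power states; empty means n/a


# Proposed Server Lifecycle -> Netbox mapping, copied from the table.
PROPOSED = {
    "requested": LifecycleState(None, frozenset({False}), frozenset()),
    "spare": LifecycleState("INVENTORY", frozenset({True}), frozenset({"off"})),
    "planned": LifecycleState("PLANNED", frozenset({True, False}), frozenset({"off"})),
    "staged": LifecycleState("STAGED", frozenset({True}), frozenset({"on"})),
    "active": LifecycleState("ACTIVE", frozenset({True}), frozenset({"on"})),
    "failed": LifecycleState("FAILED", frozenset({True}), frozenset({"on", "off"})),
    "decommissioned": LifecycleState("DECOMMISSIONING", frozenset({True}), frozenset({"on", "off"})),
    "unracked": LifecycleState("OFFLINE", frozenset({False}), frozenset()),
    "recycled": LifecycleState(None, frozenset({False}), frozenset()),
}


def lifecycle_for_status(netbox_status: str) -> str:
    """Return the Server Lifecycle state that uses the given Netbox status."""
    for name, state in PROPOSED.items():
        if state.netbox_status == netbox_status:
            return name
    raise KeyError(netbox_status)


def is_consistent(netbox_status: str, racked: bool, power: Optional[str] = None) -> bool:
    """Check a device's racked/power values against the table row for its status."""
    state = PROPOSED[lifecycle_for_status(netbox_status)]
    if racked not in state.racked:
        return False
    if not state.power:  # power is n/a for this state
        return power is None
    return power in state.power
```

With this, `is_consistent("STAGED", True, "off")` is false because staged hosts are expected to be powered on, while `is_consistent("PLANNED", False, "off")` is true.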

Proposed new transition diagram:

It's still not fully clear to me where an online host with the spare::system Puppet role should live in this picture, but my best bet is that we should go towards not using that Puppet role at all IMHO.

@RobH, @Cmjohnson, @Papaul, @wiki_willy, @faidon thoughts?

crusnov added a subscriber: crusnov. May 2 2019, 8:17 PM
RobH added a comment. May 7 2019, 5:11 PM

Please note that @Volans had previously reviewed this with me on IRC. This looks great to me, and I'm looking forward to the statuses being a bit more logically named =]

I think this is an excellent change.

RobH added a comment. May 7 2019, 5:13 PM

> It's still not fully clear to me where an online host with the spare::system Puppet role should live in this picture, but my best bet is that we should go towards not using that Puppet role at all IMHO.
>
> @RobH, @Cmjohnson, @Papaul, @wiki_willy, @faidon thoughts?

We need to split this role into a few differently named roles that do basically the same thing but also match the status:

  • role::staged (basically role::spare, but for systems awaiting their proper role and their push into service)
  • role::decommissioning (basically role::spare, but service owners push their server into this role when they want DC-Ops to decommission the hardware)

The planned state doesn't need a Puppet role, since it comes before OS installation. That should clarify the role::spare issue, right?

Change 508671 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] splitting role::spare into staged and decomisssioning

https://gerrit.wikimedia.org/r/508671

Mentioned in SAL (#wikimedia-operations) [2019-05-09T11:49:04Z] <volans> updated netbox statues for decommissioning and spare hosts according to T222352

I've updated the Server Lifecycle page; see the diff at:
https://wikitech.wikimedia.org/w/index.php?title=Server_Lifecycle&type=revision&diff=1825673&oldid=1823280
It might need some tweaking and fine-tuning, as I've tried to keep the diff to a minimum.

I've updated the Netbox devices; here are the new lists:

Please review the list of Spare vs Planned to make sure I've applied them correctly.
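
For reference, the remapping can be sketched as a pure function that only computes the planned changes. This is not the procedure actually used for the update; the function names, the `commissioning` set and the hostnames in the usage note are all hypothetical, and pushing the result to Netbox is deliberately left out. Each new status is derived from the old one only, so a host moved from PLANNED to INVENTORY is never moved on to DECOMMISSIONING in the same pass.

```python
def new_status(old_status: str, name: str, commissioning: set) -> str:
    """Map a device's old Netbox status to the proposed one.

    `commissioning` is an assumed, manually curated set of hostnames that
    are new hosts still being commissioned (they stay PLANNED).
    """
    if old_status == "INVENTORY":
        # Old "decommissioned" hosts get the new dedicated status.
        return "DECOMMISSIONING"
    if old_status == "PLANNED":
        # Old PLANNED mixed spares and new hosts; spares move to INVENTORY.
        return "PLANNED" if name in commissioning else "INVENTORY"
    # STAGED, ACTIVE, FAILED and OFFLINE are unchanged by the proposal.
    return old_status


def plan_bulk_update(devices: dict, commissioning: set) -> dict:
    """Return {hostname: (old_status, new_status)} for devices that change."""
    changes = {}
    for name, old in devices.items():
        new = new_status(old, name, commissioning)
        if new != old:
            changes[name] = (old, new)
    return changes
```

Calling `plan_bulk_update({"spare1001": "PLANNED", "new1001": "PLANNED", "old1001": "INVENTORY"}, {"new1001"})` (made-up hostnames) reports `spare1001` moving to INVENTORY and `old1001` moving to DECOMMISSIONING, and leaves `new1001` untouched. The Spare vs Planned split is the part that needs human review, which is why it is an explicit input here.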

Volans updated the task description. May 9 2019, 11:52 AM
Volans closed this task as Resolved. May 9 2019, 5:03 PM

Earlier today I also sent an email to the Ops list with all the details.

Resolving for now; feel free to re-open if it needs any follow-up work.

Change 508671 abandoned by RobH:
splitting role::spare into staged and decomisssioning

Reason:
Agreed!

https://gerrit.wikimedia.org/r/c/operations/puppet/+/508671