
Decommission old and unused/spare servers in eqiad
Closed, Resolved (Public)

Description

The following servers are old (more than ~4.5 years old, some more than 5 years old!) and are either spares or simply unused, so they should be decommissioned and unracked. Please double-check that they are indeed offline/unused, and follow the steps of the checklist (wipes etc.) before decommissioning them.

| hostname | purchase task | purchase date |
| --- | --- | --- |
| copper | RT #527 | 2011-01-27 |
| wmf3248 | RT #527 | 2011-01-27 |
| wmf3443 | RT #593 | 2011-10-13 |
| zirconium | RT #1220 | 2011-08-03 |
| caesium | RT #3278 | 2012-08-29 |
| wmf4077 | RT #3278 | 2012-08-29 |
| lawrencium | RT #3918 | 2012-12-05 |
| wmf3560 | RT #4105 | 2013-01-11 |
| wmf3565 | RT #4105 | 2013-01-11 |
| promethium | RT #4281 | 2013-01-22 |
| wmf3570 | RT #4281 | 2013-01-22 |
| wmf4182 | RT #5175 | 2013-06-04 |
| wmf4183 | RT #5175 | 2013-06-04 |
| wmf4195 | RT #5175 | 2013-06-04 |
| wmf4196 | RT #5175 | 2013-06-04 |

(Note that lawrencium is online, but with role spare::system.)

There are also a few more that are ~4 years old, and we could keep them as spares for a while longer, so let's NOT decom these just yet:

| hostname | purchase task | purchase date |
| --- | --- | --- |
| nobelium | RT #6583 | 2014-03-19 |
| wmf4545 | RT #6583 | 2014-03-19 |
| astatine | RT #7145 | 2014-05-01 |
| lead | RT #7145 | 2014-05-01 |
| polonium | RT #7145 | 2014-05-01 |
| wmf4579 | RT #7145 | 2014-05-01 |
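As a quick sanity check on the age cut-offs, the sketch below recomputes each host's age from the purchase dates in the two tables, measured against the task creation date (Feb 15 2018). The 365.25-day year and the names `DECOM`, `KEEP` and `age_years` are illustrative assumptions, not part of any existing tooling.

```python
from datetime import date

# Reference date: task creation (Feb 15 2018, from the Event Timeline).
TASK_DATE = date(2018, 2, 15)

# (hostname, purchase task, purchase date), copied from the decommission table.
DECOM = [
    ("copper", "RT #527", date(2011, 1, 27)),
    ("wmf3248", "RT #527", date(2011, 1, 27)),
    ("wmf3443", "RT #593", date(2011, 10, 13)),
    ("zirconium", "RT #1220", date(2011, 8, 3)),
    ("caesium", "RT #3278", date(2012, 8, 29)),
    ("wmf4077", "RT #3278", date(2012, 8, 29)),
    ("lawrencium", "RT #3918", date(2012, 12, 5)),
    ("wmf3560", "RT #4105", date(2013, 1, 11)),
    ("wmf3565", "RT #4105", date(2013, 1, 11)),
    ("promethium", "RT #4281", date(2013, 1, 22)),
    ("wmf3570", "RT #4281", date(2013, 1, 22)),
    ("wmf4182", "RT #5175", date(2013, 6, 4)),
    ("wmf4183", "RT #5175", date(2013, 6, 4)),
    ("wmf4195", "RT #5175", date(2013, 6, 4)),
    ("wmf4196", "RT #5175", date(2013, 6, 4)),
]

# Hosts to keep as spares for now, copied from the second table.
KEEP = [
    ("nobelium", "RT #6583", date(2014, 3, 19)),
    ("wmf4545", "RT #6583", date(2014, 3, 19)),
    ("astatine", "RT #7145", date(2014, 5, 1)),
    ("lead", "RT #7145", date(2014, 5, 1)),
    ("polonium", "RT #7145", date(2014, 5, 1)),
    ("wmf4579", "RT #7145", date(2014, 5, 1)),
]


def age_years(purchased: date, ref: date = TASK_DATE) -> float:
    """Approximate age in years, using a 365.25-day year (an assumption)."""
    return (ref - purchased).days / 365.25


if __name__ == "__main__":
    # Print every host with its approximate age at task creation.
    for name, _task, purchased in DECOM + KEEP:
        print(f"{name:<12} {age_years(purchased):.1f} years")
```

With these dates, the youngest host in the decommission list (RT #5175, 2013-06-04) comes out at roughly 4.7 years, while the hosts kept as spares are all just under 4 years, which matches the split described above.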

Event Timeline

faidon triaged this task as Medium priority. Feb 15 2018, 5:25 PM
faidon created this task.

Please note that every system on this list will need to be decommissioned and have the following checklist applied PER HOST:

  • all system services confirmed offline from production use
  • set all icinga checks to maint mode/disabled while the reclaim/decommission takes place
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove site.pp entry (replace with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • disable puppet on host
  • power down host
  • disable switch port
  • switch port assignment noted on this task (for later removal)
  • remove all remaining puppet references (including role::spare)
  • remove production dns entries
  • puppet node clean, puppet node deactivate

END NON-INTERRUPTIBLE STEPS

  • system disks wiped (by onsite)
  • IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • IF DECOM: switch port configuration removed from switch once system is unracked
  • IF DECOM: add system to decommission tracking google sheet
  • IF DECOM: mgmt dns entries removed
  • IF RECLAIM: system added back to spares tracking (by onsite)
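The per-host checklist can be modelled as an ordered list of steps with a protected middle block that must not be paused once started. The sketch below is a hypothetical tracker, not Wikimedia tooling: the step strings paraphrase the checklist, and `Outcome`, `checklist` and `safe_to_pause` are illustrative names. It only encodes ordering and the DECOM/RECLAIM branch; it does not run any puppet, DNS or switch commands.

```python
from enum import Enum


class Outcome(Enum):
    DECOM = "decom"
    RECLAIM = "reclaim"


# Steps paraphrased from the checklist; the phase grouping is an assumption.
PRE_STEPS = [
    "all system services confirmed offline from production use",
    "icinga checks set to maint mode/disabled",
    "removed from all lvs/pybal active configuration",
    "service group puppet/hiera/dsh config removed",
    "site.pp entry removed (or replaced with role(spare::system))",
]
# Once the first of these is done, the rest must follow without pausing.
NON_INTERRUPTIBLE = [
    "disable puppet on host",
    "power down host",
    "disable switch port",
    "note switch port assignment on the task",
    "remove all remaining puppet references",
    "remove production dns entries",
    "puppet node clean, puppet node deactivate",
]
WIPE = ["system disks wiped (by onsite)"]
DECOM_ONLY = [
    "unrack and decommission (by onsite), update racktables",
    "remove switch port configuration",
    "add to decommission tracking sheet",
    "remove mgmt dns entries",
]
RECLAIM_ONLY = ["add back to spares tracking (by onsite)"]


def checklist(hostname: str, outcome: Outcome) -> list[str]:
    """Return the ordered checklist for one host (hypothetical helper)."""
    tail = DECOM_ONLY if outcome is Outcome.DECOM else RECLAIM_ONLY
    steps = PRE_STEPS + NON_INTERRUPTIBLE + WIPE + tail
    return [f"[{hostname}] {step}" for step in steps]


def safe_to_pause(done: int) -> bool:
    """True unless `done` completed steps leaves the host mid-way through
    the non-interruptible block."""
    start = len(PRE_STEPS)
    end = start + len(NON_INTERRUPTIBLE)
    return not (start < done < end)


if __name__ == "__main__":
    for line in checklist("copper", Outcome.DECOM):
        print(line)
```

Tracking progress as a count of completed steps keeps the pause check to a single range comparison: pausing is allowed before the protected block starts and after it finishes, but not in between.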

So all of these hosts were in the eqiad spares tracking, but they need to be decommissioned:

| Asset Tag | Hostname |
| --- | --- |
| WMF3129 | wmf3129 |
| WMF3248 | old ms1004 system |
| WMF3291 | vanadium |
| WMF3428 | niobium |
| WMF3542 | lawrencium |
| WMF4077 | |
| WMF4079 | caesium |
| WMF4083 | iodine |
| WMF3559 | gadolinium |
| WMF3560 | |
| WMF3561 | erbium |
| WMF3565 | |

This is now tracked individually within Netbox; this is a very outdated task, so closing.