Decommission old and unused/spare servers in eqiad
Open, Normal, Public

Description

The following servers are old (more than ~4.5 years, in some cases more than 5 years!) and are either spares or simply unused, so they should be decommissioned and unracked. Please double-check that they are offline or unused, and follow the steps of the checklist (wipes etc.) before decommissioning.

hostname     purchase task  purchase date
copper       RT #527        2011-01-27
wmf3248      RT #527        2011-01-27
wmf3443      RT #593        2011-10-13
zirconium    RT #1220       2011-08-03
caesium      RT #3278       2012-08-29
wmf4077      RT #3278       2012-08-29
lawrencium   RT #3918       2012-12-05
wmf3560      RT #4105       2013-01-11
wmf3565      RT #4105       2013-01-11
promethium   RT #4281       2013-01-22
wmf3570      RT #4281       2013-01-22
wmf4182      RT #5175       2013-06-04
wmf4183      RT #5175       2013-06-04
wmf4195      RT #5175       2013-06-04
wmf4196      RT #5175       2013-06-04

(note that lawrencium is online, but with role spare::system)

There are also a few more that are ~4 years old, and we could keep them as spares for a while longer, so let's NOT decom these just yet:

hostname     purchase task  purchase date
nobelium     RT #6583       2014-03-19
wmf4545      RT #6583       2014-03-19
astatine     RT #7145       2014-05-01
lead         RT #7145       2014-05-01
polonium     RT #7145       2014-05-01
wmf4579      RT #7145       2014-05-01
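
The age split between the two lists can be checked with a short sketch. It assumes ages are measured from the day this task was filed (Feb 15 2018) and that the cutoff is the "~4.5 years" mentioned above; the names `PURCHASED`, `age_years` and `decom_candidates` are illustrative and do not refer to any existing tooling.

```python
from datetime import date
from typing import Dict, List

# Purchase dates copied from the two tables above.
PURCHASED: Dict[str, date] = {
    "copper": date(2011, 1, 27), "wmf3248": date(2011, 1, 27),
    "wmf3443": date(2011, 10, 13), "zirconium": date(2011, 8, 3),
    "caesium": date(2012, 8, 29), "wmf4077": date(2012, 8, 29),
    "lawrencium": date(2012, 12, 5),
    "wmf3560": date(2013, 1, 11), "wmf3565": date(2013, 1, 11),
    "promethium": date(2013, 1, 22), "wmf3570": date(2013, 1, 22),
    "wmf4182": date(2013, 6, 4), "wmf4183": date(2013, 6, 4),
    "wmf4195": date(2013, 6, 4), "wmf4196": date(2013, 6, 4),
    "nobelium": date(2014, 3, 19), "wmf4545": date(2014, 3, 19),
    "astatine": date(2014, 5, 1), "lead": date(2014, 5, 1),
    "polonium": date(2014, 5, 1), "wmf4579": date(2014, 5, 1),
}

AS_OF = date(2018, 2, 15)   # assumption: the day the task was created
THRESHOLD_YEARS = 4.5       # assumption: the approximate cutoff from the description


def age_years(purchased: date, as_of: date = AS_OF) -> float:
    """Age in years, using an average year length of 365.25 days."""
    return (as_of - purchased).days / 365.25


def decom_candidates(hosts: Dict[str, date],
                     threshold: float = THRESHOLD_YEARS) -> List[str]:
    """Hostnames whose age exceeds the threshold, oldest first."""
    old = [h for h, bought in hosts.items() if age_years(bought) > threshold]
    return sorted(old, key=lambda h: hosts[h])


if __name__ == "__main__":
    for host in decom_candidates(PURCHASED):
        print(f"{host:12s} {age_years(PURCHASED[host]):.1f} years")
```

Under these assumptions every host in the first table is over the cutoff and every host in the second table is under it.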

Event Timeline

faidon triaged this task as Normal priority. Feb 15 2018, 5:25 PM
faidon created this task.
Restricted Application added a subscriber: Aklapper. Feb 15 2018, 5:25 PM
RobH claimed this task. Feb 15 2018, 5:32 PM
RobH added a project: hardware-requests.
faidon updated the task description. Feb 15 2018, 5:33 PM
RobH added a comment. Feb 15 2018, 5:34 PM

Please note that every system on this list will need to be decommissioned, with the following checklist applied PER HOST:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove the site.pp entry (replace it with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
  • - IF RECLAIM: system added back to spares tracking (by onsite)
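
The ordering constraint in the checklist above can be sketched as data plus one check. This is only an illustration, assuming a per-host counter of completed steps; the names `steps_for` and `can_pause_after` are hypothetical and nothing here runs any real command.

```python
from typing import List

# Steps that may be spread out over time, before the host is touched.
PRE_STEPS: List[str] = [
    "confirm all system services are offline from production use",
    "set icinga checks to maint mode/disabled",
    "remove system from lvs/pybal active configuration",
    "remove service group puppet/hiera/dsh config",
    "remove site.pp entry (or replace with role(spare::system))",
]

# Steps between the START/END markers: once begun, finish them all.
NON_INTERRUPTIBLE_STEPS: List[str] = [
    "disable puppet on host",
    "power down host",
    "disable switch port",
    "note switch port assignment on the task",
    "remove all remaining puppet references",
    "remove production dns entries",
    "puppet node clean, puppet node deactivate",
]

POST_COMMON: List[str] = ["wipe system disks (onsite)"]
POST_DECOM: List[str] = [
    "unrack and decommission system (onsite), update racktables",
    "remove switch port configuration once unracked",
    "add system to decommission tracking sheet",
    "remove mgmt dns entries",
]
POST_RECLAIM: List[str] = ["add system back to spares tracking (onsite)"]


def steps_for(decom: bool) -> List[str]:
    """Full ordered checklist for one host, for either DECOM or RECLAIM."""
    tail = POST_DECOM if decom else POST_RECLAIM
    return PRE_STEPS + NON_INTERRUPTIBLE_STEPS + POST_COMMON + tail


def can_pause_after(done: int) -> bool:
    """True unless stopping after `done` completed steps would leave the
    non-interruptible block half-finished."""
    start = len(PRE_STEPS)
    end = start + len(NON_INTERRUPTIBLE_STEPS)
    return not (start < done < end)
```

For example, `can_pause_after(5)` is true (nothing in the non-interruptible block has started), while `can_pause_after(6)` is false because puppet is disabled but the host is still powered on.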
Cmjohnson moved this task from Backlog to Not urgent on the ops-eqiad board. Feb 16 2018, 3:47 PM
Cmjohnson moved this task from Not urgent to Blocked on the ops-eqiad board.
RobH added a comment. Mar 15 2018, 5:06 PM

So all of these hosts were on the eqiad spare tracking, but need to be decommissioned:

Asset Tag Hostname
WMF3129 wmf3129
WMF3248 old ms1004 system
WMF3291 vanadium
WMF3428 niobium
WMF3542 lawrencium
WMF4077
WMF4079 caesium
WMF4083 iodine
WMF3559 gadolinium
WMF3560
WMF3561 erbium
WMF3565

wiki_willy moved this task from Blocked to Decommission on the ops-eqiad board. Jul 2 2019, 9:37 PM