
decom ruthenium
Closed, Resolved · Public

Description

In the parent task T201366, scandium replaced ruthenium as the parsoid-testing server (Jessie -> Stretch upgrade).

This is the decommissioning task for ruthenium.

Purchase date: Aug. 3, 2011

Procurement ticket: RT #1220


This task tracks the hardware decommissioning of server ruthenium.eqiad.wmnet.

The first five steps should be completed by the service owner who is returning the server to DC-Ops (for reclaim to spare or for decommissioning, depending on server configuration and age).


Steps for service owner:

  • all system services confirmed offline from production use
  • set all Icinga checks to maintenance mode/disabled while the reclaim/decommission takes place
  • remove the system from all active LVS/PyBal configuration
  • remove any service-group Puppet/Hiera/dsh config
  • remove the host's site.pp entry and replace it with role(spare::system)
  • unassign the service owner from this task, check off completed steps, and assign to @RobH for follow-up on the steps below
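As a rough illustration only (not part of any Wikimedia tooling), the handoff rule above can be modelled as a check that every service-owner step is ticked before the task is reassigned to DC-Ops. The step names, `OWNER_STEPS`, `ready_for_dc_ops`, and `missing_steps` are hypothetical paraphrases of the checklist.

```python
from __future__ import annotations

# Hypothetical model of the service-owner checklist; step names paraphrase
# the list above and are not taken from any real Wikimedia tool.
OWNER_STEPS = [
    "services offline",
    "icinga checks disabled",
    "removed from lvs/pybal",
    "puppet/hiera/dsh config removed",
    "site.pp set to role(spare::system)",
]


def ready_for_dc_ops(done: set[str]) -> bool:
    """Return True only if every service-owner step has been checked off."""
    return all(step in done for step in OWNER_STEPS)


def missing_steps(done: set[str]) -> list[str]:
    """List the owner steps still outstanding, in checklist order."""
    return [step for step in OWNER_STEPS if step not in done]
```

For example, `missing_steps({"services offline"})` returns the remaining four steps, and the task would only be reassigned once that list is empty.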

Steps for DC-Ops:

The following steps must not be interrupted; stopping partway leaves the system in an unfinished state.

Start non-interrupt steps:

  • disable Puppet on the host
  • power down the host
  • update Netbox status to Inventory (if decom) or Planned (if spare)
  • disable the switch port
  • note the switch port assignment on this task (for later removal): asw2-b-eqiad:ge-4/0/25
  • remove all remaining Puppet references (including role::spare)
  • remove production DNS entries
  • puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.
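The checklist only says that an interrupted run leaves the host unfinished. One way to make that state visible is to run the steps strictly in order and record how far the run got, so it can be resumed from the right step instead of repeated. This is a minimal sketch assuming each step is a callable; `run_sequence` and the step names are hypothetical, and in practice the work is done by hand and by wmf-decommission-host.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action does the actual work.
Step = Tuple[str, Callable[[], None]]


def run_sequence(steps: List[Step], start: int = 0) -> int:
    """Run steps[start:] in order and return how many steps are complete.

    A return value smaller than len(steps) means the run stopped at that
    index; pass it back as `start` to resume rather than starting over.
    """
    for index in range(start, len(steps)):
        name, action = steps[index]
        try:
            action()
        except Exception as exc:  # a failed step leaves the host half-done
            print(f"stopped at step {index} ({name}): {exc}")
            return index
    return len(steps)


if __name__ == "__main__":
    # Hypothetical no-op actions standing in for the real manual steps.
    names = [
        "disable puppet",
        "power down",
        "update netbox status",
        "disable switch port",
        "remove puppet references",
        "remove production dns",
        "puppet node clean/deactivate",
        "remove debmonitor entry",
    ]
    plan = [(n, lambda n=n: print("done:", n)) for n in names]
    print("completed", run_sequence(plan), "of", len(plan))
```

Returning an index rather than raising keeps the "where did it stop" information in one place; a real runner would also want to persist that index somewhere outside the process.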

  • system disks wiped (by onsite)
  • IF DECOM: system unracked and decommissioned (by onsite), Netbox updated with result
  • IF DECOM: switch port configuration removed from switch once system is unracked
  • IF DECOM: system added to decommission tracking Google sheet
  • IF DECOM: mgmt DNS entries removed
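The decom-versus-spare choice shows up in two places: the Netbox status set during the non-interrupt steps, and the IF DECOM steps that follow. The sketch below encodes that mapping as the checklist states it. `Outcome`, `NETBOX_STATUS`, and `follow_up` are hypothetical names, and the code does not talk to Netbox.

```python
from __future__ import annotations

from enum import Enum


class Outcome(Enum):
    DECOM = "decom"
    SPARE = "spare"


# Netbox status set during the non-interrupt steps, per the checklist.
NETBOX_STATUS = {Outcome.DECOM: "Inventory", Outcome.SPARE: "Planned"}

# Post-sequence steps that apply to every host returned to DC-Ops.
COMMON_STEPS = ["system disks wiped (by onsite)"]

# Extra steps that apply only when the host is being decommissioned.
DECOM_ONLY_STEPS = [
    "system unracked and decommissioned, Netbox updated",
    "switch port configuration removed",
    "system added to decommission tracking sheet",
    "mgmt DNS entries removed",
]


def follow_up(outcome: Outcome) -> tuple[str, list[str]]:
    """Return the Netbox status and the post-sequence steps for an outcome."""
    steps = list(COMMON_STEPS)
    if outcome is Outcome.DECOM:
        steps += DECOM_ONLY_STEPS
    return NETBOX_STATUS[outcome], steps
```

A spare host therefore ends at Planned with only the disk wipe left, while a decommissioned host ends at Inventory with all five follow-up steps.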

Event Timeline

Dzahn triaged this task as Medium priority. Feb 13 2019, 7:24 PM
Dzahn created this task.
Restricted Application added a subscriber: Aklapper.

Change 490391 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] start to decom ruthenium, turn into spare

https://gerrit.wikimedia.org/r/490391

Doing the 'Active -> Staged' transition: https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Active_-%3E_Staged

Not sure whether I should still copy/paste the checkboxes from the template, as in the past.

Change 490391 merged by Dzahn:
[operations/puppet@production] start to decom ruthenium, turn into spare

https://gerrit.wikimedia.org/r/490391

Change 490407 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] admins/enforce-users-groups: remove exception for parsoid-rt user

https://gerrit.wikimedia.org/r/490407

Change 490411 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] DHCP: turn ruthenium into stretch, rm jessie installer part

https://gerrit.wikimedia.org/r/490411

Change 490411 merged by Dzahn:
[operations/puppet@production] DHCP: turn ruthenium into stretch, rm jessie installer part

https://gerrit.wikimedia.org/r/490411

Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts:

['ruthenium.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201902132225_dzahn_77810.log.

Dzahn updated the task description.
Dzahn added a subscriber: RobH.
Dzahn updated the task description.

Completed auto-reimage of hosts:

['ruthenium.eqiad.wmnet']

and were ALL successful.

Change 490497 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] netboot/partman/DCHP: remove ruthenium

https://gerrit.wikimedia.org/r/490497

Change 490497 merged by Dzahn:
[operations/puppet@production] netboot/partman/DCHP: remove ruthenium

https://gerrit.wikimedia.org/r/490497

Dzahn updated the task description.

Change 490407 merged by Dzahn:
[operations/puppet@production] admins/enforce-users-groups: remove exception for parsoid-rt user

https://gerrit.wikimedia.org/r/490407

Change 494626 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom ruthenium

https://gerrit.wikimedia.org/r/494626

Change 494628 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom ruthenium prod dns entries

https://gerrit.wikimedia.org/r/494628

Change 494626 merged by RobH:
[operations/puppet@production] decom ruthenium

https://gerrit.wikimedia.org/r/494626

Change 494628 merged by RobH:
[operations/dns@master] decom ruthenium prod dns entries

https://gerrit.wikimedia.org/r/494628

RobH updated the task description.
RobH edited projects, added ops-eqiad; removed Patch-For-Review.
RobH moved this task from Backlog to Decommission on the ops-eqiad board.

Mentioned in SAL (#wikimedia-operations) [2019-03-06T09:05:13Z] <moritzm> removed debmonitor host entry for ruthenium (T216062)
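Removing the debmonitor host entry corresponds to the curl command in the DC-Ops checklist. A Python standard-library equivalent is sketched below, assuming the same endpoint, client certificate, and key paths as that command. It builds the request separately from sending it so the URL can be checked first. `build_delete` and `delete_host` are hypothetical helpers; whether debmonitor needs anything beyond this (for example, a specific CA bundle) is not stated in the task.

```python
import ssl
import urllib.request

# Endpoint and credential paths copied from the curl command in the checklist.
DEBMONITOR = "https://debmonitor.discovery.wmnet"
CERT = "/etc/debmonitor/ssl/cert.pem"
KEY = "/etc/debmonitor/ssl/server.key"


def build_delete(host_fqdn: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one host entry."""
    return urllib.request.Request(f"{DEBMONITOR}/hosts/{host_fqdn}", method="DELETE")


def delete_host(host_fqdn: str) -> int:
    """Send the DELETE with the host's client certificate; return the HTTP status.

    Reading the key usually needs root, which is why the curl version uses sudo.
    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    context = ssl.create_default_context()  # verifies the server against system CAs
    context.load_cert_chain(certfile=CERT, keyfile=KEY)
    with urllib.request.urlopen(build_delete(host_fqdn), context=context) as resp:
        return resp.status
```

Calling `build_delete("ruthenium.eqiad.wmnet")` first lets you inspect `full_url` and `get_method()` before anything is deleted.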

The host is still in PuppetDB/Cumin; see, e.g., Puppetboard.


Sorry about that. I thought I had run the decom script, but since it didn't echo here I must not have. Fixed.