
decom bast3003 (65R8Q4J, formerly amslvs4)
Open, Low, Public

Description

In T184936, bast3002 was broken and was to be replaced with another server, bast3003, which was formerly amslvs4 (and maybe also ms-be3003?).

To remove all ambiguity: The service tag is 65R8Q4J and the first MAC is a4:ba:db:38:e4:cb

The rename happened in https://gerrit.wikimedia.org/r/c/operations/dns/+/405223

But then bast3002 was repaired so this bast3003 was not actually used.

Confusingly, this host apparently has also been "amslvs4", as the MAC address and serial number match:

https://netbox.wikimedia.org/dcim/devices/1251/
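
The identity check described above (same service tag, same first MAC) can be sketched roughly as follows. This is only an illustration in Python, not a Netbox feature: the names `normalize_mac` and `same_hardware` are hypothetical, and the upper-case, dash-separated formatting of the second record is invented to show why normalizing before comparing helps.

```python
import re


def normalize_mac(mac: str) -> str:
    """Return a MAC address as lowercase hex pairs joined by colons."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def same_hardware(a: dict, b: dict) -> bool:
    """Treat two records as one machine if service tag and first MAC agree."""
    return (
        a["service_tag"].strip().upper() == b["service_tag"].strip().upper()
        and normalize_mac(a["mac"]) == normalize_mac(b["mac"])
    )


# Values taken from this task; the formatting of the second record is made up.
bast3003 = {"service_tag": "65R8Q4J", "mac": "a4:ba:db:38:e4:cb"}
amslvs4 = {"service_tag": "65r8q4j", "mac": "A4-BA-DB-38-E4-CB"}
print(same_hardware(bast3003, amslvs4))
```

Comparing on both fields, rather than on hostname, is what removes the ambiguity between bast3003, amslvs4 and ms-be3003.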

This is the decom ticket to either return that hardware to the spares pool or fully decommission it.


This task will track the decommission of bast3003.wikimedia.org, formerly amslvs4

https://netbox.wikimedia.org/dcim/devices/1251/

The first 5 steps should be completed by the service owner that is returning the server to DC-ops (for reclaim to spare or decommissioning, dependent on server configuration and age.)

ruthenium.eqiad.wmnet.

Steps for service owner:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove the host's site.pp entry, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:

The following steps must not be interrupted, as doing so would leave the system in an unfinished state.

Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host) - RAN MANUALLY
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host) - CONFIRMED HOST NOT IN DEBMON host list

End non-interrupt steps.
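
As a rough illustration of the debmonitor step above, the sketch below assembles the same `curl` invocation for a given host without running it. The URL, certificate path and key path are copied from the checklist; the function name `debmonitor_delete_command` and the input check are assumptions made for this example. Per the checklist, wmf-decommission-host normally handles this step.

```python
import shlex
from typing import List

# Endpoint and credential paths as written in the checklist step.
DEBMONITOR_HOSTS_URL = "https://debmonitor.discovery.wmnet/hosts/"
CERT_PATH = "/etc/debmonitor/ssl/cert.pem"
KEY_PATH = "/etc/debmonitor/ssl/server.key"


def debmonitor_delete_command(host_fqdn: str) -> List[str]:
    """Build (but do not execute) the curl DELETE command for one host."""
    # Hypothetical sanity check: reject values that would alter the URL path.
    if not host_fqdn or any(c.isspace() or c == "/" for c in host_fqdn):
        raise ValueError(f"suspicious host name: {host_fqdn!r}")
    return [
        "sudo", "curl", "-X", "DELETE",
        DEBMONITOR_HOSTS_URL + host_fqdn,
        "--cert", CERT_PATH,
        "--key", KEY_PATH,
    ]


cmd = debmonitor_delete_command("bast3003.wikimedia.org")
# Print a shell-quoted form for review before anyone runs it by hand.
print(shlex.join(cmd))
```

Building the argument list first and printing it keeps the destructive call a deliberate, separate action.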

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
  • - IF RECLAIM: system added back to spares tracking (by onsite)
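
The branching in the checklist (decom versus reclaim to spare) could be expressed as a small lookup, sketched below. The step texts and Netbox statuses come from this task; the outcome keys `"decom"` and `"reclaim"` and the function names are hypothetical, and "reclaim" is assumed to correspond to the "spare" case in the Netbox step.

```python
from typing import List, Optional, Tuple

# Netbox status per outcome, as stated in the non-interrupt steps.
NETBOX_STATUS = {"decom": "Inventory", "reclaim": "Planned"}

# (outcome the step applies to, step text); None means it always applies.
FINAL_STEPS: List[Tuple[Optional[str], str]] = [
    (None, "system disks wiped (by onsite)"),
    ("decom", "system unracked and decommissioned (by onsite), update racktables with result"),
    ("decom", "switch port configuration removed from switch once system is unracked"),
    ("decom", "add system to decommission tracking google sheet"),
    ("decom", "mgmt dns entries removed"),
    ("reclaim", "system added back to spares tracking (by onsite)"),
]


def netbox_status(outcome: str) -> str:
    """Return the Netbox status to set for the chosen outcome."""
    return NETBOX_STATUS[outcome]


def final_steps(outcome: str) -> List[str]:
    """Return the post-wipe steps that apply to the chosen outcome."""
    if outcome not in NETBOX_STATUS:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return [text for cond, text in FINAL_STEPS if cond in (None, outcome)]


for step in final_steps("reclaim"):
    print("-", step)
```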

Event Timeline

Dzahn triaged this task as Low priority. Feb 14 2019, 10:59 PM
Dzahn created this task.
Restricted Application removed a project: Patch-For-Review. Feb 14 2019, 10:59 PM
Dzahn renamed this task from decom bast3003 (formerly ms-be3003 to decom bast3003 (formerly ms-be3003). Feb 14 2019, 11:11 PM
Dzahn renamed this task from decom bast3003 (formerly ms-be3003) to decom bast3003 (formerly ms-be3003, formerly amslvs4).
Dzahn added projects: decommission, DC-Ops.
Dzahn updated the task description.
Dzahn edited subscribers, added: RobH; removed: mark, ops-monitoring-bot, Stashbot and 3 others.

Change 490788 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] Revert "DHCP: add 24:B6:FD:F6:17:3A as bast3003"

https://gerrit.wikimedia.org/r/490788

Change 490788 merged by Dzahn:
[operations/puppet@production] Revert "DHCP: add 24:B6:FD:F6:17:3A as bast3003"

https://gerrit.wikimedia.org/r/490788

Dzahn renamed this task from decom bast3003 (formerly ms-be3003, formerly amslvs4) to decom bast3003 (65R8Q4J, formerly amslvs4). Feb 14 2019, 11:28 PM
Dzahn updated the task description.

I removed production DNS entries but kept mgmt because the host does not have mgmt entries by asset tag. The DRAC IP is: 10.21.0.109 and it's still reachable under bast3003.mgmt.esams.wmnet.

host has already been shut down for a while

Dzahn assigned this task to RobH. Feb 15 2019, 12:35 AM
Dzahn removed a project: Patch-For-Review.
Dzahn updated the task description.

I have not started the "non-interrupt steps" yet, but I marked the first 2 check boxes there because the host is already removed from puppet (it was either never in puppet or already removed as bast3003, and was removed as amslvs) and it had already been powered off quite some time ago.

Mentioned in SAL (#wikimedia-operations) [2019-02-15T00:39:25Z] <mutante> puppetmaster1001: sudo puppet node clean bast3003.wikimedia.org ; sudo puppet node deactivate bast3003.wikimedia.org (T216199)

Dzahn updated the task description. Feb 15 2019, 12:40 AM
RobH removed RobH as the assignee of this task. Mar 7 2019, 9:34 PM
RobH moved this task from Backlog to Decommission on the ops-esams board.