decom bast3003 (65R8Q4J, formerly amslvs4)
Closed, Resolved, Public

Description

In T184936 bast3002 was broken and to be replaced with another server, bast3003, which was formerly amslvs4. (and maybe also ms-be3003?)

To remove all ambiguity: The service tag is 65R8Q4J and the first MAC is a4:ba:db:38:e4:cb

The rename happened in https://gerrit.wikimedia.org/r/c/operations/dns/+/405223

But then bast3002 was repaired so this bast3003 was not actually used.

Confusingly, this host has apparently also been "amslvs4", as the MAC address and serial number match:

https://netbox.wikimedia.org/dcim/devices/1251/

This is the decom ticket to either get that hardware back into the spare pool or fully decommission it.


This task will track the decommission-hardware of bast3003.wikimedia.org, formerly amslvs4

https://netbox.wikimedia.org/dcim/devices/1251/

The first 5 steps should be completed by the service owner that is returning the server to DC-ops (for reclaim to spare or decommissioning, dependent on server configuration and age.)

ruthenium.eqiad.wmnet.

Steps for service owner:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:

The following steps must not be interrupted, as stopping partway would leave the system in an unfinished state.

Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host) - RAN MANUALLY
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host) - CONFIRMED HOST NOT IN DEBMON host list

End non-interrupt steps.
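
The puppet and debmonitor cleanup in the steps above can be previewed with a small dry-run helper. This is only a sketch: decom_commands is a hypothetical function name and not part of wmf-decommission-host. It prints the commands quoted in this task (the puppet commands from the SAL entry, the curl call from the checklist) for a given host instead of running them, so they can be reviewed before being run on the appropriate hosts.

```shell
#!/bin/sh
# Dry-run sketch: print the cleanup commands for one host without executing them.
# decom_commands is a hypothetical helper, not an existing WMF tool.
decom_commands() {
    host_fqdn="$1"
    if [ -z "$host_fqdn" ]; then
        echo "usage: decom_commands HOST_FQDN" >&2
        return 1
    fi
    # Puppet cleanup, as run manually on the puppetmaster for this task
    echo "sudo puppet node clean ${host_fqdn}"
    echo "sudo puppet node deactivate ${host_fqdn}"
    # debmonitor removal, exactly as written in the checklist step
    echo "sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${host_fqdn} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key"
}

decom_commands "bast3003.wikimedia.org"
```

Printing rather than executing keeps the sketch safe to run anywhere; the commands themselves still need the right host and credentials.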

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.

Event Timeline

Dzahn created this task.
Dzahn renamed this task from decom bast3003 (formerly ms-be3003 to decom bast3003 (formerly ms-be3003). Feb 14 2019, 11:11 PM
Dzahn renamed this task from decom bast3003 (formerly ms-be3003) to decom bast3003 (formerly ms-be3003, formerly amslvs4).
Dzahn updated the task description.
Dzahn edited subscribers, added: RobH; removed: mark, ops-monitoring-bot, Stashbot and 3 others.

Change 490788 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] Revert "DHCP: add 24:B6:FD:F6:17:3A as bast3003"

https://gerrit.wikimedia.org/r/490788

Change 490788 merged by Dzahn:
[operations/puppet@production] Revert "DHCP: add 24:B6:FD:F6:17:3A as bast3003"

https://gerrit.wikimedia.org/r/490788

Dzahn renamed this task from decom bast3003 (formerly ms-be3003, formerly amslvs4) to decom bast3003 (65R8Q4J, formerly amslvs4). Feb 14 2019, 11:28 PM
Dzahn updated the task description.

I removed production DNS entries but kept mgmt because the host does not have mgmt entries by asset tag. The DRAC IP is: 10.21.0.109 and it's still reachable under bast3003.mgmt.esams.wmnet.

The host has already been shut down for a while.

Dzahn removed a project: Patch-For-Review.
Dzahn updated the task description.

I have not started the "non-interrupt steps" yet, but I marked the first 2 check boxes there because the host is already removed from puppet (it was never in puppet as bast3003; it was removed as amslvs) and it had already been powered off quite some time ago.

Mentioned in SAL (#wikimedia-operations) [2019-02-15T00:39:25Z] <mutante> puppetmaster1001: sudo puppet node clean bast3003.wikimedia.org ; sudo puppet node deactivate bast3003.wikimedia.org (T216199)

RobH removed RobH as the assignee of this task. Mar 7 2019, 9:34 PM
RobH moved this task from Backlog to Decommission on the ops-esams board.

@Papaul @RobH Because of this ticket it's probably better to call the new server bast3004 and finish this decom ticket separately, to avoid more ambiguity on this already confusing ticket history. (Renaming of other hosts as bast3002, bast3003, broken hardware etc).

Brandon also adjusted his DNS change to use bast3004 (https://gerrit.wikimedia.org/r/c/operations/dns/+/545662)

overview:

  • decom bast3001 T159480
  • decom bast3002 - had no ticket yet, so I made one via the template button (T236329), but that is the active server, so its decom needs to wait until the new hardware is up as bast3004
  • decom bast3003 T216199
  • rack/install bast3004 - no ticket yet (The other esams hosts all have rack/install tickets but not this one yet. It looks like a template too.)

I added 'decom bast3003' and 'relabel hooft' to T235805.

Change 547358 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for bast3003

https://gerrit.wikimedia.org/r/547358

Change 547358 merged by Papaul:
[operations/dns@master] DNS: Remove mgmt DNS for bast3003

https://gerrit.wikimedia.org/r/547358

Papaul claimed this task.

complete