
decom silver (was silver has trouble rebooting)
Closed, ResolvedPublic

Description

This was the old wikitech host. It's no longer in use, so it can be decom'd or made a spare. If the latter, it needs a rebuild because it currently has trouble rebooting.

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare::system if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS
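
For reference, the scriptable parts of the non-interruptible block could be sketched roughly as below. This is only a dry run that prints commands in checklist order and executes nothing. The helper names decom_plan and debmonitor_url are hypothetical, and where each command gets run (the host itself versus the puppetmaster) is an assumption; only the curl line comes verbatim from the checklist. The switch port and DNS steps are left out because they are manual.

```shell
#!/bin/sh
# Dry-run sketch: prints the scriptable decom commands, runs none of them.
# decom_plan and debmonitor_url are hypothetical helper names.

debmonitor_url() {
    # URL pattern copied from the checklist's curl command.
    printf 'https://debmonitor.discovery.wmnet/hosts/%s\n' "$1"
}

decom_plan() {
    host_fqdn="$1"
    if [ -z "$host_fqdn" ]; then
        echo "usage: decom_plan HOST_FQDN" >&2
        return 1
    fi
    # Assumed to run on the host being decommissioned:
    echo "sudo puppet agent --disable 'decom ${host_fqdn}'"
    echo "sudo poweroff"
    # Assumed to run on the puppetmaster:
    echo "sudo puppet node clean ${host_fqdn}"
    echo "sudo puppet node deactivate ${host_fqdn}"
    # On neodymium/sarin, per the checklist:
    echo "sudo curl -X DELETE $(debmonitor_url "$host_fqdn") --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key"
}
```

Something like `decom_plan host.example.org` (placeholder FQDN) prints the five commands for review before anyone runs them by hand.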

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.

Related Objects

Status    Subtype           Assigned  Task
Resolved                    RobH
Resolved  PRODUCTION ERROR  Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    bd808
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved                    Andrew
Resolved  PRODUCTION ERROR  Andrew

Event Timeline

https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1274320 is the one listing for this bug.

silver has a very old partitioning setup, but it shouldn't really have this issue. It has an md0 for /, and an md1 for /a. Other ubuntu hosts (like labservices1001) have an md0 for /, and an md1 for /srv, but are otherwise similar in setup. Seems odd that silver has the issue but the others do not.
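
A quick way to compare the md layouts described above across hosts might look like the sketch below. It assumes input in /proc/mounts format (device, mount point, fstype, ...); md_mounts is a hypothetical helper name. On some systems the root device may show up under a different name (e.g. by UUID), so treat the output as a hint rather than proof.

```shell
#!/bin/sh
# Sketch: list md devices and their mount points from /proc/mounts-style
# input on stdin. md_mounts is a hypothetical helper name.

md_mounts() {
    # Field 1 is the device, field 2 the mount point.
    awk '$1 ~ /^\/dev\/md[0-9]+$/ { print $1, $2 }'
}
```

Running `md_mounts < /proc/mounts` on silver and on labservices1001 would give two short lists that can be diffed directly.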

Then again, silver is slated for decom once its already onsite replacement is fully online, so not sure how much time to spend on this.

@Andrew: Since this is slated for decom once the new system is in place, I'm assigning this to you for feedback. Please let me know when this system can be pulled and decommissioned.

Thanks!

RobH renamed this task from silver has trouble rebooting to decom silver (was silver has trouble rebooting). Aug 7 2017, 11:04 PM
RobH removed a project: Cloud-VPS.
RobH added a project: hardware-requests.

Is there already a task for the replacement of silver?

Change 419082 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] remove outdated references to wikitech on silver

https://gerrit.wikimedia.org/r/419082

Change 419082 merged by Andrew Bogott:
[operations/puppet@production] remove outdated references to wikitech on silver

https://gerrit.wikimedia.org/r/419082

Jdforrester-WMF added a subscriber: Jdforrester-WMF.

As this is in the tree, making this the target.

Andrew triaged this task as Medium priority.

@RobH, sorry, this task seems to have been lost in phab for a while. Silver is already role(spare::system) in puppet so entirely out of my hands, can be reclaimed or decom'd whenever.

(And, I don't mind doing the decom steps if that helps, just lmk)

RobH reopened this task as Open.
RobH updated the task description.
RobH removed a subscriber: gerritbot.