decom silver (was silver has trouble rebooting)
Closed, Resolved · Public

Description

This was the old wikitech host. It's no longer in use, so it can be decom'd or made a spare. If the latter, it needs a rebuild because it currently has trouble rebooting.

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare::system if the system isn't shut down immediately during this process).

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
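
The debmonitor removal step in the checklist depends on ${HOST_FQDN} being set correctly, and it sits inside the non-interruptible block, so it may help to print the exact command for review before running it. The following is only a sketch of that idea: the function name debmonitor_delete_cmd and the sample host host.example.org are hypothetical, and the URL and certificate paths are copied unchanged from the checklist line above. It prints the command and does not execute it.

```shell
# Hypothetical helper (not part of any existing tooling): print the
# debmonitor DELETE command from the checklist for a given host FQDN.
# It only prints the command; nothing is sent to debmonitor.
debmonitor_delete_cmd() {
    host_fqdn="${1:-}"

    # Refuse an empty or dot-less name so an unset variable cannot
    # silently produce a request against the bare /hosts/ URL.
    case "$host_fqdn" in
        *.*) ;;
        *)
            echo "error: expected a fully qualified host name" >&2
            return 1
            ;;
    esac

    # URL and certificate paths are taken verbatim from the checklist.
    printf 'sudo curl -X DELETE %s --cert %s --key %s\n' \
        "https://debmonitor.discovery.wmnet/hosts/${host_fqdn}" \
        /etc/debmonitor/ssl/cert.pem \
        /etc/debmonitor/ssl/server.key
}
```

Usage would be along the lines of `debmonitor_delete_cmd host.example.org`, with the printed line then pasted on neodymium/sarin once the host name has been checked.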

Related Objects

Status    Assigned
Resolved  RobH
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  bd808
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew

Event Timeline

Andrew created this task. Jun 21 2017, 6:19 PM
Restricted Application added a subscriber: Aklapper. Jun 21 2017, 6:19 PM
RobH added a subscriber: RobH. Edited Jun 21 2017, 6:21 PM

https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1274320 is one listing of the bug.

silver has a very old partitioning setup, but it shouldn't really have this issue. It has an md0 for /, and an md1 for /a. Other ubuntu hosts (like labservices1001) have an md0 for /, and an md1 for /srv, but are otherwise similar in setup. Seems odd that silver has the issue but the others do not.

Then again, silver is slated for decom once its replacement, which is already onsite, is fully online, so not sure how much time to spend on this.

RobH assigned this task to Andrew. Aug 7 2017, 11:03 PM

@Andrew: Since this is slated for decom once the new system is in place, I'm assigning this to you for feedback. Please let me know when this system can be pulled and decommissioned.

Thanks!

RobH renamed this task from silver has trouble rebooting to decom silver (was silver has trouble rebooting). Aug 7 2017, 11:04 PM
RobH removed a project: Cloud-VPS.
RobH added a project: hardware-requests.

Is there already a task for the replacement of silver?

Change 419082 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] remove outdated references to wikitech on silver

https://gerrit.wikimedia.org/r/419082

Change 419082 merged by Andrew Bogott:
[operations/puppet@production] remove outdated references to wikitech on silver

https://gerrit.wikimedia.org/r/419082

Jdforrester-WMF reopened this task as Open. Jul 22 2018, 9:56 AM
Jdforrester-WMF added a subscriber: Jdforrester-WMF.

As this is in the tree, making this the target.

Andrew reassigned this task from Andrew to RobH. Aug 15 2018, 7:33 PM
Andrew triaged this task as Medium priority.

@RobH, sorry, this task seems to have been lost in phab for a while. Silver is already role(spare::system) in puppet so entirely out of my hands, can be reclaimed or decom'd whenever.

(And, I don't mind doing the decom steps if that helps, just lmk)

RobH closed this task as Resolved. Aug 15 2018, 8:47 PM
RobH reopened this task as Open.
RobH updated the task description.
RobH removed a subscriber: gerritbot.
RobH closed this task as Resolved. Sep 18 2018, 10:16 PM

duplicate of T168559