magru: (2) VMs for ncredir
Closed, Resolved, Public

Description

Cloud VPS Project Tested: N/A, established tech
Site/Location: magru
Number of systems: 2
Service: ncredir
Networking Requirements: external IP, for public traffic rerouting
Processor Requirements: 2
Memory: 4G
Disks: 20G
Other Requirements:

Event Timeline

BCornwall renamed this task from Site: (2) VMs for ncredir to magru: (2) VMs for ncredir. Wed, May 1, 12:43 AM
BCornwall changed the task status from Open to In Progress.
BCornwall claimed this task.
BCornwall edited projects, added Traffic; removed SRE.
BCornwall updated the task description.
BCornwall changed the task status from In Progress to Stalled. Wed, May 1, 12:46 AM

Waiting for magru to get Ganeti set up.

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host ncredir7001.magru.wmnet with OS bookworm

BCornwall changed the task status from Stalled to In Progress. Thu, May 2, 6:21 PM
BCornwall moved this task from Backlog to Traffic team actively servicing on the Traffic board.

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host ncredir7001.magru.wmnet with OS bookworm executed with errors:

  • ncredir7001 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" ncredir7001.magru.wmnet to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host ncredir7002.magru.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host ncredir7002.magru.wmnet with OS bookworm completed:

  • ncredir7002 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405031646_brett_3134898_ncredir7002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
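
Comparing the two step lists above shows where the ncredir7001 run stopped relative to the successful ncredir7002 run. The following is a minimal sketch in Python, assuming the step lists are copied verbatim from the cookbook output in this task; the helper name `first_missing_step` is hypothetical and is not part of Spicerack or the reimage cookbook.

```python
# Steps reported by the successful ncredir7002 run, in order.
PASS_STEPS = [
    "Removed from Puppet and PuppetDB if present and deleted any certificates",
    "Removed from Debmonitor if present",
    "Forced PXE for next reboot",
    "Host rebooted via gnt-instance",
    "Host up (Debian installer)",
    "Add puppet_version metadata to Debian installer",
    "Set boot media to disk",
    "Host up (new fresh bookworm OS)",
    "Generated Puppet certificate",
    "Signed new Puppet certificate",
    "Run Puppet in NOOP mode to populate exported resources in PuppetDB",
    "Found Nagios_host resource for this host in PuppetDB",
    "Downtimed the new host on Icinga/Alertmanager",
]

# Steps reported by the failed ncredir7001 run before the failure message.
FAIL_STEPS = [
    "Removed from Puppet and PuppetDB if present and deleted any certificates",
    "Removed from Debmonitor if present",
    "Forced PXE for next reboot",
    "Host rebooted via gnt-instance",
    "Host up (Debian installer)",
    "Add puppet_version metadata to Debian installer",
    "Set boot media to disk",
    "Host up (new fresh bookworm OS)",
    "Generated Puppet certificate",
    "Signed new Puppet certificate",
    "Run Puppet in NOOP mode to populate exported resources in PuppetDB",
]


def first_missing_step(completed, reference):
    """Return the first step in `reference` that `completed` did not reach.

    Hypothetical helper: walks both lists in order and returns None when
    `completed` covers every step in `reference`.
    """
    for index, step in enumerate(reference):
        if index >= len(completed) or completed[index] != step:
            return step
    return None


print(first_missing_step(FAIL_STEPS, PASS_STEPS))
```

This only identifies the first step that the failed run did not report; the actual cause of the failure still has to be read from the cookbook logs.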