wikikube-worker2346 DOA
Closed, Resolved · Public

Description

wikikube-worker2346 is not working out of the box

does not power on
when power is applied, it blinks a bunch and then shuts off.
luggage tag with MAC and password was broken off. never saw it during unpacking.
opened up the server to inspect for loose wires and found that some of the paint on one of the coils of the power supply was scraped off.
Feels like we got a refurb that wasn't completed?
Opened ticket with supermicro: #00069075

Event Timeline

Restricted Application added a subscriber: Aklapper.

still not even a troubleshooting email from supermicro. I replied to the latest "we got your message" email to see if I can get their attention.

refreshed the email with supermicro support AGAIN

not sure how or why, but leaving the server to sit for 2-3 weeks made the power cycling issue disappear. server booted and provisioned without issues. Not one to look a gift horse in the mouth on this of all days. will continue installing the server on this ticket.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2346.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2346.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2346 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602171647_jhancock_1887267_wikikube-worker2346.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Jhancock.wm claimed this task.
Jhancock.wm added a subscriber: Clement_Goubert.

@Clement_Goubert finally got this wayward server fixed up and it's ready for you to do what you need to do.