
Q1:rack/setup/install cloudlb2004-dev
Closed, ResolvedPublic

Description

This task will track the racking, setup, and OS installation of cloudlb2004-dev

Hostname / Racking / Installation Details

Hostnames: cloudlb2004-dev.codfw.wmnet
Racking Proposal: codfw B1
Networking Setup: # of Connections: 1 - Speed: 10G - VLAN: cloudsw1-b1-codfw - AAAA records: Y - Additional IP records (Cassandra)?: No
Partitioning/Raid: HW Raid: N, Partman recipe and/or desired Raid Level: raid 10, mirror the two drives
OS Distro: Bookworm
Sub-team Technical Contact: Arturo Borrero

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

cloudlb2004-dev
  • Receive in system on procurement task T368965 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the "Provision a server's network attributes" Netbox script. Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
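
The ordering in the checklist above matters: the puppet repo update comes before the reimage. As a rough illustration only, the sequence can be sketched as an ordered list with a small helper that refuses out-of-order progress. The step strings are taken from the checklist; `CHECKLIST` and `next_step` are hypothetical names for this sketch and are not part of Spicerack or any cookbook.

```python
from typing import List, Optional

# Ordered steps, copied from the per-host checklist above.
# This list and the helper below are illustrative assumptions,
# not part of any real tooling.
CHECKLIST: List[str] = [
    "Provision a server's network attributes (Netbox script)",
    "sre.dns.netbox",
    "sre.hosts.provision",
    "sre.hardware.upgrade-firmware",
    "update operations/puppet (preseed.yaml, site.pp)",
    "sre.hosts.reimage",
]


def next_step(done: List[str]) -> Optional[str]:
    """Return the next step to run, or None when everything is done.

    Raises ValueError if `done` is not a prefix of CHECKLIST, i.e. a
    step was run before the steps that precede it.
    """
    if done != CHECKLIST[: len(done)]:
        raise ValueError("steps were completed out of order")
    if len(done) == len(CHECKLIST):
        return None
    return CHECKLIST[len(done)]
```

With the first four steps done, `next_step` returns the puppet repo update, so a reimage attempted at that point would be flagged as out of order.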

Event Timeline

RobH mentioned this in Unknown Object (Task).
RobH added a parent task: Unknown Object (Task).
RobH unsubscribed.

@aborrero when you have a moment, can you do this step for me please? thanks!
Update the operations/puppet repo

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm executed with errors:

  • cloudlb2004-dev (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" cloudlb2004-dev.codfw.wmnet to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm executed with errors:

  • cloudlb2004-dev (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" cloudlb2004-dev.codfw.wmnet to get a root shell, but depending on the failure this may not work.

@aborrero I accidentally ran a few imaging attempts while just going through lists. Could you update the site.pp file for us? Thanks!

I'm sorry I completely missed the pings here. I will get this done today.

Change #1077049 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudlb2004-dev: give it a puppet role

https://gerrit.wikimedia.org/r/1077049

Change #1077049 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudlb2004-dev: give it a puppet role

https://gerrit.wikimedia.org/r/1077049

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm executed with errors:

  • cloudlb2004-dev (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudlb2004-dev.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm executed with errors:

  • cloudlb2004-dev (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudlb2004-dev.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Change #1077337 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudlb2004-dev: use insetup role and add partman recipe

https://gerrit.wikimedia.org/r/1077337

Change #1077337 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudlb2004-dev: use insetup role and add partman recipe

https://gerrit.wikimedia.org/r/1077337

@Jhancock.wm please try the reimage again now that the patch I merged yesterday is in place.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm executed with errors:

  • cloudlb2004-dev (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudlb2004-dev.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm executed with errors:

  • cloudlb2004-dev (FAIL)
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudlb2004-dev.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm executed with errors:

  • cloudlb2004-dev (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudlb2004-dev.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm completed:

  • cloudlb2004-dev (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202410111637_jhancock_1551871_cloudlb2004-dev.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

@aborrero this is finally ready. It turned into a learning opportunity.