
(Need By: TBD) rack/setup/install an-tool1010.eqiad.wmnet
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of an-tool1010.eqiad.wmnet.

This is an existing spare pool host, allocated via T264347

Hostname / Racking / Installation Details

Hostnames: an-tool1010.eqiad.wmnet
Racking Proposal: spare pool, already racked
Networking/Subnet/VLAN/IP: internal1 vlan, 1g
Partitioning/Raid: standard raid
OS Distro: Buster

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.
an-tool1010:

  • - update hostname labels from 'sulfur' to 'an-tool1010'
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

RobH added a parent task: Unknown Object (Task).
RobH mentioned this in Unknown Object (Task).

Change 641764 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] an-tool1010 setup

https://gerrit.wikimedia.org/r/641764

Change 641764 merged by RobH:
[operations/puppet@production] an-tool1010 setup

https://gerrit.wikimedia.org/r/641764

Change 641806 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] an-tool1010 partition info

https://gerrit.wikimedia.org/r/641806

Change 641806 merged by RobH:
[operations/puppet@production] an-tool1010 partition info

https://gerrit.wikimedia.org/r/641806

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

an-tool1010.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011181842_robh_7058_an-tool1010_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['an-tool1010.eqiad.wmnet']

Of which those FAILED:

['an-tool1010.eqiad.wmnet']
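The log paths in these bot messages appear to follow a fixed naming pattern, which helps when correlating several reimage attempts. The sketch below is only an illustration based on the paths shown in this task: the helper name `parse_reimage_log` is hypothetical, and reading the numeric field as a process ID is an assumption, as is the rule that underscores in the host part stand for dots.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime
    user: str
    pid: int  # assumption: the numeric field is a process ID
    host: str


# Assumed layout, inferred from the paths in this task:
# <YYYYMMDDHHMM>_<user>_<number>_<fqdn with dots written as underscores>.log
_LOG_NAME = re.compile(r"^(\d{12})_([^_]+)_(\d+)_(.+)\.log$")


def parse_reimage_log(path: str) -> ReimageLog:
    """Split a wmf-auto-reimage log path into its apparent components."""
    name = path.rsplit("/", 1)[-1]
    match = _LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized log name: {name}")
    stamp, user, number, host = match.groups()
    return ReimageLog(
        started=datetime.strptime(stamp, "%Y%m%d%H%M"),
        user=user,
        pid=int(number),
        # Hostnames cannot contain underscores, so mapping them back to dots is safe.
        host=host.replace("_", "."),
    )
```

A user name containing an underscore would not match this pattern, so treat it as a starting point rather than a complete parser.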

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

an-tool1010.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011181842_robh_7094_an-tool1010_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['an-tool1010.eqiad.wmnet']

Of which those FAILED:

['an-tool1010.eqiad.wmnet']

I'm getting the "no root filesystem defined" error during the installer, so I've done something incorrectly in the setup. However, this entire host has been an exercise in frustration, so I'm going to take a break from it for at least 30 minutes and then return to it with a clearer head.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

an-tool1010.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011182156_robh_11172_an-tool1010_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['an-tool1010.eqiad.wmnet']

Of which those FAILED:

['an-tool1010.eqiad.wmnet']

Change 641840 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] fixing netboot file

https://gerrit.wikimedia.org/r/641840

Change 641840 merged by RobH:
[operations/puppet@production] fixing netboot file

https://gerrit.wikimedia.org/r/641840

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

an-tool1010.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011182208_robh_14479_an-tool1010_eqiad_wmnet.log.

RobH added a subscriber: elukey.

@elukey,

This host is all staged and ready for you to apply proper puppet roles!

Completed auto-reimage of hosts:

['an-tool1010.eqiad.wmnet']

and were ALL successful.

Hmm, uh oh, I think this host needed to be placed in the Analytics VLAN. Ping @elukey @razzi @RobH


Ah snap, I didn't check, my bad! Rob wrote it in the task's description and I should have reviewed it!

@razzi I am going to figure out the procedure to change VLAN. It is definitely a good topic to chat about when you have time!

After a chat with Riccardo and Arzhel, the idea is to:

  1. decom an-tool1010 (testing a new feature of the decom cookbook to auto-cleanup switch configs).
  2. re-provision the node in the Analytics VLAN (requires a reimage but it is fine for our purposes).

cookbooks.sre.hosts.decommission executed by elukey@cumin1001 for hosts: an-tool1010.eqiad.wmnet

  • an-tool1010.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Found physical host
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

an-tool1010.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202012101114_elukey_4413_an-tool1010_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['an-tool1010.eqiad.wmnet']

Of which those FAILED:

['an-tool1010.eqiad.wmnet']

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

an-tool1010.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202012101227_elukey_17444_an-tool1010_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['an-tool1010.eqiad.wmnet']

and were ALL successful.
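The completion messages above come in two shapes: a host list followed by "Of which those FAILED:" and a second list, or a host list followed by "and were ALL successful." A minimal sketch of turning either shape into a summary follows. The function name `summarize_reimage` is hypothetical, and the parsing assumes the messages look exactly like the ones quoted in this task.

```python
import ast
from typing import Dict, List


def summarize_reimage(message: str) -> Dict[str, List[str]]:
    """Split a completion message into succeeded and failed host lists."""
    lines = [line.strip() for line in message.splitlines() if line.strip()]
    # Host lists are printed as Python list literals, e.g. ['host.example'].
    lists = [ast.literal_eval(line) for line in lines if line.startswith("[")]
    attempted = lists[0] if lists else []
    # A second list, when present, names the hosts that failed.
    failed = lists[1] if len(lists) > 1 else []
    if any("ALL successful" in line for line in lines):
        failed = []
    succeeded = [host for host in attempted if host not in failed]
    return {"succeeded": succeeded, "failed": failed}
```

Applied to the messages in this task, the first 2020-12-10 run would report an-tool1010.eqiad.wmnet as failed and the second as succeeded.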

All right, the host is now up and running in the Analytics VLAN. This is the procedure that I followed:

  • ran the decom cookbook for an-tool1010
  • manually removed all the interfaces from netbox for an-tool1010, leaving only the mgmt one
  • ran homer to update the switch's config (ports removed from the private vlan, added to the disabled list)
  • set the status of an-tool1010 to "Planned" on netbox
  • ran the Netbox script (in the UI) to add network configs using the same port/switch details but setting the VLAN as "analytics"
  • ran homer again to update the switch config.
  • ran the sre.netbox.dns cookbook to create the new DNS records (since changing VLAN meant changing IP subnet)
  • ran wmf-auto-reimage to reinstall the OS.
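The order of the steps above matters: each Netbox change is followed by a homer run so the switch config catches up before the next stage. As a rough illustration only, the procedure can be written down as an ordered plan. Everything in this sketch (`Step`, `vlan_move_plan`, `render`) is a hypothetical name, the tool labels are copied from the list above, and no real command-line invocations or flags are implied.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    tool: str    # label taken from the procedure above, not a real command line
    action: str  # human-readable description of what was done


def vlan_move_plan(host: str, vlan: str) -> List[Step]:
    """Return the steps from the comment above, in the order they were run."""
    return [
        Step("decom cookbook", f"decommission {host}"),
        Step("netbox", f"remove all interfaces of {host} except mgmt"),
        Step("homer", "update switch config (ports leave the old vlan, become disabled)"),
        Step("netbox", f"set status of {host} to Planned"),
        Step("netbox script", f"add network config for {host}, same port/switch, vlan {vlan}"),
        Step("homer", "update switch config again"),
        Step("sre.netbox.dns cookbook", "create DNS records for the new IP subnet"),
        Step("wmf-auto-reimage", f"reinstall the OS on {host}"),
    ]


def render(plan: List[Step]) -> List[str]:
    """Format the plan as numbered lines for a task comment or dry-run output."""
    return [f"{i}. [{step.tool}] {step.action}" for i, step in enumerate(plan, 1)]


if __name__ == "__main__":
    for line in render(vlan_move_plan("an-tool1010", "analytics")):
        print(line)
```

Keeping the plan as data rather than as executed commands makes it easy to review the sequence before running anything by hand.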