
Q3:rack/setup/install db1206
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of one R650 (Config E10G) host as db1206 (the next db hostname in sequence at the time the task was filed).

Hostname / Racking / Installation Details

Hostnames: db1206
Racking Proposal: Any rack works for us.
Networking Setup: # of Connections: 1, Speed: 1G, VLAN: Private, AAAA records: N
Partitioning/RAID: HW RAID: Y, Partman recipe and/or desired RAID level: RAID10 (partman recipe already added to Puppet by @Marostegui)
OS Distro: Bullseye (default unless otherwise specified)
Sub-team Technical Contact: @Marostegui

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

db1206:
  • receive in system on procurement task <enter task # here> & in Coupa
  • rack the system following the proposed racking plan (see above) & update Netbox (include all system info plus location; set state to planned)
  • add mgmt DNS (asset tag and hostname) and production DNS entries in Netbox, then run the sre.dns.netbox cookbook
  • set up the network port via Netbox, then run Homer from an active cumin host to commit
  • BIOS/DRAC/serial setup and testing; see Lifecycle Steps & Automatic BIOS setup details
  • firmware update (iDRAC, BIOS, network, RAID controller)
  • operations/puppet update: this should include updates to netboot.pp and site.pp (role(insetup), or role(insetup::nofirm) for cp systems)
  • OS installation & initial Puppet run via the sre.hosts.reimage cookbook

Related Objects

Status: Resolved; Assigned: Papaul

Event Timeline

RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.
RobH added a subscriber: Jclark-ctr.

@Marostegui,

Can you populate the racking info (partitioning, network details, any rack restrictions) and then assign this over to @Jclark-ctr? Thanks!

RobH mentioned this in Unknown Object (Task) on Nov 2 2022, 7:11 PM.

Change 852653 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Partmap recipe for db1206

https://gerrit.wikimedia.org/r/852653

Change 852653 merged by Marostegui:

[operations/puppet@production] install_server: partman recipe for db1206

https://gerrit.wikimedia.org/r/852653

Change 852654 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Add spare db1206

https://gerrit.wikimedia.org/r/852654

Change 852654 merged by Marostegui:

[operations/puppet@production] mariadb: Add spare db1206

https://gerrit.wikimedia.org/r/852654


Done! I have also written the Puppet patches for the partman recipe and for adding the host as a spare in site.pp.

Marostegui updated the task description.

Change 856927 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] database-backups: Update partman recipe for dbprov1004 / dbprov2004

https://gerrit.wikimedia.org/r/856927

Change 856927 merged by Jcrespo:

[operations/puppet@production] database-backups: Update partman recipe for dbprov1004 / dbprov2004

https://gerrit.wikimedia.org/r/856927

db1206: rack B8, U36, port 26, cable ID 3285

I will take a look once I have the OS going on db120[4-5].

RobH mentioned this in Unknown Object (Task) on Nov 29 2022, 6:09 PM.

@Jclark-ctr Netbox shows the server racked in B8, but the task says it is in rack B1 (db1206 B1 U36 Port 26). Can you please double-check?

Thanks.

@Papaul Sorry about that; I had originally racked it in B1, but we are out of power ports. Netbox is correct: B8.

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1206.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1206.eqiad.wmnet with OS bullseye completed:

  • db1206 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202211302354_pt1979_2123349_db1206.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active

Papaul updated the task description.

@Marostegui This is complete.

Thanks Papaul!

@Marostegui: When provisioning this for production, I'd really appreciate it if I could shadow you to learn how we add a DB to rotation. Please 🥺