
Q4:rack/setup/install deploy1003
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of deploy1003

Hostname / Racking / Installation Details

Hostnames: deploy1003
Racking Proposal: Anywhere; this host replaces deploy1002 and has no other eqiad counterpart.
Networking Setup: # of Connections: 1 - Speed: 1G - VLAN: Private - AAAA records: Y - Additional IP records (Cassandra)? No
Partitioning/Raid: HW Raid: N, Partman recipe and/or desired Raid Level: partman/raid1-2dev.cfg (already in preseed.yml with 'deploy*' entry).
OS Distro: Bullseye
Sub-team Technical Contact:

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

deploy1003:
  • Receive in system on procurement task T361355 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the "Provision a server's network attributes" Netbox script. Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
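
The ordering in the checklist above (Netbox script first, then sre.dns.netbox and sre.hosts.provision immediately after, reimage last) can be sketched as a small ordered list. This is only an illustration, assuming the steps are followed strictly in sequence: `CHECKLIST` and `next_step` are hypothetical names, not part of the cookbooks or of Netbox, and the code does not invoke any cookbook.

```python
# Hypothetical sketch: the per-host checklist from this task, in listed order.
# Nothing here calls Netbox or any cookbook; it only tracks progress.
CHECKLIST = [
    "receive system (procurement task T361355, Coupa)",
    "rack system and update Netbox",
    "run Netbox script: Provision a server's network attributes",
    "run cookbook sre.dns.netbox",
    "run cookbook sre.hosts.provision",
    "run cookbook sre.hardware.upgrade-firmware",
    "update operations/puppet (preseed.yaml, site.pp)",
    "run cookbook sre.hosts.reimage",
]


def next_step(done):
    """Return the first checklist step not yet done, or None if all are done.

    Raises ValueError if an unknown step is passed, or if a later step is
    marked done while an earlier one is still pending (assumption: the
    checklist is strictly sequential).
    """
    done = set(done)
    unknown = done - set(CHECKLIST)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")

    first_pending = None
    for step in CHECKLIST:
        if step not in done:
            if first_pending is None:
                first_pending = step
        elif first_pending is not None:
            raise ValueError(f"step done out of order: {step}")
    return first_pending
```

For example, `next_step(CHECKLIST[:3])` returns `"run cookbook sre.dns.netbox"`, matching the note that the DNS cookbook follows the Netbox script.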

Event Timeline

RobH mentioned this in Unknown Object (Task).
RobH added a parent task: Unknown Object (Task).
RobH updated the task description.

FWIW, for whoever adds the production puppet role to this host later: this has only recently become possible, but it should be mostly unblocked now. Details are in T363415. It needs one more patch, though, and your review of it would be great.

cc: @akosiaris: T363415#9762416 lists the issues and their fixes. https://gerrit.wikimedia.org/r/c/operations/puppet/+/1026193 would remove the last puppet error with deployment_server on bullseye, or rather, when talking to newer puppetservers.

RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.
RobH added a subscriber: akosiaris.

@akosiaris,

The parent ordering task for the deploy1002 replacement didn't have racking info, but I didn't want to stall ordering to get it, so I've created this racking task with some assumptions about this deploy host order. Please review the task description and correct it as needed, then simply unassign yourself when done (so it has no one assigned). This is already in the racking column for ops-eqiad.

Change #1050345 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] Add deploy1003 to site.pp

https://gerrit.wikimedia.org/r/1050345

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host deploy1003.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host deploy1003.eqiad.wmnet with OS bullseye executed with errors:

  • deploy1003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" deploy1003.eqiad.wmnet to get a root shell, but depending on the failure this may not work.

Change #1050345 merged by Alexandros Kosiaris:

[operations/puppet@production] Add deploy1003 to site.pp

https://gerrit.wikimedia.org/r/1050345

Let's directly install this server with Puppet 7; there should be no issues in the deployment-server manifests in terms of Puppet 5/7 compatibility at this point.

Change #1050597 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/puppet@production] deploy1003: Switch to puppet7

https://gerrit.wikimedia.org/r/1050597

Change #1050597 merged by Alexandros Kosiaris:

[operations/puppet@production] deploy1003: Switch to puppet7

https://gerrit.wikimedia.org/r/1050597

Change #1050628 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/puppet@production] deploy1003: Assign role

https://gerrit.wikimedia.org/r/1050628

Change #1050628 merged by Alexandros Kosiaris:

[operations/puppet@production] deploy1003: Assign role

https://gerrit.wikimedia.org/r/1050628

Change #1051154 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/puppet@production] deployment_server: if guard php-readline to buster

https://gerrit.wikimedia.org/r/1051154

Change #1051154 merged by Alexandros Kosiaris:

[operations/puppet@production] deployment_server: if guard php-readline to buster

https://gerrit.wikimedia.org/r/1051154

akosiaris updated the task description.

Host is imaged; the rest of the work is ongoing in T364417.