
Q3:rack/setup/install db1207-db1225
Closed, ResolvedPublic

Description

This task will track the racking, setup, and OS installation of db1207-db1225

Hostname / Racking / Installation Details

Hostnames: db1207-db1225
Racking Proposal: Whatever is easiest for DCOps; we have no preference as long as no more than 2 hosts are placed in the same rack.
Networking Setup: # of Connections: 1, Speed: 1G, Vlan: Private, AAAA records: N
Partitioning/Raid: HW Raid: Y, Partman recipe and/or desired Raid Level: RAID10 (partman recipe already done in puppet by @Marostegui)
OS Distro: Bullseye
Sub-team Technical Contact: @Marostegui

Per host setup checklist

db1207:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1208:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1209:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1210:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1211:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1212:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1213:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1215:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1216:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1217:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1218:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1219:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1220:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1221:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1222:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1223:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1224:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
db1225:
  • - receive in system on procurement task T325209 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.

Event Timeline

Marostegui renamed this task from Q3:rack/setup/install db1207-db1225 to Q3:rack/setup/install db1207-db1229. Jan 10 2023, 5:48 PM
Marostegui updated the task description.
RobH renamed this task from Q3:rack/setup/install db1207-db1229 to Q3:rack/setup/install db1207-db1225. Jan 10 2023, 6:14 PM
RobH updated the task description.

Change 878182 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Adjust new eqiad hosts

https://gerrit.wikimedia.org/r/878182

Change 878182 merged by Marostegui:

[operations/puppet@production] mariadb: Adjust new eqiad hosts

https://gerrit.wikimedia.org/r/878182

Any ETA to get these (or some) racked and installed? Thanks!

I am in the process of racking them right now and will have them racked and cabled in the next day or so.

db1207 a5 u22 Port 40 Cableid 2570
db1208 a5 u23 Port 41 Cableid 1880
db1209 a6 u24 Port 36 Cableid 1918
db1210 a6 u25 Port 41 Cableid 1949
db1211 b5 u27 Port 16 Cableid 3283
db1212 b5 u28 Port 17 Cableid 3282
db1213 b6 u37 Port 42 Cableid 23000001
db1214 b6 u38 Port 41 Cableid 23000012
db1215 b3 u38 Port 27 Cableid 1944
db1216 b3 u39 Port 14 Cableid 5236
db1217 c5 u39 Port 29 Cableid 4011
db1218 c5 u40 Port 28 Cableid 4010
db1219 c6 u29 Port 29 Cableid 1946
db1220 c6 u30 Port 26 Cableid 3248
db1221 d1 u34 Port 20 Cableid 3613
db1222 d3 u26 Port 26 Cableid 1961
db1223 d3 u27 Port 39 Cableid 5089
db1224 d6 u37 Port 37 Cableid 23000046
db1225 d6 u38 Port 38 Cableid 23000007
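
The racking proposal asks for no more than 2 of these hosts per rack. A minimal sketch of how the list above could be checked against that limit, assuming each line has the form `<host> <rack> u<unit> Port <n> Cableid <id>`; the names `hosts_per_rack` and `overfull_racks` are illustrative and not part of any existing tool.

```python
import re
from collections import defaultdict

# Racking list copied from the comment above.
RACKING = """\
db1207 a5 u22 Port 40 Cableid 2570
db1208 a5 u23 Port 41 Cableid 1880
db1209 a6 u24 Port 36 Cableid 1918
db1210 a6 u25 Port 41 Cableid 1949
db1211 b5 u27 Port 16 Cableid 3283
db1212 b5 u28 Port 17 Cableid 3282
db1213 b6 u37 Port 42 Cableid 23000001
db1214 b6 u38 Port 41 Cableid 23000012
db1215 b3 u38 Port 27 Cableid 1944
db1216 b3 u39 Port 14 Cableid 5236
db1217 c5 u39 Port 29 Cableid 4011
db1218 c5 u40 Port 28 Cableid 4010
db1219 c6 u29 Port 29 Cableid 1946
db1220 c6 u30 Port 26 Cableid 3248
db1221 d1 u34 Port 20 Cableid 3613
db1222 d3 u26 Port 26 Cableid 1961
db1223 d3 u27 Port 39 Cableid 5089
db1224 d6 u37 Port 37 Cableid 23000046
db1225 d6 u38 Port 38 Cableid 23000007
"""

# Assumed line layout: host, rack, unit, switch port, cable id.
LINE_RE = re.compile(
    r"^(?P<host>db\d+) (?P<rack>[a-z]\d+) u(?P<unit>\d+) "
    r"Port (?P<port>\d+) Cableid (?P<cable>\d+)$"
)

MAX_PER_RACK = 2  # limit stated in the racking proposal


def hosts_per_rack(text):
    """Group hostnames by rack, failing loudly on lines that do not parse."""
    racks = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            raise ValueError(f"unparseable racking line: {line!r}")
        racks[match["rack"]].append(match["host"])
    return dict(racks)


def overfull_racks(racks, limit=MAX_PER_RACK):
    """Return only the racks holding more hosts than the limit."""
    return {rack: hosts for rack, hosts in racks.items() if len(hosts) > limit}


if __name__ == "__main__":
    racks = hosts_per_rack(RACKING)
    for rack, hosts in sorted(racks.items()):
        print(rack, ", ".join(hosts))
    print("racks over the limit:", overfull_racks(racks) or "none")
```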

Jclark-ctr updated the task description.
Jclark-ctr added subscribers: Cmjohnson, Jclark-ctr.

@Cmjohnson can you assist with next steps of these?

While running the provision cookbook on 2 of the db nodes (db1207 and db1208) and on gerrit1003, I am getting the following error:

Raised while handling: The `choices` argument is empty and no custom validator was provided.
Failed to run cookbooks.sre.hosts.provision.ProvisionRunner._config: The `choices` argument is empty and no custom validator was provided.

@Papaul please do not reimage db1206, that host is already in production. We bought it in advance to test the raid controller as it's a new one. So it's serving traffic.

@Marostegui it is db1207 and db1208, not db1206.

Great, as you mentioned db1206 earlier I got scared :)

@Volans

100.0% (1/1) success ratio (>= 100.0% threshold) for command: '/usr/local/sbin/...cludes -r commit'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
pt1979@cumin2002:~$ sudo cookbook sre.hosts.provision db1207 --no-dhcp --no-users
Management Password:
Testing Redfish API connection to cumin2002 (10.193.0.139)
==> Are you sure to proceed to apply BIOS/iDRAC settings for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED?
Type "go" to proceed or "abort" to interrupt the execution
> go
User input is: "go"
START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
Testing Redfish API connection to db1207 (10.65.1.14)
==> Detected Hardware RAID. Please configure the RAID at this point (the password is still DELL default one). Once done select "modified" if the RAID was modified or "untouched" if it was not touched. If the RAID was modified the host will be rebooted to make sure the changes are applied.
> modified
User input is: "modified"
Rebooting the host with policy ChassisResetPolicy.FORCE_RESTART and waiting for 3 minutes
Resetting chassis power status for db1207 to ForceRestart
Testing Redfish API connection to db1207 (10.65.1.14)
[IDRAC.2.7.SYS057] Exporting Server Configuration Profile.
[1/30, retrying in 30.00s] Polling task: JID_800971819368 not completed yet: status=OK, state=Running, completed=10%
First attempt to load the new configuration failed, auto-retrying once
Testing Redfish API connection to db1207 (10.65.1.14)
[IDRAC.2.7.SYS057] Exporting Server Configuration Profile.
[1/30, retrying in 30.00s] Polling task: JID_800972141788 not completed yet: status=OK, state=Running, completed=10%
Raised while handling: The `choices` argument is empty and no custom validator was provided.
Failed to run cookbooks.sre.hosts.provision.ProvisionRunner._config: The `choices` argument is empty and no custom validator was provided.
==> What do you want to do? "retry" the last command, manually fix the issue and "skip" the last command to continue the execution or completely "abort" the execution.

Change 904201 had a related patch set uploaded (by Volans; author: Volans):

[operations/cookbooks@master] sre.hosts.provision: handle the case of no NICs

https://gerrit.wikimedia.org/r/904201

Change 904201 merged by jenkins-bot:

[operations/cookbooks@master] sre.hosts.provision: handle the case of no NICs

https://gerrit.wikimedia.org/r/904201
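
The error in the log says a prompt was handed an empty `choices` list, and the patch title suggests this happens when no NICs are detected. The following is only a sketch of that kind of guard and is not the cookbook's actual code: `EmptyChoicesError`, `ask_choice`, `select_nic`, and the NIC names in the usage note are all hypothetical.

```python
class EmptyChoicesError(Exception):
    """Hypothetical stand-in for the error raised by the prompt helper."""


def ask_choice(message, choices, reader=input):
    """Hypothetical prompt helper that refuses an empty list of choices."""
    if not choices:
        raise EmptyChoicesError(
            "The `choices` argument is empty and no custom validator was provided."
        )
    while True:
        answer = reader(f"{message} {choices}: ").strip()
        if answer in choices:
            return answer


def select_nic(detected_nics, reader=input):
    """Return the NIC to configure, or None when nothing was detected."""
    if not detected_nics:
        # Guard: skip the prompt instead of passing it an empty list.
        return None
    if len(detected_nics) == 1:
        # Nothing to choose from, so no prompt is needed.
        return detected_nics[0]
    return ask_choice("Which NIC should be used?", detected_nics, reader=reader)
```

With this shape, `select_nic([])` returns `None` instead of raising, and the caller decides what to do with a host that reports no NICs. The `reader` parameter exists only so the prompt can be driven without a terminal.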

Change 904235 had a related patch set uploaded (by Volans; author: Volans):

[operations/cookbooks@master] sre.hosts.provision: fix NIC link detection

https://gerrit.wikimedia.org/r/904235

Change 904235 merged by jenkins-bot:

[operations/cookbooks@master] sre.hosts.provision: fix NIC link detection

https://gerrit.wikimedia.org/r/904235

After the switch configuration step I get the output below:

Testing Redfish API connection to db1209 (10.65.1.88)
Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f1ba0fcf7f0>, 'Connection to 10.65.1.88 timed out. (connect timeout=10)')': /redfish
Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f1ba0fcf130>, 'Connection to 10.65.1.88 timed out. (connect timeout=10)')': /redfish
Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f1ba0fcf070>, 'Connection to 10.65.1.88 timed out. (connect timeout=10)')': /redfish
Failed to run cookbooks.sre.hosts.provision.ProvisionRunner.run.<locals>.check_connection: Unable to connect to the Redfish API of db1209. Follow https://wikitech.wikimedia.org/wiki/SRE/Dc-operations/Platform-specific_documentation/Dell_Documentation#Troubleshooting_2
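
The timeouts above are plain TCP connect failures to the management address, so a quick reachability check can help separate a cabling or switch-port problem from a Redfish problem. A minimal sketch using only the Python standard library, assuming the management interface listens on HTTPS port 443; `mgmt_port_reachable` is an illustrative name.

```python
import socket


def mgmt_port_reachable(address, port=443, timeout=10.0):
    """Return True if a TCP connection to address:port succeeds within timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


if __name__ == "__main__":
    # Management address of db1209 as shown in the log above.
    print("db1209 mgmt reachable:", mgmt_port_reachable("10.65.1.88"))
```

A `False` result here only says the TCP handshake failed; it does not say why.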

@jbond all the servers @Jclark-ctr and I worked on are failing with the error below.

START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
Management Password: 
db1207.eqiad.wmnet (Gen 15): starting
Exception raised while executing cookbook sre.hardware.upgrade-firmware:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 212, in run
    raw_ret = runner.run()
  File "/srv/deployment/spicerack/cookbooks/sre/hardware/upgrade-firmware.py", line 907, in run
    self.update_idrac(redfish_host, netbox_host)
  File "/srv/deployment/spicerack/cookbooks/sre/hardware/upgrade-firmware.py", line 677, in update_idrac
    last_reboot = redfish_host.last_reboot()
  File "/usr/lib/python3/dist-packages/spicerack/redfish.py", line 294, in last_reboot
    results = self.request("get", self.log_entries).json()
  File "/usr/lib/python3/dist-packages/spicerack/redfish.py", line 354, in request
    raise RedfishError(
spicerack.redfish.RedfishError: GET https://10.65.1.14/redfish/v1/Managers/Logs/Lclog returned HTTP 404 with message:
{"error":{"@Message.ExtendedInfo":[{"Message":"Unable to complete the operation because the resource Logs entered is not found.","MessageArgs":["Logs"],"MessageArgs@odata.count":1,"MessageId":"IDRAC.2.7.SYS403","RelatedProperties":[],"RelatedProperties@odata.count":0,"Resolution":"Enter the correct resource and retry the operation. For resources with numeric ID in the URI, enable the \"Redfish.1#NumericDynamicSegmentsEnable\" attribute and retry the operation. For information about valid resource, see the Redfish Users Guide available on the support site.","Severity":"Critical"},{"Message":"The resource at the URI 'Logs' was not found.","MessageArgs":["Logs"],"MessageArgs@odata.count":1,"MessageId":"Base.1.12.ResourceMissingAtURI","RelatedProperties":[],"RelatedProperties@odata.count":0,"Resolution":"Place a valid resource at the URI or correct the URI and resubmit the request.","Severity":"Critical"}],"code":"Base.1.12.GeneralError","message":"A general error has occurred. See ExtendedInfo for more information"}}
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207']
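
The 404 body above is hard to read as a single line. A small sketch that pulls the error code and the per-entry `MessageId`, `Severity`, and `Message` out of such a response; the sample is abridged from the output above (fields such as `MessageArgs` and `Resolution` are dropped), and `summarize_redfish_error` is an illustrative name, not a spicerack function.

```python
import json

# Abridged copy of the error body returned by the iDRAC in the log above.
SAMPLE_BODY = """
{"error": {
  "@Message.ExtendedInfo": [
    {"Message": "Unable to complete the operation because the resource Logs entered is not found.",
     "MessageId": "IDRAC.2.7.SYS403",
     "Severity": "Critical"},
    {"Message": "The resource at the URI 'Logs' was not found.",
     "MessageId": "Base.1.12.ResourceMissingAtURI",
     "Severity": "Critical"}
  ],
  "code": "Base.1.12.GeneralError",
  "message": "A general error has occurred. See ExtendedInfo for more information"
}}
"""


def summarize_redfish_error(body):
    """Return the top-level error code and one tuple per extended-info entry."""
    error = json.loads(body).get("error", {})
    entries = error.get("@Message.ExtendedInfo", [])
    return {
        "code": error.get("code"),
        "messages": [
            (entry.get("MessageId"), entry.get("Severity"), entry.get("Message"))
            for entry in entries
        ],
    }


if __name__ == "__main__":
    summary = summarize_redfish_error(SAMPLE_BODY)
    print(summary["code"])
    for message_id, severity, message in summary["messages"]:
        print(f"  [{severity}] {message_id}: {message}")
```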

@Jclark-ctr when you are back on site, can you please check the mgmt network cable for db1209 and db1210? Thanks

Change 904543 had a related patch set uploaded (by Jbond; author: jbond):

[operations/software/spicerack@master] redfish: update log entries location

https://gerrit.wikimedia.org/r/904543

Change 904543 merged by jenkins-bot:

[operations/software/spicerack@master] redfish: update log entries location

https://gerrit.wikimedia.org/r/904543

Change 904574 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/puppet@production] Add new db nodes to site.pp

https://gerrit.wikimedia.org/r/904574

Change 904574 merged by Papaul:

[operations/puppet@production] Add new db nodes to site.pp

https://gerrit.wikimedia.org/r/904574

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1207.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1208.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1207.eqiad.wmnet with OS bullseye completed:

  • db1207 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303301701_pt1979_4056354_db1207.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
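
The first-Puppet-run log path in the report above appears to encode when the run started, who ran it, and the host. A sketch that splits such a path into those parts, assuming the file name layout is `<YYYYMMDDHHMM>_<user>_<id>_<host>.out`; that layout is inferred from the paths in this task, the meaning of the third field is not confirmed here, and `ReimageLog` and `parse_reimage_log_path` are illustrative names.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime
    user: str
    run_id: str  # third field of the file name; its exact meaning is an assumption
    host: str


def parse_reimage_log_path(path):
    """Split a reimage log path into timestamp, user, run id, and host."""
    stem = PurePosixPath(path).stem  # drops the directory and the ".out" suffix
    stamp, user, run_id, host = stem.split("_", 3)
    return ReimageLog(datetime.strptime(stamp, "%Y%m%d%H%M"), user, run_id, host)


if __name__ == "__main__":
    log = parse_reimage_log_path(
        "/var/log/spicerack/sre/hosts/reimage/202303301701_pt1979_4056354_db1207.out"
    )
    print(log.host, log.user, log.started.isoformat())
```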

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1211.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1211.eqiad.wmnet with OS bullseye completed:

  • db1211 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303301736_pt1979_4079753_db1211.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1208.eqiad.wmnet with OS bullseye completed:

  • db1208 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303301721_pt1979_4068647_db1208.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • Failed to run the sre.puppet.sync-netbox-hiera cookbook, run it manually
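
With many of these reports in one task, the WARN ones are easy to miss. A sketch that reads one report in the bulleted form used above and returns the host, its status, and any step that starts with "Failed"; the sample is abridged from the db1208 report, and `parse_report` is an illustrative name.

```python
import re

# Abridged copy of the db1208 report above.
REPORT = """\
  • db1208 (WARN)
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • Failed to run the sre.puppet.sync-netbox-hiera cookbook, run it manually
"""

# Header bullet: "<host> (<STATUS>)"; every other bullet is a step.
HEADER_RE = re.compile(r"^\s*•\s+(?P<host>\S+)\s+\((?P<status>[A-Z]+)\)\s*$")
STEP_RE = re.compile(r"^\s*•\s+(?P<step>.+?)\s*$")


def parse_report(text):
    """Return host, status, all steps, and the steps that report a failure."""
    host = status = None
    steps = []
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header and host is None:
            host, status = header["host"], header["status"]
            continue
        step = STEP_RE.match(line)
        if step and host is not None:
            steps.append(step["step"])
    failed = [step for step in steps if step.startswith("Failed")]
    return {"host": host, "status": status, "steps": steps, "failed": failed}


if __name__ == "__main__":
    report = parse_report(REPORT)
    print(report["host"], report["status"])
    for step in report["failed"]:
        print("  needs manual follow-up:", step)
```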

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1212.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1213.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1212.eqiad.wmnet with OS bullseye completed:

  • db1212 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303301822_pt1979_4113574_db1212.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1213.eqiad.wmnet with OS bullseye completed:

  • db1213 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303301830_pt1979_4120179_db1213.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1215.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1214.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1215.eqiad.wmnet with OS bullseye completed:

  • db1215 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303301908_pt1979_4149355_db1215.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • Failed to run the sre.puppet.sync-netbox-hiera cookbook, run it manually

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1214.eqiad.wmnet with OS bullseye completed:

  • db1214 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303301908_pt1979_4149237_db1214.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1216.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1217.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1216.eqiad.wmnet with OS bullseye completed:

  • db1216 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303301954_pt1979_4182817_db1216.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1217.eqiad.wmnet with OS bullseye completed:

  • db1217 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303301955_pt1979_4183067_db1217.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • Failed to run the sre.puppet.sync-netbox-hiera cookbook, run it manually

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1218.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1219.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1219.eqiad.wmnet with OS bullseye completed:

  • db1219 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303302033_pt1979_20147_db1219.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • Failed to run the sre.puppet.sync-netbox-hiera cookbook, run it manually

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1218.eqiad.wmnet with OS bullseye completed:

  • db1218 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303302032_pt1979_19992_db1218.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1220.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1221.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1209.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1210.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1221.eqiad.wmnet with OS bullseye completed:

  • db1221 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303302250_pt1979_116741_db1221.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • Failed to run the sre.puppet.sync-netbox-hiera cookbook, run it manually

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1220.eqiad.wmnet with OS bullseye completed:

  • db1220 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303302247_pt1979_112867_db1220.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1222.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1209.eqiad.wmnet with OS bullseye completed:

  • db1209 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303302302_pt1979_126286_db1209.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1223.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1224.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1222.eqiad.wmnet with OS bullseye completed:

  • db1222 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303302330_pt1979_150693_db1222.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1210.eqiad.wmnet with OS bullseye completed:

  • db1210 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303302313_pt1979_134540_db1210.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • Failed to run the sre.puppet.sync-netbox-hiera cookbook, run it manually

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host db1225.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1223.eqiad.wmnet with OS bullseye completed:

  • db1223 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303302341_pt1979_157488_db1223.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1224.eqiad.wmnet with OS bullseye completed:

  • db1224 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303302351_pt1979_164968_db1224.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • Failed to run the sre.puppet.sync-netbox-hiera cookbook, run it manually

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host db1225.eqiad.wmnet with OS bullseye completed:

  • db1225 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303310007_pt1979_179016_db1225.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

@Marostegui your 19 servers are ready. Have fun!

Thank you Papaul, they look good!