Paste P29974

sre.hardware.dell sretest1001 output

Authored by jbond on Jun 23 2022, 9:48 AM.
$ sudo cookbook -c /home/jbond/cookbook.yaml sre.hardware.dell sretest1001.eqiad.wmnet
START - Cookbook sre.hardware.dell for hosts sretest1001.eqiad.wmnet
Management Password:
sretest1001.eqiad.wmnet (IDRAC): update
sretest1001.eqiad.wmnet: Already have: /srv/firmware/poweredge-r440/iDRAC-with-Lifecycle-Controller_Firmware_WPNPP_WN64_5.10.30.00_A00.EXE
sretest1001.eqiad.wmnet (IDRAC): latest_version: 5.10.30.00, current_version: 5.10.10.00
==> sretest1001.eqiad.wmnet IDRAC: About to upload /srv/firmware/poweredge-r440/iDRAC-with-Lifecycle-Controller_Firmware_WPNPP_WN64_5.10.30.00_A00.EXE, please confirm
Type "go" to proceed or "abort" to interrupt the execution
> go
==> sretest1001.eqiad.wmnet IDRAC: About to install Available-25227-5.10.30.00__iDRAC.Embedded.1-1, please confirm
Type "go" to proceed or "abort" to interrupt the execution
> go
sretest1001.eqiad.wmnet (IDRAC): has job ID - /redfish/v1/TaskService/Tasks/JID_559941188108
[IDRAC.2.5.RED003] Downloading package.
[1/30, retrying in 30.00s] Polling task: JID_559941188108 not completed yet: status=OK, state=Pending, completed=None%
[IDRAC.2.5.RED003] Downloading package.
[2/30, retrying in 30.00s] Polling task: JID_559941188108 not completed yet: status=OK, state=Pending, completed=None%
[IDRAC.2.5.RED001] Job completed successfully.
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
sretest1001.eqiad.wmnet (IDRAC): now at version: 5.10.30.00
sretest1001.eqiad.wmnet (BIOS): update
Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='sretest1001.mgmt.eqiad.wmnet', port=443): Read timed out. (read timeout=10)")': /redfish/v1/Systems/System.Embedded.1?$select=BiosVersion
sretest1001.eqiad.wmnet: Already have: /srv/firmware/poweredge-r440/BIOS_38PH6_WN64_2.14.2.EXE
sretest1001.eqiad.wmnet (BIOS): latest_version: 2.14.2, current_version: 1.3.7
==> sretest1001.eqiad.wmnet BIOS: About to upload /srv/firmware/poweredge-r440/BIOS_38PH6_WN64_2.14.2.EXE, please confirm
Type "go" to proceed or "abort" to interrupt the execution
> go
==> sretest1001.eqiad.wmnet BIOS: About to install Available-159-2.14.2__BIOS.Setup.1-1, please confirm
Type "go" to proceed or "abort" to interrupt the execution
> go
sretest1001.eqiad.wmnet (BIOS): has job ID - /redfish/v1/TaskService/Tasks/JID_559946045007
START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
Scheduling downtime on Icinga server alert1001.wikimedia.org for hosts: sretest1001
Created silence ID 07f35e87-65bc-4c24-bf77-249ea26a028c
Rebooting 1 hosts in batches of 1 with 0.0s of sleep in between: sretest1001.eqiad.wmnet
----- OUTPUT of 'reboot-host' -----
================
PASS |██████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00, 1.12hosts/s]
FAIL | | 0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'reboot-host'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
[1/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[2/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[3/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[4/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[5/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[6/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[7/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[8/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[9/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[10/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[11/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[12/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[13/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[14/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[15/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[16/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[17/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[18/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[19/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
[20/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet
Caused by: Cumin execution failed (exit_code=2)
Found reboot since 2022-06-23 09:30:08.489764 for hosts sretest1001.eqiad.wmnet
[1/60, retrying in 30.00s] Attempt to run 'spicerack.puppet.PuppetHosts.wait_since' raised: Successful Puppet run too old (2022-06-23 09:29:40 <= 2022-06-23 09:30:08.489764) on: sretest1001.eqiad.wmnet
Successful Puppet run found
[1/15, retrying in 3.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal.<locals>.check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
[2/15, retrying in 6.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal.<locals>.check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
[3/15, retrying in 9.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal.<locals>.check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
[4/15, retrying in 12.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal.<locals>.check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
[5/15, retrying in 15.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal.<locals>.check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
[6/15, retrying in 18.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal.<locals>.check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
[7/15, retrying in 21.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal.<locals>.check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
Deleted silence ID 07f35e87-65bc-4c24-bf77-249ea26a028c
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
[IDRAC.2.5.PR19] The specified job has completed successfully.
sretest1001.eqiad.wmnet (BIOS): now at version: 2.14.2
END (PASS) - Cookbook sre.hardware.dell (exit_code=0) for hosts sretest1001.eqiad.wmnet