Put cloudcephosd10[42-47] in service
Closed, Resolved (Public)

Description

  • cloudcephosd1042
  • cloudcephosd1043
  • cloudcephosd1044
  • cloudcephosd1045
  • cloudcephosd1046
  • cloudcephosd1047
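
The task title writes these six hosts in bracket-range shorthand, `cloudcephosd10[42-47]`. As a minimal sketch of how that shorthand maps to the list above, the hypothetical helper `expand_host_range` below expands a single `[lo-hi]` range. It is an illustration only, not the parser used by cumin or the cookbooks, and it does not handle character-set forms such as `cloudcephosd104[67]`.

```python
import re


def expand_host_range(pattern: str) -> list[str]:
    """Expand one 'prefix[lo-hi]suffix' range into individual hostnames.

    Hypothetical helper for illustration; patterns without a numeric
    range are returned unchanged as a single-element list.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]
    prefix, lo, hi, suffix = match.groups()
    width = len(lo)  # keep zero padding consistent with the lower bound
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(lo), int(hi) + 1)
    ]


hosts = expand_host_range("cloudcephosd10[42-47]")
for host in hosts:
    print(host)
```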

Event Timeline

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudcephosd1042.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudcephosd1042.eqiad.wmnet with OS bullseye executed with errors:

  • cloudcephosd1042 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudcephosd1042.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudcephosd1042.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudcephosd1042.eqiad.wmnet with OS bullseye executed with errors:

  • cloudcephosd1042 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudcephosd1042.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudcephosd1042.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudcephosd1042.eqiad.wmnet with OS bullseye executed with errors:

  • cloudcephosd1042 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudcephosd1042.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudcephosd1042.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudcephosd1042.eqiad.wmnet with OS bullseye executed with errors:

  • cloudcephosd1042 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudcephosd1042.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-15T19:26:54Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-15T19:32:57Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T401693)

Change #1179218 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] osd.yaml: add entries for cloudcephosd1042, cloudcephosd104[67]

https://gerrit.wikimedia.org/r/1179218

Change #1179218 merged by Andrew Bogott:

[operations/puppet@production] osd.yaml: add entries for cloudcephosd1042, cloudcephosd104[67]

https://gerrit.wikimedia.org/r/1179218

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-15T19:51:21Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-15T19:57:58Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-15T20:18:03Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-15T20:24:28Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T401693)

Andrew triaged this task as Medium priority. (Aug 20 2025, 2:12 PM)

Change #1180693 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloudceph: add new OSDs: cloudcephosd1042-1051

https://gerrit.wikimedia.org/r/1180693

Change #1180693 merged by Andrew Bogott:

[operations/puppet@production] cloudceph: add new OSDs: cloudcephosd1042-1051

https://gerrit.wikimedia.org/r/1180693

Change #1180696 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloudceph: mark out some OSD nodes not yet ready for action

https://gerrit.wikimedia.org/r/1180696

Change #1180696 merged by Andrew Bogott:

[operations/puppet@production] cloudceph: mark out some OSD nodes not yet ready for action

https://gerrit.wikimedia.org/r/1180696

Change #1180697 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloudceph: further mark out OSD nodes not yet ready for action

https://gerrit.wikimedia.org/r/1180697

Change #1180697 merged by Andrew Bogott:

[operations/puppet@production] cloudceph: further mark out OSD nodes not yet ready for action

https://gerrit.wikimedia.org/r/1180697

Change #1181178 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] New attempt to put new cloudcephosd hosts online

https://gerrit.wikimedia.org/r/1181178

Change #1181178 merged by Andrew Bogott:

[operations/puppet@production] New attempt to put new cloudcephosd hosts online

https://gerrit.wikimedia.org/r/1181178

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-23T04:02:17Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-23T04:08:27Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T401693)

Change #1181221 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloudceph: remove cloudcephosd1045, again

https://gerrit.wikimedia.org/r/1181221

Change #1181221 merged by Andrew Bogott:

[operations/puppet@production] cloudceph: remove cloudcephosd1045, again

https://gerrit.wikimedia.org/r/1181221

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-23T04:21:20Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-23T19:24:04Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-23T19:27:31Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-24T10:35:52Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-24T17:19:31Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-25T04:05:28Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-25T04:06:33Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-25T13:36:54Z] <andrew@cloudcumin1001> END (ERROR) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=97) (T401693)

Change #1181767 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Replace cloudvirt1045

https://gerrit.wikimedia.org/r/1181767

Change #1181767 merged by Andrew Bogott:

[operations/puppet@production] Replace cloudvirt1045

https://gerrit.wikimedia.org/r/1181767

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-25T21:07:14Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-25T21:17:49Z] <andrew@cloudcumin1001> END (ERROR) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=97) (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-26T01:11:01Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-26T01:20:41Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-26T01:20:49Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Andrew updated the task description.

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-26T14:02:35Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-09-05T15:10:24Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-09-05T15:15:12Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-09-05T15:30:28Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-09-05T15:37:10Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-09-05T15:37:47Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-09-05T15:37:54Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-09-05T15:37:59Z] <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-09-05T15:38:05Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) (T401693)
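
The SAL entries above follow a regular START/END shape, so the run time and outcome of each cookbook attempt can be recovered from them. The sketch below is one way to do that. The regular expression and the helper `pair_runs` are assumptions based only on the lines in this timeline, not an official SAL parser, and the pairing assumes runs do not overlap (each END belongs to the most recent START).

```python
import re
from datetime import datetime

# Pattern inferred from the SAL lines in this task; not an official format spec.
SAL_LINE = re.compile(
    r"\[(?P<ts>[^\]]+)\] <(?P<who>[^>]+)> "
    r"(?P<kind>START|END)(?: \((?P<status>\w+)\))? - Cookbook (?P<cookbook>\S+)"
    r"(?: \(exit_code=(?P<exit_code>\d+)\))?"
)


def pair_runs(lines):
    """Pair each END entry with the preceding START entry.

    Returns a list of (status, exit_code, duration) tuples, where
    duration is a datetime.timedelta.
    """
    runs = []
    started = None
    for line in lines:
        match = SAL_LINE.search(line)
        if match is None:
            continue  # not a cookbook START/END entry
        timestamp = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%SZ")
        if match["kind"] == "START":
            started = timestamp
        elif started is not None:
            exit_code = match["exit_code"]
            runs.append(
                (
                    match["status"],
                    int(exit_code) if exit_code is not None else None,
                    timestamp - started,
                )
            )
            started = None
    return runs


# The first START/END pair from the timeline above.
sample = [
    "Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-15T19:26:54Z] "
    "<andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)",
    "Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-15T19:32:57Z] "
    "<andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add "
    "(exit_code=99) (T401693)",
]
runs = pair_runs(sample)
for status, exit_code, duration in runs:
    print(status, exit_code, duration)
```

Applied to the whole timeline, this kind of pairing makes it easy to separate the attempts that failed within minutes from the runs that took many hours before ending.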