db2212 failed to reboot
Closed, Resolved (Public)

Description

db2212 failed to reboot as part of T426633

The host hasn't come back yet after rebooting at:

2026-05-27 12:07:22.661071 cumin db2212.codfw.wmnet "reboot-host" --force --no-progress

Event Timeline

There are no events in getsel after 06/13/2025 14:24:15

Mentioned in SAL (#wikimedia-operations) [2026-05-27T13:05:57Z] <fceratto@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 99 days, 0:00:00 on db2212.codfw.wmnet with reason: failed to reboot T427388 T426633

It halted during boot, and I had to pull the power entirely to get it to reboot and make it past POST. There still isn't anything new in the event logs. Can I update the firmware on this one, or do you need it back quickly since it's high priority?

Yes please, update everything that can be updated :)
Thanks!

Change #1294807 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2212: Disable notifications

https://gerrit.wikimedia.org/r/1294807

Change #1294807 merged by Marostegui:

[operations/puppet@production] db2212: Disable notifications

https://gerrit.wikimedia.org/r/1294807

The host was shut down cleanly so I can check and repool it.

No error in the logs, replication is catching up.

Change #1295389 had a related patch set uploaded (by Federico Ceratto; author: Federico Ceratto):

[operations/puppet@production] Enable notifications for db2212

https://gerrit.wikimedia.org/r/1295389

Change #1295389 merged by Federico Ceratto:

[operations/puppet@production] db2212: Enable notifications

https://gerrit.wikimedia.org/r/1295389

Starting pool of db2212 by fceratto@cumin1003: Pooling

Completed pooling of db2212 by fceratto@cumin1003: Pooling