db2034 crash
Closed, Duplicate · Public

Description

db2034 crashed and paged. When I logged in via mgmt and connected to the VSP, the following was scrolling past. There was no other serial output, just these lines repeating:

[10605869.309381] BUG: soft lockup - CPU#31 stuck for 22s! [migration/31:292]
[10605897.339575] BUG: soft lockup - CPU#31 stuck for 22s! [migration/31:292]
[10605925.369770] BUG: soft lockup - CPU#31 stuck for 22s! [migration/31:292]
[10605953.399966] BUG: soft lockup - CPU#31 stuck for 22s! [migration/31:292]
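
A minimal sketch, assuming the console output has been captured as text, of how these lines could be summarized during triage. The helper names `parse_soft_lockups` and `summarize` are hypothetical and not part of any existing tooling. Run on the four lines above, it reports a single CPU (31), a single kernel thread (migration/31, PID 292), and roughly 28 seconds between messages.

```python
import re

# Hypothetical helpers for illustration only; the regex matches the
# "BUG: soft lockup" line format exactly as it appears in the excerpt above.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<task>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(text):
    """Return one dict per soft lockup line found in the captured text."""
    events = []
    for line in text.splitlines():
        m = LOCKUP_RE.search(line)
        if m:
            events.append({
                "ts": float(m["ts"]),        # kernel timestamp, seconds since boot
                "cpu": int(m["cpu"]),
                "stuck_s": int(m["secs"]),
                "task": m["task"],
                "pid": int(m["pid"]),
            })
    return events


def summarize(events):
    """Count events, list affected CPUs and tasks, and compute gaps between messages."""
    gaps = [round(b["ts"] - a["ts"], 2) for a, b in zip(events, events[1:])]
    return {
        "count": len(events),
        "cpus": sorted({e["cpu"] for e in events}),
        "tasks": sorted({e["task"] for e in events}),
        "gaps_s": gaps,
    }


# The four lines quoted in this task.
CONSOLE = """\
[10605869.309381] BUG: soft lockup - CPU#31 stuck for 22s! [migration/31:292]
[10605897.339575] BUG: soft lockup - CPU#31 stuck for 22s! [migration/31:292]
[10605925.369770] BUG: soft lockup - CPU#31 stuck for 22s! [migration/31:292]
[10605953.399966] BUG: soft lockup - CPU#31 stuck for 22s! [migration/31:292]
"""

if __name__ == "__main__":
    print(summarize(parse_soft_lockups(CONSOLE)))
```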

Event Timeline

Mentioned in SAL [2016-06-06T05:34:36Z] <robh> db2034 locked up via serial console. details on T137084, rebooting since its unresponsive to ssh or serial.

I've rebooted the host in an attempt to bring it back online. This should be recorded in the host's history notes (we don't really have a good way to do that now).

For now I'm setting it to high priority and assigning it to @jcrespo for review.

mysql isn't online, but I'm not sure if it's as simple as manually starting it, or if it has to be manually checked/synced first. Since db2034 crashed and wasn't cleanly shut down, I don't want to assume I should just restart the db/mysql service.

jcrespo triaged this task as High priority. Jun 6 2016, 6:26 AM
jcrespo moved this task from Triage to In progress on the DBA board.

It seems there was a RAID controller failure:

A controller failure event occurred prior to this power-up

We had similar issues on T130702. We may need a general upgrade of all machines with similar models.

Restricted Application added a subscriber: Southparkfan.

This host being down was creating log noise due to health checks (no users affected):

https://logstash.wikimedia.org/#dashboard/temp/AVUkao15_LTxu7wl9U3S